Post by scalion on Apr 27, 2020 22:02:23 GMT 1
I create it to use in my actual ultra-secret project. That provide compress/decompress using huffman codes.
The library source : CompressToolsLib.G32 (17.05 KB)
The compiled library : CompressToolsLib.lg32 (7.41 KB)
An example of using : Test CompressToolsLib.G32 (1.5 KB)
Have good day all.

- Update to add 2 new procedures : 01/05/2020
DictionnaryCompress(AdrTest, SizeTest, AdrC, CSize)
DictionnaryUncompress(AdrC, CSize, AdrU, USize)
DictionnaryUncompress(AdrC, CSize, AdrU, USize)
For the moment this dictionnary dont use the Huffman codes but that work.
- Update to correcting bug (0xc00005) : 28/04/2020
Number of codes was written with value 1 to 256 that cause the problem with big data who need 256 codes
Now it writen-1 and read+1 : test ok
Number of codes was written with value 1 to 256 that cause the problem with big data who need 256 codes
Now it writen-1 and read+1 : test ok
What is Huffman Code ?
Example : you have a string "abracadabra" that take 11 bytes, then 11*8 = 88 bits.
only letter "abrcd" are used. You remark that can be coded on 3 bits instead 8 bits, then 11*3 = 33 bits.
Go further : a appear 5 times, b and r 2 times, others 1 times.
You remark we can use only 1 bits for a and 3 for all others then whe have 5*1 + 6*3 = 23 bits !
It also mean that each code must not be a prefix of other longer code if we want to be able to read it.
a=0 b=110 r=111 d=100 c=101 then abracadabra give 0 110 111 0 101 0 100 0 110 111 0
That's what Huffman Code do !
In addition, the codes are optimized to the maximum so that the number of bits required per value is calculated so that the sum of the coded data is minimal
What is Dictionnay Compression ?
Example : again the string "abracadabra" who take 11 bytes, then 88 bits. "abra" appear 2 times, that mean we can use a code to say "repeat what found 9 bytes before for 4 bytes). We considere the data are a dictionnary and if a byte block appears more than 1 time, we code it with the offset (9 bytes before) and the block lenght (4 bytes).
In this case we indicate how many bits are necessary to store the offset and the lenght. We write this values with 4 bits (0 to 15) then we can indicate we need 1 to 16 bits to store each value of offset and lenght.
In case of "abra" with offset 9 and lenght 4 with use only 15 bits instead 32 bits of "abra":
Necessary Bits to store Offset | Value of offset | Necessary Bits to store Lenght | Value of Lenght |
4 (we write 3) | 9 | 3 (we write 2) | 4 |
0011 | 1001 | 0010 | 100 |
The 4 bits used to store offset and lenght implicate the max value is 65536. Therefore the dictionary must be found in the 65536 bytes juste before the data pointer to be compressed or uncompressed. In practice we call this a "sliding windows". The most popular implementation of dictionnary compression is the Lempel-Ziv-Oberhumer algorithm (LZO) and its derivatives LZ77 LZ78 etc...
Why this library ?
Huffman codes is a standard compression used conjointly with many compression zlib,zip,jpeg etc...
Huffman dont really compress data, just use necessary memory to store information, that need the list of codes in header data.
To perform a real compression you will need to arrange data with some algorythm before use Huffman codes.
For example you can us simple repeat detection, or dictionnary compression.
Limitation :
This library is simply written for max 256 values (bytes data) that's what i need. But in fact a real Huffman Codes can work on more than 256. I will maybe write an extension to manage it in future.
How to use this library ?
- To encode use proc HuffmanCodesCompress(Adr,Size,AdrCompressed,SizeCompressed)
Indicate the adress ans size of data to encode then AdrCompressed and SizeCompressed are updated with the memory location of encoded data
- To decode use Proc HuffmanCodesUncompress(Adr,Size,AdrUncompressed,SizeUncompressed)
Indicate the adress ans size of data to decode then AdrUncompressed and SizeUncompressed are updated with the memory location of decoded data
- To retrieve informations on encoded data use function Long Var=HuffmanCodesGet(Adr%, Info%, Optional Index%)
You pass the adress of encoded data and what you want to retriece with Info%.
You pass the adress of encoded data and what you want to retriece with Info%.
When optional Index is expected note it's between 0 and the number of index -1.
- Info% can take one of this value :
Constant Name | Value | Description |
CHC_VALUES | 1 | Return the number of entries in Huffman codes table |
CHC_VALUE | 2 | Return the value of entries Index% (0 to Values-1) |
CHC_CODE_LENGHT | 3 | Return the Huffman code lenght of entries Index% (0 to Values-1) |
CHC_CODE | 4 | Return the Huffman code (in long value) of entries Index% (0 to Values-1) |
CHC_HEADER_BIT_SIZE | 5 | Return the size in bits of Huffman codes informations |
CHC_UNCOMPRESSED_SIZE | 6 | Return the size of original data |
CHC_DATA_BIT_SIZE | 7 | Return the size in bits of encoded data |