Tuesday, August 23, 2011

Huffman Compressor

Just wrote a little compression tool.

It uses the huffman algorithm to calculate a prefix code tree,
which is used to compress the input file.

The format of a compressed file looks as follows:

[signature: {'H', 'C', 'F', '\0'}] (HCF = Huffman Compressed File)
[content size] (a 64 bit unsigned integer containing the size of the original file)
[huffman tree] (here the tree is serialized... for more information have a look at huffman_tree.c)
[content] (finally we have the compressed content)

The quality of the compression depends heavily on the content of the input file.
A rather big text file can be compressed very well,
while large binary files may nearly keep their size.

This compression algorithm is not recommended for very small files
as such files might even become bigger.

The program source (GPLv3) can be found here: http://www-user.tu-chemnitz.de/~bytow/
The package also contains the Windows binaries of the program.