Internet users who download or upload files, or who send or receive email attachments, will most likely have encountered files in a compressed format. In this article we will cover how compression works, the advantages and disadvantages of compression, and the types of compression available.
Compression is the process of encoding data more efficiently to achieve a reduction in file size. One type of compression is referred to as lossless compression. This means the compressed file will be restored exactly to its original state, with no loss of data, during the decompression process. This is essential for data such as documents and program files, which would be corrupted and unusable should any data be lost. Another category, which will not be covered in this article, is “lossy” compression, which is often used for multimedia files such as music and images and in which some data is discarded.
Lossless compression algorithms use statistical modelling techniques to reduce repetitive information in a file. Methods may include removing spacing characters, representing a string of repeated characters with a single character and a repeat count, or replacing frequently recurring characters with shorter bit sequences.
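As an illustration of representing a string of repeated characters with a single character, the following is a minimal sketch of run-length encoding in Python. The function names rle_encode and rle_decode are chosen for this example only and do not correspond to any particular product or library. The final check shows the defining property of lossless compression: decoding returns exactly the original data.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand each (character, count) pair back into the original run."""
    return "".join(ch * count for ch, count in pairs)


original = "AAAABBBCCDAA"
encoded = rle_encode(original)
restored = rle_decode(encoded)

print(encoded)
# Lossless: the restored text is identical to the original.
print(restored == original)
```

Run-length encoding only helps when the data contains long runs. On text with few repeated characters, the list of pairs can be larger than the input, which is one reason practical tools combine several techniques.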
Compression of files offers many advantages. When a file is compressed, the number of bits used to store the information is reduced. Smaller files take less time to transfer over the Internet and take up less storage space. File compression can also bundle several small files into a single archive for more convenient email transmission.
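Bundling several small files into a single compressed archive can be sketched with Python's standard zipfile module. The file names and contents below are invented for the example, and the archive is built in memory so that the sketch does not touch the disk.

```python
import io
import zipfile

# Hypothetical small files to bundle (names and contents are made up).
files = {
    "notes.txt": b"Meeting notes. " * 50,
    "todo.txt": b"Buy milk. " * 50,
    "log.txt": b"OK\n" * 200,
}

# Write all files into one ZIP archive using DEFLATE compression.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, content in files.items():
        archive.writestr(name, content)

archive_bytes = buffer.getvalue()
total_size = sum(len(content) for content in files.values())
print(f"{len(files)} files, {total_size} bytes -> one archive of {len(archive_bytes)} bytes")

# Reading the archive back recovers every file unchanged.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    restored = {name: archive.read(name) for name in archive.namelist()}
```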
As compression is a mathematically intense process, it can be time-consuming, especially when a large number of files is involved. Some compression algorithms also offer varying levels of compression, with the higher levels achieving a smaller file size but taking even longer to compress. Compression is also system intensive: it takes up valuable resources, which can sometimes result in “Out of Memory” errors. With so many compression algorithm variants, a user downloading a compressed file may not have the necessary program to decompress it.
Some transmission protocols include optional built-in compression (e.g. FTP has a MODE Z compression option). Compressing data with another process before transmission may negate some of the advantages of using such an option, because data that is already compressed can rarely be compressed much further, and the protocol may waste time trying and failing to shrink it. It is quite possible that compressing “externally” beforehand is the more efficient approach these days, and that the compression option in the protocol is best left switched off. However, it is also possible that the built-in compression gives faster overall results at the cost of larger compressed data, or vice versa. Users should experiment to find out which applies, and decide which factor matters most to them.
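The trade-off between compression level and time, and the limited benefit of compressing data that is already compressed, can be tried out with Python's standard zlib module. This is a rough experiment rather than a benchmark: the sample data is generated pseudo-randomly from a made-up word list, and actual sizes and timings depend on the data and the machine.

```python
import random
import time
import zlib

# Build repeatable sample text from a small, made-up vocabulary.
rng = random.Random(0)
words = ["backup", "file", "data", "compress", "archive",
         "profile", "sync", "copy", "folder", "restore"]
data = " ".join(rng.choice(words) for _ in range(100_000)).encode("ascii")

# Compress the same data at a low, a medium and the highest level.
results = {}
for level in (1, 6, 9):
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    results[level] = packed
    print(f"level {level}: {len(data)} -> {len(packed)} bytes in {elapsed:.4f} s")

# Compress the already-compressed output a second time.
repacked = zlib.compress(results[9], 9)
print(f"second pass: {len(results[9])} -> {len(repacked)} bytes")
```

Higher levels usually take longer and produce smaller output, while the second pass typically saves little or nothing. This is the situation described above, where a protocol's built-in compression is applied to a file that was compressed beforehand.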
In 1949, Shannon-Fano coding was devised by Claude Shannon and Robert Fano to assign code words to blocks of data based on their probabilities. This technique produces fairly efficient, though not always optimal, variable-length codes. In 1951, David Huffman found an optimally efficient method that improved on Shannon-Fano coding by using a frequency-sorted binary tree. Huffman coding is still often used today as a back end to other compression methods.
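The idea of a frequency-sorted binary tree can be sketched in a few lines of Python. This is a simplified illustration rather than a production encoder: the function names are chosen for this example, the codes are kept as strings of "0" and "1" characters instead of packed bits, and the tree is represented implicitly by the code table that each merge step builds up.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix-free code table; frequent symbols get shorter codes."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(weight, i, {sym: ""}) for i, (sym, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees under a new parent node.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_encode(text: str, codes: dict[str, str]) -> str:
    return "".join(codes[ch] for ch in text)


def huffman_decode(bits: str, codes: dict[str, str]) -> str:
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # prefix-free, so the first match is the symbol
            out.append(reverse[current])
            current = ""
    return "".join(out)


message = "abracadabra"
codes = huffman_codes(message)
bits = huffman_encode(message, codes)
print(codes)
print(len(bits), "bits instead of", 8 * len(message))
print(huffman_decode(bits, codes) == message)
```

For the sample word "abracadabra", the most frequent letter receives a one-bit code and the whole message fits in 23 bits, compared with 88 bits at one byte per character. A real format would also need to store the code table alongside the data.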
In 1977 and 1978, Abraham Lempel and Jacob Ziv published the ground-breaking LZ77 and LZ78 algorithms, which rapidly gained popularity. Some algorithms in common use today, such as DEFLATE, LZMA and LZX, are derived from LZ77. Due to patent issues surrounding LZW, an LZ78 derivative published in 1984, UNIX developers began to adopt open source formats such as the DEFLATE-based gzip and the Burrows-Wheeler Transform-based bzip2, which also achieved significantly higher compression than the formats based on LZ78.
There are several types of compression available. In the following section, we review the five types of compression offered by the backup and synchronization software SyncBackFree, SyncBackSE and SyncBackPro.
Compression Type Available in SyncBackFree
Compression Types Available in SyncBackSE (Including compression types supported by SyncBackFree)
Compression Types Available in SyncBackPro (Including all the compression types supported by SyncBackFree and SyncBackSE)
In conclusion, data compression is very important in the computing world and is commonly used by many applications, including the SyncBack suite of programs. By providing a brief overview of how compression works in general, this article is intended to help users of data compression weigh its advantages and disadvantages.