Compresssion

About compression

Compression is the process of reducing the size of files. It can be useful to reduce the size of files to save on storage space or make them quicker to download or small enough to fit within email attachment limits. There are two types of compression to be aware of: lossy and lossless.

Lossy and lossless compression

Lossy compression is where we reduce the size of a file by removing details we hope will not be noticed. This might mean reducing the number of different colours in an image. When lossy compression is used and the data is reconstructed, it will not be the same as the original. Examples of lossy compression include .mp3 for sound and .jpg for images.

Lossless compression is where we reduce the size of a file by summarising runs of repeating data. This might mean recording frequency value pairs of the colour of a pixel and the number of pixels in a row that are that colour. When lossless compression is used and the data is reconstructed, it will be identical to the original. Examples of lossless compression include .flac for sound, .gif for images and .zip for all files.

Certain files like text files, databases and program code cannot be compressed using lossy compression as losing any characters would render them unusable. Although lossless compression keeps the original file in tact, the trade-off is that it does not compress the filesize as much as lossy compression.

Run length encoding

Run length encoding is summarising runs of the same data into frequency-value pairs. It is useful for data that has runs of similar data such as low resolution images. it is not effective where the data does not contain runs. For instance 0000011100000011 would become 5 0 3 1 6 0 2 1.

Huffman encoding

Huffman encoding is a lossless compression method for text. It aims to reduce the filesize more by assigning the shortest binary codes to the most frequently occuring characters. You start by creating a table with the frequencies of each chacater in the phrase to be compressed. In this example we are compressing the phrase 'ten tennis teens'.

Compressed the phrase is: 1101100011101101000010000001110101100000. This is 40 bits or 8 bytes. In truth the huffman tree would also need to be sent but you don't need to consider that when considering filesize savings. The same phrase in a bit ASCII would be 1 byte per character so 16 bytes. This means we made a saving of 11 bytes or a reduction of 68.75%

Knowledge check

Huffman tree for use with some multi choice questions

Questions:
Correct:

Question text


© All materials created by and copyright S.Goff