Data Representation: Dictionary based compression

Dictionary based compression

Dictionary-based compression works by creating a dictionary(table of values) from which the text can be re-created. Below is a table of the value needed to compress the phrase 'An eye for an eye, a tooth for a tooth'.

  Entry     Text     Binary  
1 An 000
2 _eye 001
3 _for 010
4 _an 011
5 , 100
6 _a 101
7 _tooth 110

'An eye for an eye, a tooth for a tooth' is compressed as:

000 001 010 011 001 100 101 110 010 101 110


Savings from dictionary based compression

'An eye for an eye, a tooth for a tooth' is compressed as: 000 001 010 011 001 100 101 110 010 101 110

This has a total size of 33 bits or 4.125 bytes. The phrase has 38 characters, so in 8-bit ASCII would take up 38 bytes.

This is a saving of (38 - 4.125)/38 * 100 = 89.15%. The saving would actually be lower than this as the dictionary itself also has to be transmitted.

It may seem from this small example that the size of the dictionary would eat p mst of the savings, but applied to a larger text the size of the dictionary becomes far smaller than the size of the text being compressed.

Knowledge check

These questions will refer to using dictionary encoding to enocde the phrase 'Can you can a can like a canner can?' using the dictionary below

  Entry     Text     Binary  
1 Can 000
2 _you 001
3 _can 010
4 _a 011
5 _like 100
6 ner 101
7 ? 110


Questions:
Correct:

Question text


© All materials created by and copyright S.Goff