Data Representation: Dictionary based compression

Dictionary based compression

Dictionary-based compression works by creating a dictionary(table of values) from which the text can be re-created. Below is a table of the value needed to compress the phrase 'An eye for an eye, a tooth for a tooth'.

Entry	Text	Binary
1	An	000
2	_eye	001
3	_for	010
4	_an	011
5	,	100
6	_a	101
7	_tooth	110

'An eye for an eye, a tooth for a tooth' is compressed as:

000 001 010 011 001 100 101 110 010 101 110

Savings from dictionary based compression

'An eye for an eye, a tooth for a tooth' is compressed as: 000 001 010 011 001 100 101 110 010 101 110

This has a total size of 33 bits or 4.125 bytes. The phrase has 38 characters, so in 8-bit ASCII would take up 38 bytes.

This is a saving of (38 - 4.125)/38 * 100 = 89.15%. The saving would actually be lower than this as the dictionary itself also has to be transmitted.

It may seem from this small example that the size of the dictionary would eat p mst of the savings, but applied to a larger text the size of the dictionary becomes far smaller than the size of the text being compressed.

Knowledge check

These questions will refer to using dictionary encoding to enocde the phrase 'Can you can a can like a canner can?' using the dictionary below

Entry	Text	Binary
1	Can	000
2	_you	001
3	_can	010
4	_a	011
5	_like	100
6	ner	101
7	?	110

Questions:

Correct:

Question text