AS/A Computer Science: Fundamentals of data structures

Hash tables

Hash tables provide a way to store a collection of items so that they are easily and quickly accessible. To achieve this a hash table stores an index of the physical address where the data is stored as a key that maps directly to the value stored in that location. This index is created using a hash function.

Hash functions

A hash function is a method used to convert data to an address. In order for a hash table to work effectively, we need to consider it's load factor. The load factor is determined by the number of spaces allocated for storage and the total number of items that need to be stored. There needs to be significantly more space allocated than there are items to store in order to avoid a large number of collisions. Collisions make the process of storing and retrireving data from hash tables slower.

Methods for hashing

Hashing is frequently done in two steps:

Generate a hash
Use the hash to calculate an index in the hash table

Generating a hash with the folding method

One method that can be used to generate a hash is the folding method. This would break a number up into a series of groups of digits and then add these to provide a hash.
Lets consider an ISBN which is made up of 12 digits followed by a check digit which we will ignore, split into groups of 2 digits. This gives us 97 + 83 + 16 + 14 + 84 + 10 which is a hash value of 304.

Generating a hash from strings

A simple method that can be used to hash strings is to add the ASCII value of each of the characters in a string. "String" = 83 + 116 + 114 + 105 +110 +103 = 631

Calculating an index

One way to calculate an index is to divide the hash value by the number of available spaces in the hash table and return the remainder i.e. use modulus. If we were trying to insert our ISBN from earlier into a hash table with 100 spaces then it would be 304 MOD 100 which is 4.

Collisions

If two items result in the same index then we have a collision. An item that shares an index with another is a synonym. If we had another ISBN that resulted in a hash value of 404 this would be a synonym of our earlier ISBN as it would also return an index of 4.

Collision resolution

Any hashing algorithm could result in collisions and so there must be a way to resolve these. Calculating a new location when there is a collision is called rehashing. A basic way to handle rehashing is to place the item in the next available free space if the space is already occupied. Alternately a set increment or even a variable increment might be used.

Searching a hash table

To search a hash table:

Hash the key you are searching for and calculate an index
Check what is stored in that location
If the location is empty then the item is not in the hash table

Hash tables and Python dictionaries

Python dictionaries are implemented as hash tables. This is an example of functional abstraction as you don't have to know or understand hash tables to use them, just the methods available to python dictionaries.

Knowledge check

Questions:

Correct:

Question text