Databases

Normalisation

Normalisation is a process we go through when designing a database in order to make it as efficient as possible. It ensures there is no data duplication. This is where the same piece of information is recorded in more than one place in the database. Reducing data duplication reduces the size of the database. The fact data is not duplicated means it will always be consistent. If we re-enter the same data more than once the chances of introducing a transcription error increase. This could lead to us having inconsistent data e.g. a misspelled surname.

Normalisation example

Let's have a look at normalisation using an example. We start with a flat-file database as you see below.



1st normal form

A table is in first normal form if it contains no repeating attributes or groups of attributes. To make this work we have to split our table in two and create a link table. We now have no repeating attributes and so are in the first normal form.

tblStudents(StudentID, Student, DOB)
tblCourses(CourseID, Course, TeacherID, Teacher name)
tblStudentStudies(StudentID, CourseID)



2nd normal form

A database is in second normal form if it is in 1st normal form and there are no partial key dependencies. Partial key dependencies can only happen in a table with a composite key like our link table. An attribute is a partial key dependency if it relies only on part of the composite key.
In our example we have no other fields in our link table which is the only one with a composite key and so we are already in 2nd normal form. If I were to add an enrolment date, it would go in tblStudentStudies because it relies on both the studentID and the courseID as they might enrol in different courses on different dates. However if I were to add a student email address to tblStudentStudies then it would not be in 2nd normal form because it relates only to the studentID and not at all to courseID. This means it should go in tblStudents.

3rd normal form

A database is in 3rd normal form if it is in 2nd normal form and has no non-key dependencies. A non-key dependency is an attribute that is reliant on another attribute in the table and not the primary key. In tblCourses the attribute Teacher Name relates to the TeacherID and not the CourseID. Therefore it needs to be removed to it's own table leaving us with:

tblStudents(StudentID, Student, DOB)
tblCourses(CourseID, Course, TeacherID)
tblTeachers(TeacherID, Teacher name)
tblStudentStudies(StudentID, CourseID)



Referential integrity

In a normalised database you will have a number of link tables. These represent one to many relationships. Enforcing referential integrity means applying rules to the database that prevent you from removing a record from the one side of the one to many relationship if it is used in an entry in the link table e.g. We could not remove a student while they have an enrolment in a course.

Knowledge check


Questions:
Correct:

Question text


© All materials created by and copyright S.Goff