DBMS - Database Normalization

Hello, aspiring database enthusiasts! Today, we're going to embark on an exciting journey through the world of Database Normalization. As your friendly neighborhood computer science teacher, I'm thrilled to guide you through this essential concept in database design. Don't worry if you're new to programming – we'll start from the basics and work our way up!

DBMS - Database Normalization

Introduction to Database Normalization

Imagine you're organizing your closet. You wouldn't just throw all your clothes into one big pile, would you? Of course not! You'd organize them by type, color, or season. Database normalization is quite similar – it's all about organizing data efficiently and reducing redundancy.

Functional Dependency

Let's start with a fundamental concept: Functional Dependency.

What is Functional Dependency?

Functional Dependency (FD) is a relationship between two attributes in a relation, where one attribute determines the other.

Imagine you have a student database:

Student_ID Name Age Course
101 Alice 20 Math
102 Bob 21 Physics
103 Charlie 19 Chemistry

Here, Student_ID functionally determines Name, Age, and Course. We write this as:

Student_ID → Name, Age, Course

This means if we know the Student_ID, we can determine the Name, Age, and Course of the student.

Armstrong's Axioms

Now, let's talk about Armstrong's Axioms. These are the fundamental rules for functional dependencies. Think of them as the "laws of physics" for database relationships!

  1. Reflexivity: If Y is a subset of X, then X → Y
  2. Augmentation: If X → Y, then XZ → YZ
  3. Transitivity: If X → Y and Y → Z, then X → Z

These might sound complicated, but they're just formal ways of describing logical relationships. We'll see how they apply as we go along!

Trivial Functional Dependency

A trivial functional dependency is when a right-hand side attribute is a subset of the left-hand side. For example:

Student_ID, Name → Student_ID

This is trivial because knowing Student_ID and Name will always tell you Student_ID (it's already there!).

Normalization

Now, let's dive into the heart of our lesson: Normalization. Normalization is the process of organizing data to minimize redundancy and dependency. It's like decluttering your database!

First Normal Form (1NF)

First Normal Form is the basic level of normalization. To achieve 1NF:

  1. Eliminate repeating groups
  2. Create a separate table for each set of related data
  3. Identify each set of related data with a primary key

Let's look at an example:

Before 1NF:

Order_ID Product Quantity
1 Apple, Banana 2, 3
2 Orange, Grape 1, 4

After 1NF:

Order_ID Product Quantity
1 Apple 2
1 Banana 3
2 Orange 1
2 Grape 4

See how we've eliminated the repeating groups? Much cleaner!

Second Normal Form (2NF)

To achieve 2NF, we need to:

  1. Be in 1NF
  2. Remove partial dependencies

Let's look at an example:

Student_ID Course_ID Course_Name Instructor
101 C1 Math Prof. Smith
102 C2 Physics Prof. Johnson

Here, Course_Name and Instructor depend on Course_ID, not on Student_ID. We should split this into two tables:

Table 1: Student_Course

Student_ID Course_ID
101 C1
102 C2

Table 2: Course

Course_ID Course_Name Instructor
C1 Math Prof. Smith
C2 Physics Prof. Johnson

Third Normal Form (3NF)

To achieve 3NF, we need to:

  1. Be in 2NF
  2. Remove transitive dependencies

Let's look at an example:

Employee_ID Department Department_Head
101 Sales John
102 Marketing Sarah

Here, Department_Head depends on Department, which depends on Employee_ID. We should split this:

Table 1: Employee

Employee_ID Department
101 Sales
102 Marketing

Table 2: Department

Department Department_Head
Sales John
Marketing Sarah

Boyce-Codd Normal Form (BCNF)

BCNF is a slightly stronger version of 3NF. To be in BCNF:

  1. Be in 3NF
  2. For every dependency X → Y, X should be a super key

Here's an example:

Student Subject Professor
Alice Math Prof. Smith
Bob Physics Prof. Johnson

In this case, both (Student, Subject) → Professor and Professor → Subject. To achieve BCNF, we split it:

Table 1: Student_Professor

Student Professor
Alice Prof. Smith
Bob Prof. Johnson

Table 2: Professor_Subject

Professor Subject
Prof. Smith Math
Prof. Johnson Physics

And there you have it! We've covered the basics of database normalization. Remember, while normalization is important, sometimes denormalization can be useful for performance reasons. It's all about finding the right balance for your specific needs.

I hope this guide has been helpful. Remember, practice makes perfect! Try normalizing some sample databases on your own. If you have any questions, don't hesitate to ask. Happy normalizing!

Credits: Image by storyset