Normalization
Database Normalization is a systematic process of organizing the columns and tables of a relational database to minimize data redundancy and improve data integrity. It transforms an initial schema into a set of well-structured relations that adhere to specific normal forms.
This guide covers functional dependencies, the progression from 1NF through BCNF, decomposition algorithms, denormalization trade-offs, SQL walkthroughs, and a 10-question practice quiz.
Normalization: Decomposing to 3NF
ENROLLMENT_INSTRUCTOR
Step through to see the normalization process
1Introduction
Database normalization was first formally introduced by E.F. Codd in 1971, building upon his relational model. It is a technique for designing relational database schemas such that the resulting relations satisfy a set of properties concerning the types of functional dependencies, thereby reducing the potential for update anomalies.
The primary motivations for normalization are: minimizing data redundancy (storing the same information multiple times wastes space and complicates management), preventing update anomalies (insertion, deletion, and modification anomalies), improving data integrity (ensuring consistency and accuracy), and simplifying querying (well-normalized tables lead to more logical data organization).
While higher normal forms offer greater data integrity, they often come at the cost of increased join operations. Database designers frequently make pragmatic choices, sometimes opting for 3NF instead of BCNF or even denormalizing specific tables to optimize for read performance in data warehousing or high-transaction environments.
The Three Update Anomalies
Insertion Anomaly
Cannot add a new course without assigning a student to it.
Update Anomaly
Changing a professor's department in one row but not others creates inconsistency.
Deletion Anomaly
Deleting the last student in a course may also delete the course details.

2Key Definitions
Essential terms for understanding normalization at the university level. Notation: R denotes a relation schema, X, Y denote sets of attributes, and F represents a set of functional dependencies.
Functional Dependency (FD)
X → Y: value of X uniquely determines value of Y
Partial Dependency
Non-prime attribute depends on a proper subset of a candidate key
Transitive Dependency
X → Z via X → Y and Y → Z, where Y is not a superkey
Determinant
The left-hand side (X) of an FD X → Y
Candidate Key
Minimal superkey -- no proper subset is also a superkey
Prime Attribute
An attribute that is part of any candidate key
Attribute Closure (X+)
Set of all attributes functionally determined by X under F
Canonical Cover
Minimal equivalent set of FDs: no extraneous attributes, no redundant FDs
Decomposition
Replacing R with smaller relations R1, R2, ... whose union forms R
Lossless-Join
Natural join of decomposed relations exactly reconstructs the original
Dependency Preserving
All original FDs can be checked within individual decomposed relations
Superkey
Set of attributes K such that K → R (determines all attributes)
3First Normal Form (1NF)
A relation is in 1NF if and only if all attribute domains contain only atomic (indivisible) values and there are no repeating groups. This is the basic requirement for a table to be a “relation” in the strict relational model sense. Most modern databases naturally enforce 1NF by not supporting non-atomic column types.
1NF: Step-by-Step Decomposition
Source Table (Non-1NF)
| StudentID | Name | Courses |
|---|---|---|
| 101 | Alice | CS101, MA102, DB201 |
| 102 | Bob | CS101, PH101 |

4Second Normal Form (2NF)
A relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on every candidate key. In other words, there are no partial dependencies -- no non-prime attribute depends on only part of a composite key. 2NF violations only occur when a table has a composite candidate key.
2NF: Step-by-Step Decomposition
Source Table (PROJECT_ASSIGNMENTS)
| ProjectID | EmployeeID | ProjectName | Budget | EmpName | Role |
|---|---|---|---|---|---|
| 101 | 1 | Alpha | 50k | Alice | Developer |
| 101 | 2 | Alpha | 50k | Bob | Tester |
| 102 | 1 | Beta | 75k | Alice | Lead Dev |
5Third Normal Form (3NF)
A relation is in 3NF if it is in 2NF and for every non-trivial FD X → A, at least one of these holds: (1) X is a superkey, (2) A is a prime attribute, or (3) the FD is trivial. Put simply, there are no transitive dependencies of non-prime attributes on any candidate key.
3NF: Step-by-Step Decomposition
Source Table (STUDENT_DEPT)
| StudentID | Name | Major | DeptHead | DeptOffice |
|---|---|---|---|---|
| 1 | Alice | CS | Dr. Smith | ENGR 101 |
| 2 | Bob | CS | Dr. Smith | ENGR 101 |
| 3 | Charlie | EE | Dr. Jones | EE 205 |

6Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if for every non-trivial FD X → Y, X is a superkey. This is stricter than 3NF: while 3NF allows X → A where A is prime and X is not a superkey, BCNF does not permit this exception.
3NF Rule
For X → A: X is a superkey OR A is a prime attribute.
BCNF Rule
For X → Y: X must be a superkey. No exceptions.
BCNF: Step-by-Step Decomposition
Source Table (ENROLLMENT)
| StudentID | Course | Instructor |
|---|---|---|
| 101 | DB101 | Prof. Alice |
| 101 | CS201 | Prof. Bob |
| 102 | DB101 | Prof. Alice |
| 103 | MA101 | Prof. Carol |
BCNF decomposition always guarantees a lossless join but may sacrifice dependency preservation. Some FDs can only be checked by joining tables back together. This is why 3NF is often the practical target -- the 3NF synthesis algorithm guarantees both lossless join and dependency preservation.

7Denormalization
Denormalization is the deliberate introduction of redundancy into a database schema to improve read performance. It is not the absence of normalization -- it is a conscious trade-off made after understanding the normalized design.
When to Denormalize
Good Candidates
- Read-heavy OLAP / data warehouse workloads
- Frequently joined tables that rarely change
- Dashboard queries requiring sub-second response
- Computed aggregates (pre-calculated totals)
Poor Candidates
- Write-heavy OLTP systems
- Tables where data integrity is paramount
- Rapidly changing reference data
- Systems without robust update pipelines
Read vs. Write Trade-offs
| Aspect | Normalized | Denormalized |
|---|---|---|
| Read Speed | Slower (joins needed) | Faster (fewer joins) |
| Write Speed | Faster (single update) | Slower (multiple updates) |
| Data Integrity | Higher (no redundancy) | Lower (anomaly risk) |
| Storage | Less | More |
| Complexity | Simpler writes | Simpler reads |
8SQL Walkthroughs
Normalization to 2NF: Project-Employee Assignment
-- Unnormalized Table (1NF)
CREATE TABLE PROJECT_ASSIGNMENTS (
ProjectID INT,
ProjectName VARCHAR(100),
ProjectBudget DECIMAL(10, 2),
EmployeeID INT,
EmployeeName VARCHAR(100),
EmployeeRole VARCHAR(100),
PRIMARY KEY (ProjectID, EmployeeID)
);
-- Normalized to 2NF: extract partial dependencies
CREATE TABLE PROJECTS (
ProjectID INT PRIMARY KEY,
ProjectName VARCHAR(100),
ProjectBudget DECIMAL(10, 2)
);
CREATE TABLE EMPLOYEES (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(100)
);
CREATE TABLE PROJECT_EMPLOYEE_ROLES (
ProjectID INT,
EmployeeID INT,
EmployeeRole VARCHAR(100),
PRIMARY KEY (ProjectID, EmployeeID),
FOREIGN KEY (ProjectID) REFERENCES PROJECTS(ProjectID),
FOREIGN KEY (EmployeeID) REFERENCES EMPLOYEES(EmployeeID)
);Lines 2-10: Original table with composite key (ProjectID, EmployeeID). ProjectName and Budget depend only on ProjectID (partial dependency).
Lines 13-17: PROJECTS table captures the partial dependency ProjectID → ProjectName, ProjectBudget.
Lines 25-32: PROJECT_EMPLOYEE_ROLES retains the full dependency on the composite key.
Key insight: 2NF eliminates partial dependencies on composite candidate keys by extracting dependent attributes into new tables.
Normalization to 3NF: Student-Major-Department
-- Table in 2NF (not 3NF due to transitive dependency)
CREATE TABLE STUDENT_MAJOR_DEPT (
StudentID INT PRIMARY KEY,
StudentName VARCHAR(100),
Major VARCHAR(100),
DeptHead VARCHAR(100), -- Transitively dependent
DeptOffice VARCHAR(50) -- Transitively dependent
);
-- Normalized to 3NF: extract transitive dependency
CREATE TABLE DEPARTMENTS (
Major VARCHAR(100) PRIMARY KEY,
DeptHead VARCHAR(100),
DeptOffice VARCHAR(50)
);
CREATE TABLE STUDENTS (
StudentID INT PRIMARY KEY,
StudentName VARCHAR(100),
Major VARCHAR(100),
FOREIGN KEY (Major) REFERENCES DEPARTMENTS(Major)
);
-- Reconstruct original data with JOIN
SELECT S.StudentID, S.StudentName, S.Major,
D.DeptHead, D.DeptOffice
FROM STUDENTS S
JOIN DEPARTMENTS D ON S.Major = D.Major;Lines 6-7: DeptHead and DeptOffice depend on Major, not directly on StudentID -- a transitive dependency.
Lines 11-15: DEPARTMENTS captures Major → DeptHead, DeptOffice. Major becomes the primary key.
Lines 25-29: A JOIN reconstructs the full original data from the normalized tables.
Key insight: 3NF eliminates transitive dependencies by extracting transitively dependent attributes and their determinant into a new table.
Normalization to BCNF: Enrollment-Instructor (3NF but not BCNF)
-- 3NF but NOT BCNF: Instructor -> Course violates BCNF
CREATE TABLE ENROLLMENT_INSTRUCTOR (
StudentID INT,
Course VARCHAR(100),
Instructor VARCHAR(100),
PRIMARY KEY (StudentID, Course)
-- Candidate keys: (StudentID, Course), (StudentID, Instructor)
-- FD: Instructor -> Course (Instructor is NOT a superkey)
);
-- BCNF decomposition on Instructor -> Course
CREATE TABLE INSTRUCTOR_COURSE (
Instructor VARCHAR(100) PRIMARY KEY,
Course VARCHAR(100)
);
CREATE TABLE STUDENT_INSTRUCTOR (
StudentID INT,
Instructor VARCHAR(100),
PRIMARY KEY (StudentID, Instructor),
FOREIGN KEY (Instructor)
REFERENCES INSTRUCTOR_COURSE(Instructor)
);
-- Note: (StudentID, Course) -> Instructor is NO LONGER
-- directly enforceable (dependency preservation lost)Lines 2-9: All attributes are prime (part of some candidate key), so the table is vacuously 3NF. But Instructor → Course violates BCNF.
Lines 12-15: INSTRUCTOR_COURSE captures the violating FD. Instructor is now a superkey in this table.
Lines 25-26: The original FD (StudentID, Course) → Instructor cannot be checked without a join -- dependency preservation is lost.
Key insight: BCNF eliminates all redundancy from non-superkey determinants, but may sacrifice dependency preservation.
9Memory Aids
“The key, the whole key, and nothing but the key -- so help me Codd.”
1NF = “the key” (unique identifier). 2NF = “the whole key” (no partial deps). 3NF = “nothing but the key” (no transitive deps).
“1NF: atomic values. 2NF: no partial. 3NF: no transitive. BCNF: every determinant is a key.”
“Never on the right, always in the key.”
Attributes that never appear on the RHS of any FD must be part of every candidate key.
“3NF forgives prime attributes. BCNF forgives nothing.”
“The common columns must be a key for at least one side.”
For binary decomposition R into R1 and R2: lossless if R1 ∩ R2 → R1 or R1 ∩ R2 → R2.
“I-U-D: Insert, Update, Delete -- the three anomalies redundancy breeds.”
10Common Mistakes
Thinking any non-key dependency violates 2NF
2NF specifically targets partial dependencies on composite keys. If you have a single-attribute primary key, 2NF is automatically satisfied. Transitive dependencies (non-key → non-key) are a 3NF issue.
Only checking against the primary key for normalization
A relation can have multiple candidate keys. Partial and transitive dependencies must be checked against every candidate key, not just the one chosen as primary.
Blindly decomposing to BCNF without considering dependency preservation
BCNF may lose dependency preservation, meaning some constraints can only be enforced via expensive joins. In practice, 3NF with the synthesis algorithm guarantees both lossless join and dependency preservation.
Decomposing without verifying the lossless-join property
A lossy decomposition generates spurious tuples when tables are joined back. Always verify: the common attributes of the two decomposed tables must form a superkey for at least one of them.
Incorrectly classifying attributes when checking 3NF vs BCNF
A prime attribute is part of any candidate key. The distinction between prime and non-prime is critical for understanding why a relation can be in 3NF but not BCNF. Misclassifying attributes leads to incorrect normalization conclusions.
Splitting every table to 5NF without considering query performance
Excessive normalization creates many small tables requiring numerous joins. For read-heavy workloads, this kills performance. Normalize to 3NF/BCNF for correctness, then selectively denormalize based on actual query patterns.
Computing attribute closure without applying transitivity fully
The attribute closure algorithm must iterate until no new attributes are added. A common exam mistake is stopping after one pass when more attributes could be derived through transitivity in subsequent passes.
Frequently Asked Questions
- When should I stop normalizing a database?
- In practice, most production databases are normalized to Third Normal Form (3NF) or Boyce-Codd Normal Form (BCNF). Going beyond BCNF (to 4NF or 5NF) is rarely necessary unless you have specific multi-valued or join dependencies. The right level depends on balancing data integrity requirements against query performance. If heavy read workloads dominate, you may stop at 3NF and selectively denormalize.
- What is the difference between BCNF and 3NF?
- Both eliminate transitive dependencies, but BCNF is stricter. 3NF allows a non-trivial FD X -> A where A is a prime attribute and X is not a superkey. BCNF requires that for every non-trivial FD X -> Y, X must be a superkey -- no exceptions. A relation can be in 3NF but not BCNF when there are overlapping candidate keys with a determinant that is not a superkey but determines a prime attribute.
- What are the trade-offs of denormalization?
- Denormalization improves read performance by reducing joins, which benefits reporting, dashboards, and read-heavy OLAP workloads. However, it reintroduces data redundancy, increases storage costs, and makes write operations more complex -- you must update the same fact in multiple places. The risk of update anomalies returns. Denormalization is a deliberate engineering trade-off, not a shortcut.
- Can BCNF decomposition lose dependency preservation?
- Yes. While BCNF decomposition always guarantees a lossless join (no spurious tuples), it does not always guarantee dependency preservation. Some functional dependencies may only be enforceable by joining decomposed tables back together, which is costly. This is a key reason why 3NF is sometimes preferred -- the 3NF synthesis algorithm guarantees both lossless join and dependency preservation.
- What is a functional dependency and why does it matter?
- A functional dependency X -> Y means that for any two tuples with the same value of X, they must have the same value of Y. FDs are the foundation of normalization -- they define which attributes determine others. By analyzing FDs, we identify candidate keys, detect partial and transitive dependencies, and determine what normal form a relation is in. Normalization is essentially the process of decomposing relations to eliminate problematic FDs.
- How do I find candidate keys from a set of functional dependencies?
- Use the attribute closure algorithm. For a set of attributes X, compute X+ (all attributes determined by X under the given FDs). If X+ equals all attributes of the relation, X is a superkey. A candidate key is a minimal superkey -- no proper subset of it is also a superkey. Start with attributes that never appear on the right-hand side of any FD (they must be in every key), then try adding other attributes until you find all minimal superkeys.
Practice Quiz
Test your understanding of database normalization — select the correct answer for each question.
1.Which of the following describes an update anomaly in a database?
2.In the context of relational databases, what does the functional dependency X -> Y signify?
3.What is the primary purpose of the attribute closure algorithm (X+) in database normalization?
4.A relation is in First Normal Form (1NF) if and only if:
5.A partial dependency exists when a non-key attribute is functionally dependent on:
6.A relation is in Third Normal Form (3NF) if it is in 2NF and:
7.What is the primary condition for a relation to be in Boyce-Codd Normal Form (BCNF)?
8.Under what circumstances might a designer choose to keep a relation in 3NF rather than decompose it further into BCNF?
9.What does a "lossless-join decomposition" ensure in database normalization?
10.Why is "dependency preservation" an important property for a decomposition in database design?
Study Tips
- Practice closure computation: Given a set of FDs, compute X+ by hand for different attribute sets. This is the foundation for finding keys and checking normal forms.
- Draw FD diagrams: Visualize functional dependencies as arrows between attributes to spot partial and transitive dependencies quickly.
- Work through decomposition: Take an unnormalized table and manually decompose it step by step to 3NF and BCNF, verifying lossless join at each step.
- Compare 3NF vs BCNF on tricky examples: Find relations that are 3NF but not BCNF (overlapping candidate keys) to solidify the distinction.