Learn Denormalization and Performance Trade-offs | Fundamentals of Relational Database Design

Database Design Patterns

Swipe to show menu

Denormalization is the process of intentionally introducing redundancy into a database by combining data from multiple related tables into a single table. This approach is often used to improve query performance in scenarios where data is frequently accessed together, and where the overhead of joining multiple tables becomes a bottleneck. While normalization aims to eliminate redundancy and maintain data integrity, denormalization accepts some duplication in exchange for faster data retrieval. You might consider denormalization in situations such as reporting, analytics, or when you need to optimize for read-heavy workloads where complex joins are costly.


              1234567891011121314151617181920212223242526272829303132
            
-- Denormalized table structure combining student and course information
CREATE TABLE student_course_info (
    enrollment_id INT PRIMARY KEY,
    student_id INT,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    course_id INT,
    course_name VARCHAR(100),
    course_code VARCHAR(20),
    credits INT,
    enrollment_date DATE,
    grade CHAR(2)
);

-- Example of inserting combined data from normalized tables
INSERT INTO student_course_info
SELECT
    e.enrollment_id,
    s.student_id,
    s.first_name,
    s.last_name,
    s.email,
    c.course_id,
    c.course_name,
    c.course_code,
    c.credits,
    e.enrollment_date,
    e.grade
FROM enrollments e
JOIN students s ON e.student_id = s.student_id
JOIN courses c ON e.course_id = c.course_id;

By storing related data together in a denormalized structure, you can reduce the need for joins when retrieving information. This can lead to significant performance benefits, especially for queries that require data from multiple tables. However, denormalization comes with trade-offs. The main drawback is data redundancy: the same information, such as a student's name or a course's details, may be repeated in multiple rows. This increases storage requirements and can make updates more complex, as changes to a student's or course's information must be applied in every row where that data appears. There is also a higher risk of data anomalies, where inconsistent data can occur if updates are not carefully managed. Deciding whether to denormalize involves weighing these risks against the potential performance improvements.


              1234567891011
            
-- Querying a student's course enrollments in a normalized schema
SELECT s.first_name, s.last_name, c.course_name, e.grade
FROM students s
JOIN enrollments e ON s.student_id = e.student_id
JOIN courses c ON e.course_id = c.course_id
WHERE s.student_id = 1;

-- Querying the same information from a denormalized table
SELECT first_name, last_name, course_name, grade
FROM student_course_info
WHERE student_id = 1;

Everything was clear?

Thanks for your feedback!

Section 1. Chapter 6

Ask AI

Ask anything or try one of the suggested questions to begin our chat

Section 1. Chapter 6

Denormalization and Performance Trade-offs

1. What is a potential benefit of denormalization?

2. What is a risk associated with denormalized tables?

3. When might denormalization be preferred over normalization?