SQL - Common Table Expression (CTE)

Hello there, future SQL wizards! Today, we're going to embark on an exciting journey into the world of Common Table Expressions, or CTEs for short. Don't worry if you're new to programming – I'll guide you through this step-by-step, just like I've done for countless students over my years of teaching. So, grab your favorite beverage, get comfortable, and let's dive in!

SQL - Common Table Expression

The SQL Common Table Expression

Imagine you're organizing a big party (because who doesn't love a good analogy?). You've got a list of tasks, and you want to break them down into smaller, manageable chunks. That's essentially what a Common Table Expression does in SQL – it helps us break down complex queries into simpler, more readable parts.

A CTE is like a temporary named result set that you can reference within a SELECT, INSERT, UPDATE, DELETE, or MERGE statement. It's defined within the execution scope of a single statement. Think of it as creating a temporary table that exists just for your query.

Let's look at a simple example:

WITH cte_example AS (
    SELECT 'Hello, CTE!' AS greeting
)
SELECT greeting FROM cte_example;

In this example:

  1. We start with the WITH keyword, which signals the beginning of our CTE.
  2. We give our CTE a name: cte_example.
  3. We define what our CTE will contain: in this case, a simple SELECT statement that creates a column called 'greeting' with the value 'Hello, CTE!'.
  4. After the CTE definition, we have our main query that uses the CTE.

When you run this, you'll see:

| greeting    |
|-------------|
| Hello, CTE! |

Isn't that neat? We've just created our first CTE!

The MySQL WITH Clause

Now, let's talk about the WITH clause in MySQL. It's the magic wand that brings our CTEs to life. The general syntax looks like this:

WITH cte_name [(column_list)] AS (query)
SELECT * FROM cte_name;

Here's a more practical example. Let's say we have a table of employees:

CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    department VARCHAR(50),
    salary DECIMAL(10, 2)
);

INSERT INTO employees VALUES
(1, 'Alice', 'HR', 50000),
(2, 'Bob', 'IT', 60000),
(3, 'Charlie', 'Finance', 55000),
(4, 'David', 'IT', 65000),
(5, 'Eve', 'HR', 52000);

Now, let's use a CTE to find the average salary by department:

WITH dept_avg_salary AS (
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
)
SELECT * FROM dept_avg_salary
ORDER BY avg_salary DESC;

This query will give us:

| department | avg_salary |
|------------|------------|
| IT         | 62500.00   |
| Finance    | 55000.00   |
| HR         | 51000.00   |

Here, we used a CTE to calculate the average salary for each department, and then we selected from this CTE to display the results. It's like we created a temporary table with the average salaries, which we then used in our main query.

CTE from Multiple Tables

CTEs aren't limited to just one table. We can use multiple tables in our CTE definitions, just like in regular queries. Let's add a departments table to our example:

CREATE TABLE departments (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    location VARCHAR(50)
);

INSERT INTO departments VALUES
(1, 'HR', 'New York'),
(2, 'IT', 'San Francisco'),
(3, 'Finance', 'Chicago');

Now, let's use a CTE to combine information from both tables:

WITH dept_info AS (
    SELECT e.department, 
           AVG(e.salary) AS avg_salary, 
           d.location
    FROM employees e
    JOIN departments d ON e.department = d.name
    GROUP BY e.department, d.location
)
SELECT * FROM dept_info
ORDER BY avg_salary DESC;

This will give us:

| department | avg_salary | location      |
|------------|------------|---------------|
| IT         | 62500.00   | San Francisco |
| Finance    | 55000.00   | Chicago       |
| HR         | 51000.00   | New York      |

In this example, our CTE joins the employees and departments tables, calculates the average salary, and includes the location information.

Recursive CTE

Now, here's where things get really interesting – recursive CTEs! These are like the Russian nesting dolls of the SQL world. A recursive CTE refers to itself, allowing you to work with hierarchical or tree-structured data.

Let's create a simple example of an employee hierarchy:

CREATE TABLE employee_hierarchy (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    manager_id INT
);

INSERT INTO employee_hierarchy VALUES
(1, 'Big Boss', NULL),
(2, 'Manager A', 1),
(3, 'Manager B', 1),
(4, 'Employee 1', 2),
(5, 'Employee 2', 2),
(6, 'Employee 3', 3);

Now, let's use a recursive CTE to display the entire hierarchy:

WITH RECURSIVE emp_hierarchy AS (
    SELECT id, name, manager_id, 0 AS level
    FROM employee_hierarchy
    WHERE manager_id IS NULL

    UNION ALL

    SELECT e.id, e.name, e.manager_id, eh.level + 1
    FROM employee_hierarchy e
    JOIN emp_hierarchy eh ON e.manager_id = eh.id
)
SELECT * FROM emp_hierarchy
ORDER BY level, id;

This query will produce:

| id | name       | manager_id | level |
|----|------------|------------|-------|
| 1  | Big Boss   | NULL       | 0     |
| 2  | Manager A  | 1          | 1     |
| 3  | Manager B  | 1          | 1     |
| 4  | Employee 1 | 2          | 2     |
| 5  | Employee 2 | 2          | 2     |
| 6  | Employee 3 | 3          | 2     |

This recursive CTE starts with the top-level employee (Big Boss) and then recursively finds all employees who report to each manager.

Advantages of CTE

CTEs come with several advantages:

Advantage Description
Readability CTEs make complex queries more readable by breaking them into named subqueries.
Reusability You can reference a CTE multiple times within a query.
Recursion CTEs allow you to write recursive queries, which are great for hierarchical data.
Simplification They can simplify complex joins and subqueries.
Maintenance CTEs make queries easier to maintain and modify.

Disadvantages of CTE

While CTEs are powerful, they do have some limitations:

Disadvantage Description
Performance In some cases, CTEs might not perform as well as derived tables or views.
Scope CTEs are only valid within the scope of the single statement they're defined in.
Complexity For very simple queries, using a CTE might add unnecessary complexity.
Database Support Not all database systems support CTEs, although most modern ones do.

And there you have it, folks! We've journeyed through the land of Common Table Expressions, from basic concepts to recursive queries. Remember, like learning any new skill, mastering CTEs takes practice. So don't be afraid to experiment with them in your own queries. Before you know it, you'll be writing CTEs like a pro, impressing your colleagues and making your database queries cleaner and more efficient. Happy querying!

Credits: Image by storyset