After years of working with SQL and optimizing hundreds of poorly written queries, I decided to write this post to encourage use of what I found to be one of the SQL’s most underappreciated features – Common Table Expressions, often known as ‘WITH clause’.

What is a Common Table Expression

In short, CTEs are about taking parts of a SQL query, naming them, and referring to them later. Just like in this example.

WITH some_name AS (
SELECT column_1, max(column_2) as max_2
FROM table_1
GROUP BY column_1
)
SELECT t2.column_3, s.max_2
FROM table_2 t2
JOIN some_name s
ON t2.column_1 = s.column_1

If you need more details, have a look here

1. Readability

How many times have you seen queries like this?

SELECT * FROM (
  SELECT
    column_1,
    max(sum_1) as max_1
  FROM (
    SELECT
      column_1,
      column_2,
      sum(column_3) as sum_1
    FROM table
    GROUP BY column_1, column_2
   )
   GROUP BY column_1
) WHERE max_1 < 3

Obviously, somebody else wrote the query, you have to analyze/modify/fix it, and it is much longer and more complicated than the example above (a few extra joins, dozens of columns etc). So:

  1. For a complex query, it takes you some time before you even figure out where the logic starts
  2. You go crazy as you keep scrolling up and down the editor, because the column names are at the top, and a corresponding WHERE condition is at the bottom, 500 lines down.

Isn’t this version easier to read?

WITH q_1 AS (
SELECT
  column_1,
  column_2,
  sum(column_3) as sum_1
FROM table
GROUP BY column_1, column_2
),
q_2 AS (
SELECT
  column_1,
  max(sum_1) as max_1
FROM q_1
   GROUP BY column_1 
)
SELECT *
FROM q_2
WHERE max_1 < 3

2. Logical flow

In the rewritten query with CTEs the flow goes from the top to the bottom. Furthermore, instead of names like q_1 you can use more meaningful ones like t1_sum (in the first example, subqueries are not named at all). Together with proper commenting, your SQL code will be much better reflecting the business logic.

3. Less code

Let’s have a look the following example:

SELECT
  t1.col_1,
  t2.col_3,
  t3.col_3
FROM
t1
JOIN
(SELECT DISTINCT
 col_2,
 col_3
 FROM xyz
 WHERE col_2 = 2 and col_3 < 15) t2
ON t1.col_2 = t2.col_2
JOIN
(SELECT DISTINCT
 col_2,
 col_3
 FROM xyz
 WHERE col_2 = 2 and col_3 > 22) t3
ON t1.col_2 = t3.col_3

You probably have noticed, that parts of the query repeat. These can be extracted into CTEs and rewritten like this:

WITH dist AS (
SELECT DISTINCT
  col_2,
  col_3
FROM xyz
WHERE col_2 = 2
),
t2 AS (
  SELECT * FROM dist WHERE col_3 < 15
),
t3 AS (
  SELECT * FROM dist WHERE col_2 < 22
)
SELECT
  t1.col_1,
  t2.col_3,
  t3.col_3
FROM
t1
JOIN t2
ON t1.col_2 = t2.col_2
JOIN t3
ON t1.col_2 = t3.col_3

In this particular example, the savings in code length might not be so spectacular, but as you start rewriting your queries, you’ll discover that many repeating parts can be eliminated. The authors of the database you’re working with probably also made some observation about repeating, so please read the next section carefully,

4. Performance and hints

Contrary to what many inexperienced users believe, a SQL statement is not executed ‘as a whole’, but it a series of operations like reads, joins, sorts, unions etc, executed most often sequentially, sometimes in parallel to each other, sometimes in a paralleled way. It depends on the specific database engine how CTEs are processed, but proper use of them is a very powerful tool when tuning SQL queries. I will provide 2 examples how Oracle can take advantage of them:

  1. If a CTE is referred to in the further part of the query more than once, it may get materialized. This means it is saved to a temporary table, so that the preceding operations are not repeated
  2. Using hints, like MATERIALIZE, it is possible to force saving part of a query to a temporary table. This, in turn, influences all the joins downstream, as the temporary table has no indexes, and is hash-joined whenever the join conditions permit

On the other hand, there are engines like Redshift, which do not materialize CTEs. That should not discourage you from using CTEs to structure your code.

5. CTE recursion

Another feature that is database specific is CTE recursion, or otherwise the ability for the query to reference itself, to generate new rows basing on rows so far. The implementation differs between the databases, but if your worked with CONNECT BY queries in Oracle and now moved to MS SQL server, you’re likely to run into them. Google for more.

No more 10 level subqueries

I hope my arguments were convincing, and will help you write cleaner, better SQL queries. Not only for yourself, but also for those who will be debugging or changing them in the future.