SQL provides a construct called a CTE (Common Table Expression). With CTEs you can write a query and give it an alias which can be joined. I like to think of CTEs like inline-views.
CTEs can eliminate subqueries from a single SELECT
turning them into simpler joins. Should you rewrite subqueries into CTEs? Will it impact performance? Let's find out.
The examples will come from the classic Sakila database. In this case I am using PostgreSQL so it is technically the Pagila database. Use either of those to follow along.
Requirements
The rental store owner wants a view in a dashboard showing:
- All customers who've rented a movie shorter than an hour
- The customer's lifetime value (sum of all their payments)
Subquery Version
SELECT
c.customer_id
, ps.amount
FROM
customer c
JOIN (
SELECT
customer_id
, SUM(amount) AS amount
FROM
payment
GROUP BY customer_id
) ps ON c.customer_id = ps.customer_id
WHERE
EXISTS (
SELECT 1
FROM
rental r
JOIN inventory i ON r.inventory_id = i.inventory_id
JOIN film f ON i.film_id = f.film_id
WHERE
r.customer_id = c.customer_id
AND f.length < 60
)
CTE Version
WITH payment_sum AS (
SELECT
customer_id
, SUM(amount) AS amount
FROM
payment
GROUP BY customer_id
)
, movie_rentals_under_an_hour AS (
SELECT
r.customer_id
FROM
rental r
JOIN inventory i ON r.inventory_id = i.inventory_id
JOIN film f ON i.film_id = f.film_id
WHERE
f.length < 60
GROUP BY r.customer_id
)
SELECT
c.customer_id
, ps.amount
FROM
customer c
JOIN movie_rentals_under_an_hour r ON c.customer_id = r.customer_id
JOIN payment_sum ps ON c.customer_id = ps.customer_id
Both of these queries produce the correct results. Note we are just returning the minimum data for brevity. No firstname, lastname or other display data.
Styles
Stylistically I prefer the CTE version for a few reasons.
Explicit Naming of Concepts
In the CTE version, each requirement is broken out and given a name. The requirements are not strewn out, unlabeled in the whole query.
The subquery version could use a comment to label the JOIN and the subquery that apply to each requirement. But I find that comments are more likely go to stale than a CTE would be.
Rentals under one hour is a conceptual section and it can be given a name with the CTE. Naming things is hard but not naming things is worse.
Hints For Re-use
It's a sad state when duplicate business logic is littered across stored procedures and queries. It's a huge liability for bugs.
Less than hour long rentals could be a domain concept that applies to other features. Maybe a view is warrented to avoid logic duplication, maybe not. It's a judgement call and I would not advocate for premature abstraction. However, the existence of the CTE with a defined responsibility can help show when it's time for re-use.
Readability
We've got two simple requirements in our example. It's reasonable to understand what's going on by taking in the whole thing. But what if there was more to it? What if there were 4 subqueries and 3 complex joins? It can become difficult to grok.
The CTE version starts to shine the more complicated things get. The complexity of each statement and indentation can be capped.
Performance
You can't speculate on performance. You have to measure.
I used PostgreSQL's Explain
to find the cost of these queries. I've captured the two plans and put them in a graphical viewer. (Click or tap to see the images in full screen and zoom.)
Subqueries Plan
CTEs Plan
These plans are nearly identical. You should be able write a subquery as a CTE and get pretty much the same plan, with small differences.
But, as always with performance, you have to measure. That is the only way to know for sure if it's the same, better, or worse. When you measure, look at the cost not the duration.
I tend to spot performance improvements more easily in CTEs, such as adding a missing SARG-able parameter. I attribute this to CTEs being broken down into smaller pieces.
What cannot be concluded is CTEs will perform better than subqueries or vice versa.
Is It Just Preference?
The benefits of CTEs are in improving readability and maintainability. Is readability in the eye of the beholder? It's somewhere between objective and subjective. You could certainly write unreadable code with CTEs or with subqueries.
My Conclusion
When I write with CTEs, I find I'm forced to be more considerate, to give names and intent to things. Names and intent improve prose. That definitely has value.
I'm going to use and advocate for CTEs over multiple, large subqueries.
Resources
Use PostgreSQL Explain To Measure
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
SELECT * FROM -- ...