Or their own tables or views.

aidos · on July 25, 2023

Adding tables doesn’t sound like a solution at all and views come with operational overheads that you need to be aware of.

CTEs should be the first thing you reach for when trying to clarify the intent of the way you’re chopping up your data.

mulmen · on July 25, 2023

Depends on the situation. Creating tables is a space/CPU tradeoff. Views can be managed the same way as the rest of your data pipeline.

CREATE TEMPORARY VIEW and CREATE TEMPORARY TABLE are both valid SQL.

> CTEs should be the first thing you reach for when trying to clarify the intent of the way you’re chopping up your data.

Sure but they aren't a silver bullet. Hence the "or" in my comment.

I use Redshift a lot and temporary tables are useful there for defining the sort and distribution keys when those are not favorable to the current query. Think joining several demographic tables into a single customer dimension, distributing it on the customer key then filtering the product_id distributed sales and re-distributing it on customer before joining. You can't do that with a CTE.

Views are handy when you have ugly CASE statements or other derived logic and need a convenience column that does something like string concatenation. I have a whole repository of them I deploy on top of the default system views for simplifying admin tasks.

If you are struggling to keep your SQL clean it's because you are doing too much at once. Take a step back and re-evaluate your data model. If your physical model is bad then no number of CTEs will save you.

aidos · on July 25, 2023

CTEs are basically temporary views and tables.

I’d argue that the extra tables with duplicate data aren’t a space / cpu trade-off as much as they are a data management trade-off. There are cases for duplicating data in the system but I’d leave that to indexes and generated columns unless you really need to materialise it in a different structure.

mulmen · on July 25, 2023

> CTEs are basically temporary views and tables.

Not in any system I have ever used. CTEs aren't materialized so they're not tables.

Tables (or materialized views if you prefer) are literally space/CPU tradeoffs. I have a data pipeline that combines website logs into a table before I join it with product data. If you tried to do both in one query then the DB falls over.

You seem to be coming at this from a OLTP perspective but in OLAP data "duplication" is the name of the game. But what naively looks like wasteful duplication is actually creating coherent analytical models from disparate data sources. This leads to much simpler analytical queries.

aidos · on July 25, 2023

I agree with your further example cases (they were added after my comment).

I was thinking more of the original post where someone was complaining about readability when you have joins and subqueries - which is the usecase for CTEs.

mulmen · on July 25, 2023

Yeah sorry I edit a lot. I'll bump up my delay.

Tables, views, and CTEs are all tools that can be used to make SQL more readable. They are all valid alternatives to subqueries in JOINs which is the only thing I would say you should "never" do.

I tend to use all of them. I create tables with an optimized (typically star) schema for my purposes. My date dimensions are almost always views on top of a list of dates. I also use views to create a ~"feature store" that pre-joins the underlying star schema. I can then write "simple" analytical queries that utilize CTEs. Those CTEs tend to not have joins and they tend to only apply relevant filters so the final select is clear and concise.

aidos · on July 26, 2023

> CTEs aren't materialized so they're not tables.

You might well know this, but one thing thats worth pointing out is that in Postgres you _can_ materialise the CTEs (though it's more to trick the planner when your use case is more about making sure everything to exploded and indexed in advance).

    with x as materialized (select ..) select ...

slt2021 · on July 26, 2023

neat trick!

sometimes you absolutely need to materialize your CTE, especially if it is one big computation and you refer it 4-5 times downstream.

mulmen · on July 26, 2023

According to the docs [1] this is not necessary:

> All queries in the WITH list are computed. These effectively serve as temporary tables that can be referenced in the FROM list. A WITH query that is referenced more than once in FROM is computed only once, unless specified otherwise with NOT MATERIALIZED.

[1]: https://www.postgresql.org/docs/15/sql-select.html