
"many times I as a human just knew better than the DB about how many rows would be accessed/why/how/when/etc..."

Would you say the primary problem that you have with the planner is a misestimate of the number of rows input/output from a subplan? Or are you encountering other problems, too?



(Not the OP, but...) I've had three cases in the last year where a postgres instance with fewer than a few million rows per table decided to join with fancy hash algorithms that resulted in tens of seconds per query, instead of the ~5ms it would take using nested loops (i.e. literally start with the table in the FROM clause, apply some of the WHERE clause, join to the next table, apply more of the WHERE clause, join to the next table, and so on).

I do believe the planner was coming up with vast misestimates in some of those cases. Two of the three were cases where the fully joined result would have been massive, but we were displaying it in a paged interface and only wanted 100 rows at a time.
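The query shape was roughly the following (table and column names invented for illustration): with the LIMIT, a nested-loop plan driven by the order-by index can stop after 100 rows, whereas a hash join has to build the whole result first. Comparing estimated vs. actual rows with EXPLAIN (ANALYZE, BUFFERS) is usually what exposed the misestimate.

    -- hypothetical paged query; only 100 rows are ever needed
    SELECT o.id, o.created_at, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.status = 'open'
    ORDER BY o.created_at DESC
    LIMIT 100;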

One was a case where I was running a “value IN (select ...)” subquery where the subquery was very fast and returned a very small number of rows, but postgres decided to be clever and merge that subquery into the parent. I fixed that one by running two separate queries, plugging the result of the first into the second.
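The pattern looked roughly like this (names hypothetical); the fix was literally just splitting it in two on the application side:

    -- original: postgres flattened the IN (...) subquery into the outer plan
    SELECT *
    FROM events
    WHERE device_id IN (SELECT id FROM devices WHERE site = 'warehouse-3');

    -- workaround: run the small, fast subquery first...
    SELECT id FROM devices WHERE site = 'warehouse-3';
    -- ...then plug its handful of ids into the outer query
    SELECT * FROM events WHERE device_id IN (17, 42);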

For one of the others, we actually had to restructure the table and use a different primary key, one that matched the auto-increment id column of its peer instead of the symbolic identifier (which was equally well indexed). In that case we were basically just throwing stuff at the wall to see what sticks.

I have no idea what we’d do if one of these problems just showed up suddenly in production, which is kind of scary.

I’m sure the postgres optimizer is doing nice things for us in places of the system that we don’t even realize, but I’m sorely tempted to just find some way to disable it entirely and live with whatever performance we get from nested loops. Our data is already structured in a way that matches our access patterns.
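As far as I know you can't actually switch the planner off, but the standard enable_* settings let you discourage specific strategies for a single transaction, which is about the closest approximation (a rough sketch; the query itself is whatever was picking a bad plan):

    BEGIN;
    -- discourage (not forbid) hash and merge joins for this transaction only
    SET LOCAL enable_hashjoin = off;
    SET LOCAL enable_mergejoin = off;
    -- run the misbehaving query here
    COMMIT;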

The most frustrating part of it all is how much time we can waste fighting the query planner when the solution is so obvious that even sqlite could handle it faster.

For context, I’ve only been using postgres professionally for about a year, having come from mysql, sql server, and sqlite, and I’m certainly still on the learning curve of figuring out how the planner works and how to live with it. Meanwhile, postgres’s feature set is so much better than mysql’s or sql server’s that I’d never consider going back.


The feature set from an application perspective is killer. I love window functions especially; all sorts of clever things can be done in a single query which would otherwise require painful self-joins or multiple iterated queries and application-side joins in less sophisticated dialects.
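A typical example (toy schema, invented names): picking the latest row per group is a one-liner with row_number(), where the old way needed a self-join or a correlated subquery.

    -- latest reading per sensor, without a self-join
    SELECT *
    FROM (
        SELECT r.*,
               row_number() OVER (PARTITION BY sensor_id
                                  ORDER BY taken_at DESC) AS rn
        FROM readings r
    ) t
    WHERE rn = 1;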


My favorite killer feature is jsonb_agg / jsonb_object_agg, which let me pull trees of data in a single query without the exponential waste you’d get from cartesian products, and even deliver it to the frontend without needing to assemble the JSON myself.
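Something along these lines (schema invented for illustration): one row per author, with that author's posts nested as a JSON array, ready to hand straight to the frontend.

    SELECT a.id,
           a.name,
           jsonb_agg(jsonb_build_object('id', p.id, 'title', p.title)
                     ORDER BY p.published_at) AS posts
    FROM authors a
    JOIN posts p ON p.author_id = a.id
    GROUP BY a.id, a.name;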


The biggest problem I see is the planner not knowing the cardinality of columns in a big table after a join or predicate has been applied. You see this especially with aggregate queries rather than point queries.

That is, it decides that a sequential scan would be just peachy even though there's an inner join in the mix which, in practice, would sharply reduce the set of qualifying rows if it just constructed the join that way. The quickest route out of this is disabling sequential scans, but there's no hint to do that on a per-query basis. The longer route is hiding bits of the query in CTEs so the optimizer can't rewrite too much (CTEs which need MATERIALIZED nowadays, since PG got smarter about inlining them).
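Concretely, the two routes look something like this (query shapes invented); SET LOCAL scopes the setting to the current transaction, and AS MATERIALIZED stops PG 12+ from inlining the CTE:

    -- route 1: per-transaction, since there's no per-query hint
    BEGIN;
    SET LOCAL enable_seqscan = off;
    -- run the aggregate query here
    COMMIT;

    -- route 2: fence off part of the query so the planner can't rewrite through it
    WITH recent AS MATERIALIZED (
        SELECT * FROM measurements
        WHERE taken_at > now() - interval '1 day'
    )
    SELECT sensor_id, avg(value)
    FROM recent
    GROUP BY sensor_id;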

High total cardinality but low dependent cardinality - dependent on data in other tables, or with predicates applied to other tables - seems hard to capture without dynamic monitoring of query patterns and data access. I don't think PG does that; if it did, I think they'd sell it hard. It comes up with application-level constraints which relate to the data distribution across multiple tables.
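Within a single table, extended statistics can at least teach the planner about correlated columns; they don't cross table boundaries, though, which is exactly the gap described above (example names invented):

    -- single-table only: declare that city and zip are functionally dependent
    CREATE STATISTICS addr_city_zip (dependencies) ON city, zip FROM addresses;
    ANALYZE addresses;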



