I appreciate the work going on to make high-quality poolers for Postgres... I just really wish the work were done in-core so we could have built-in connection pooling, or that grabbing a new connection were made as cheap as going through a pooler. It sucks to have to add complexity to your stack to fix a "deficiency" in Postgres' design.
Still, I am glad there is effort being put into the project, even if I selfishly wish it were done in-core.
Pooling at all is still pretty much a hack. Ideally, resource management (and tuning) wouldn't revolve around the connection, and actual connections would become as thin as possible.
There has been some work towards that: connection handling did get cheaper in Postgres 14.
Note that you very well might still want a separately run pooler for some workloads - having local poolers on application servers can be good for latency.
Yeah: the latency of establishing a new connection fundamentally requires at least one network round trip, and likely a few (which is why I tend to have my servers go through two layers of pgbouncer, one on each side of the network). And using in-process pools couples your pool state to your front-end code updates. So, since you are going to want to build this connection pooling mechanism anyway, and are then likely going to want to take advantage of application-specific capability limitations (such as not supporting connection state outside of a transaction) to further improve performance, it doesn't really make much sense to prioritize this inside the database server itself.
> it doesn't really make much sense to prioritize this inside the database server itself.
I wouldn't go that far. There are a lot of things that are hard to do within an external pooler unless the pooler uses a 1:1 connection model. That leads most setups to use 1:1 connection pooling methods, which in turn puts pressure on Postgres to handle large numbers of connections gracefully.
Would an app-side ORM that has built-in connection pooling + PG14 (w/ connection scalability improvements) generally benefit from a connection pooler like Odyssey/PgBouncer, either server-side or app-side, when the number of app servers is low (<5-10)?
As a data point, FWIW: I have operated a PG11 cluster (on RDS) with client-side pooling only, 300-450 app servers, and max_connections=2000-3000 (can't remember exactly), and didn't observe significant backend resource contention or performance degradation; just a memory cost linear in the number of connections.
Unless the app servers are particularly large, or the app-side ORM connection pool doesn't work well (e.g. because the number of concurrent requests changes sufficiently over time that thread/process count constantly changes), I'd guess not. But without knowing a bit more it's hard to say.
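To make the question concrete, here is a toy sketch in Python of what an app-side connection pool is doing under the hood. The class and names are made up for illustration; the `connect` argument stands in for a real driver call such as `psycopg2.connect`.

```python
import queue
import threading

class SimplePool:
    """Toy client-side connection pool (illustrative, not production
    code): keeps up to max_size connections open and hands them out,
    so most requests skip the connection handshake entirely."""

    def __init__(self, connect, max_size=5):
        self._connect = connect             # factory, e.g. psycopg2.connect
        self._idle = queue.Queue(maxsize=max_size)
        self._lock = threading.Lock()
        self._size = 0
        self._max = max_size

    def acquire(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            pass
        with self._lock:
            if self._size < self._max:
                self._size += 1
                return self._connect()      # pay the setup cost only here
        return self._idle.get()             # pool full: wait for a release

    def release(self, conn):
        self._idle.put(conn)
```

The point is that `connect()` (the part that costs network round trips and a process fork on the server side) runs at most `max_size` times for the life of the process; every later `acquire` reuses an existing connection.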
- a connection is bound to the backend process it was handed to,
- a backend process can run only one transaction at a time,
- therefore, if a backend process is running a long transaction, all other connections bound to that backend process have to wait.
Even pgbouncer in transaction mode performs better, since it can route incoming connections to different outgoing connections.
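For concreteness, the pgbouncer behavior described here is its `pool_mode = transaction` setting. A minimal pgbouncer.ini sketch (database name, addresses, and paths are placeholders):

```ini
[databases]
; placeholder connection string for the real server
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; return the server connection to the pool at COMMIT/ROLLBACK,
; so one server connection can serve many client connections
pool_mode = transaction
default_pool_size = 20
```

The trade-off is that transaction mode breaks session-level features (session-scoped prepared statements, SET/RESET, advisory locks), which is exactly the "connection state outside of a transaction" limitation mentioned upthread.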
There will be no good built-in connection pooler until PostgreSQL adopts some kind of intra-process concurrency: threads or green threads (preemptive coroutines). And I'd say it will most likely never happen, because of all the extensions written for PostgreSQL.
The short of it is that Postgres uses a process per connection, so architectures that spin up and close connections frequently can have serious scalability issues.
Note the landing page for the AWS RDS Proxy, https://aws.amazon.com/rds/proxy/ , is as good a discussion as any as to why you'd want to put a pooling proxy in front of Postgres.
You're greyed out, but based on some quick research this appears correct.
> PostgreSQL is implemented using a simple "process per user" client/server model. In this model there is one client process connected to exactly one server process. As we do not know ahead of time how many connections will be made, we have to use a master process that spawns a new server process every time a connection is requested. This master process is called postgres and listens at a specified TCP/IP port for incoming connections. Whenever a request for a connection is detected the postgres process spawns a new server process. The server tasks communicate with each other using semaphores and shared memory to ensure data integrity throughout concurrent data access.
So, we tried AWS RDS Proxy for MySQL (well, actually AWS RDS Aurora MySQL), and we found that it did not improve our situation at all. It added latency, but no additional performance, and memory usage was unchanged.
The applications already had their own connection pooling functionality built-in, so AWS RDS Proxy didn’t buy us anything more beyond that.
I can’t speak for what this technology can do for situations where your application code does not already have connection pooling, or for cases regarding RDS Postgres.