I appreciate the work going on to make high-quality poolers for Postgres... I just really wish the work were done in-core so we could have built-in connection pooling, or that grabbing a new connection were made as cheap as going through a pooler. It sucks to have to add complexity to your stack to fix a "deficiency" in Postgres' design.
Still, I am glad there is effort being put into the project, even if I selfishly wish it were done in-core.
Pooling at all is still pretty much a hack. Ideally, resource management (and tuning) wouldn't revolve around the connection, and actual connections would become as thin as possible.
There has been some work towards that: connection handling did get cheaper in Postgres 14.
Note that you very well might still want a separately run pooler for some workloads - having local poolers on application servers can be good for latency.
Yeah: the latency of establishing a new connection fundamentally requires at least one network round trip, and likely a few (which is why I tend to have my servers go through two layers of pgbouncer, one on each side of the network). And using in-process pools couples your pool state to your front-end code updates. So, since you are going to want to build this connection pooling mechanism anyway, and are then likely going to want to take advantage of application-specific capability limitations (such as not supporting connection state outside of a transaction) to further improve performance, it doesn't really make much sense to prioritize this inside the database server itself.
> it doesn't really make much sense to prioritize this inside the database server itself.
I wouldn't go that far. There are a lot of things that are hard to do within an external pooler unless the pooler uses a 1:1 connection model. That leads most setups to use 1:1 connection pooling methods, which in turn puts pressure on Postgres to handle large numbers of connections gracefully.
Would an app-side ORM that has built-in connection pooling + PG14 (w/ connection scalability improvements) generally benefit from a connection pooler like Odyssey/PgBouncer, either server-side or app-side, when the number of app servers is low (<5-10)?
As a data point, FWIW: I have operated a PG11 cluster (on RDS) with client-side pooling only, 300-450 app servers, and max_connections=2000-3000 (can't remember exactly), and didn't observe significant backend resource contention or performance degradation; just a memory cost linear in the number of connections.
Unless the app servers are particularly large, or the app-side ORM connection pool doesn't work well (e.g. because the number of concurrent requests changes sufficiently over time that thread/process count constantly changes), I'd guess not. But without knowing a bit more it's hard to say.
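To make the question concrete, here is a toy sketch in Python of what an app-side connection pool is doing under the hood. The class and names are made up for illustration; the `connect` argument stands in for a real driver call such as `psycopg2.connect`.

```python
import queue
import threading

class SimplePool:
    """Toy client-side connection pool (illustrative, not production
    code): keeps up to max_size connections open and hands them out,
    so most requests skip the connection handshake entirely."""

    def __init__(self, connect, max_size=5):
        self._connect = connect             # factory, e.g. psycopg2.connect
        self._idle = queue.Queue(maxsize=max_size)
        self._lock = threading.Lock()
        self._size = 0
        self._max = max_size

    def acquire(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            pass
        with self._lock:
            if self._size < self._max:
                self._size += 1
                return self._connect()      # pay the setup cost only here
        return self._idle.get()             # pool full: wait for a release

    def release(self, conn):
        self._idle.put(conn)
```

The point is that `connect()` (the part that costs network round trips and a process fork on the server side) runs at most `max_size` times for the life of the process; every later `acquire` reuses an existing connection.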
- a connection is bound to the backend process it was handed to,
- a backend process can run only one transaction at a time,
- therefore, if a backend process is running a long transaction, all other connections bound to that backend process have to wait.
Even pgbouncer in transaction mode performs better, since it can route incoming connections to different outgoing connections.
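For concreteness, the pgbouncer behavior described here is its `pool_mode = transaction` setting. A minimal pgbouncer.ini sketch (database name, addresses, and paths are placeholders):

```ini
[databases]
; placeholder connection string for the real server
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; return the server connection to the pool at COMMIT/ROLLBACK,
; so one server connection can serve many client connections
pool_mode = transaction
default_pool_size = 20
```

The trade-off is that transaction mode breaks session-level features (session-scoped prepared statements, SET/RESET, advisory locks), which is exactly the "connection state outside of a transaction" limitation mentioned upthread.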
There will be no good built-in connection pooler until PostgreSQL adopts some kind of intra-process concurrency: threads or green threads (preemptive coroutines). And I'd say it will most likely never happen, because of all the extensions written for PostgreSQL.
The short of it is that Postgres uses a process per connection, so architectures that spin up and close connections frequently can have serious scalability issues.
Note the landing page for the AWS RDS Proxy, https://aws.amazon.com/rds/proxy/ , is as good a discussion as any as to why you'd want to put a pooling proxy in front of Postgres.
You're greyed out, but based on some quick research this appears correct.
> PostgreSQL is implemented using a simple "process per user" client/server model. In this model there is one client process connected to exactly one server process. As we do not know ahead of time how many connections will be made, we have to use a master process that spawns a new server process every time a connection is requested. This master process is called postgres and listens at a specified TCP/IP port for incoming connections. Whenever a request for a connection is detected the postgres process spawns a new server process. The server tasks communicate with each other using semaphores and shared memory to ensure data integrity throughout concurrent data access.
So, we tried AWS RDS Proxy for MySQL (well, actually AWS RDS Aurora MySQL), and we found that it did not improve our situation at all. It added latency, but no additional performance, and memory usage was unchanged.
The applications already had their own connection pooling functionality built-in, so AWS RDS Proxy didn’t buy us anything more beyond that.
I can’t speak for what this technology can do for situations where your application code does not already have connection pooling, or for cases regarding RDS Postgres.