The criticism is valid, but he's talking about cases of millions of connections to a single DB, which is a scale many companies will never see. Beyond that, probably no database can serve significant load without careful tuning, preferably with an understanding of the DB's internals and the compromises its authors made when designing it.
PostgreSQL is constantly improving. At least some of the problems with scaling the number of connections have more to do with locking than with the process-per-connection architecture, and that is being worked on, with impressive results: roughly doubling transactions per second at 200 connections: https://www.postgresql.org/message-id/20200301084638.7hfktq4...
I'll trust postgresql more when it can support a few thousand connections without resorting to running middleware (pgbouncer) all over. That was his point. PostgreSQL is just abysmally bad in this area.
The process-per-connection model works great for "my first Rails project", so every developer brings it to $dayjob. Then they're caught off guard when they start getting real traffic. It's terrifying to watch a couple hundred connections take a moderately sized server (~100 threads) down the native_queued_spin_lock_slowpath path to ruin. That's just sad.
> I'll trust postgresql more when it can support a few thousand connections without resorting to running middleware (pgbouncer) all over. That was his point. PostgreSQL is just abysmally bad in this area.
Depending on your workload, it's entirely possible to run PG with 2000 connections. The most important thing is to configure Postgres / the operating system to use huge pages; that gets rid of a good bit of the overhead.
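A rough sketch of what that looks like - the numbers are illustrative, not recommendations, and the hugepage reservation has to be sized from the shared memory Postgres actually requests on your box:

    # postgresql.conf (illustrative values)
    huge_pages = on          # refuse to start if huge pages can't be used
    shared_buffers = 8GB
    max_connections = 2000

    # /etc/sysctl.conf: reserve enough 2MB huge pages to cover PG's shared memory
    vm.nr_hugepages = 4500

With huge_pages = on, Postgres fails at startup if the reservation is too small, which beats silently falling back to regular pages.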
If the workload has a lot of quick queries it's pretty easy to hit scalability issues around snapshots (the metadata needed to make visibility determinations). It's not that bad on a single-socket server, but on 2+ sockets with high core counts it can be significant.
My local 2-socket workstation doesn't have enough cores to show the problem to the same degree, unfortunately, so the above is from an Azure VM. The odd dip in the middle is an issue with slow IPIs on Azure VMs, and it's worse when the benchmark client and server run on the same machine.
> It's terrifying to watch a couple hundred connections take a moderately sized server (~100 threads) down the native_queued_spin_lock_slowpath path to ruin. That's just sad.
Which spinlock was that on? I've seen a number of different ones over time. I've definitely hit ones in various drivers, and in both the generic parts of the unix socket and tcp stacks.
One more thing: There's definitely a significant overhead implied by the process-per-connection model - I don't want to deny that.
In my opinion it's not, at the moment, the most urgent issue wrt connection scalability (snapshot scalability is independent of processes vs. threads, and measurably the bottleneck), and the amount of work needed to change to a different model is larger.
But I do think we're going to have to change to threads in the not-too-far-away future. We can work around all the individual problems, but the cost in complexity is bigger than the advantages of increased isolation. We had to add too much complexity / duplicated infrastructure to e.g. make parallelism work (which needs to map additional shared memory after fork, and thus addresses differ between processes).
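To make that last point concrete, here's a minimal C sketch (illustration only, not PostgreSQL's actual DSM code; the segment name and sizes are made up): memory mapped after fork() is not guaranteed to land at the same address in every process, so raw pointers can't be stored in the shared region and everything has to be expressed as offsets from the mapping base.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* A shared segment that both processes map only after forking,
         * similar in spirit to dynamic shared memory for parallel query. */
        int fd = shm_open("/demo_dsm", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, 1 << 20);

        pid_t pid = fork();

        if (pid == 0)
        {
            /* Pretend the child's address space has diverged a bit
             * (extra allocations, loaded libraries, ...) before the mapping. */
            mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        }

        /* Each process maps the same segment independently; the kernel may
         * pick a different virtual address in each address space, so raw
         * pointers stored in the segment would be meaningless to the peer. */
        void *base = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        printf("%s mapped the segment at %p\n",
               pid == 0 ? "child " : "parent", base);

        if (pid != 0)
        {
            wait(NULL);
            shm_unlink("/demo_dsm");
        }
        return 0;
    }

Parent and child may well print different addresses for the same segment, which is exactly the situation the parallel-query infrastructure has to cope with.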
>> native_queued_spin_lock_slowpath path
>Which spinlock was that on? I've seen a number of different ones over time. I've definitely hit ones in various drivers, and in both the generic parts of the unix socket and tcp stacks.
Not sure yet. It was on a server with 1000 stable connections. Things were fine for a while, then suddenly the system would jump to 99% on all 104 threads, with native_queued_spin_lock_slowpath indicated by perf.
Ironically, we cleared it up by having sessions disconnect when they were done. It boggled the mind that increasing connection churn improved things, but it did.
That sounds like you could have hit transparent hugepage / compaction related issues. IME they tend to hit more often with lots of long-running processes than when there's a lot of churn. It has gotten a lot better in more recent kernel versions, but if you're on an older kernel, it can be really bad.
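If you want to check whether that's what you're hitting, the relevant sysfs knobs look roughly like this (switching them to madvise is a common mitigation on older kernels, not a blanket recommendation):

    # current settings; the value in [brackets] is the active one
    cat /sys/kernel/mm/transparent_hugepage/enabled
    cat /sys/kernel/mm/transparent_hugepage/defrag

    # compaction activity counters, e.g. compact_stall
    grep compact /proc/vmstat

    # common mitigation where compaction stalls are the problem
    echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
    echo madvise > /sys/kernel/mm/transparent_hugepage/defrag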
pgbouncer is also a speed optimisation, not just a scaling measure - all agreed, pgsql lags here currently.
But the default 150ish connections out of the box mean 150ish workers, which means 20ish 8-core VMs for e.g. your Django app (1 worker per core, so roughly 150 / 8 ≈ 19 VMs). That is a lot of scaling already and a good business problem to have, not just an app demo. Most internal projects never even make it there.