One more thing: There's definitely a significant overhead implied by the process...

One more thing: There's definitely a significant overhead implied by the process-per-connection model - I don't want to deny that.

In my opinion it's at the moment not the most urgent issue wrt connection scalability (the snapshot scalability is independent from process v threads, and measurably the bottleneck), and the amount of work needed to change to a different model is larger.

But I do think we're gonna have to change to threads, in the not too far away future. We can work around all the individual problems, but the cost in complexity is bigger than the advantages of increased isolation. We had to add too much complexity / duplicated infrastructure to e.g. make parallelism work (which needs to map additional shared memory after fork, and thus addresses differ between processes).