
You're thinking of outbound port limits. Inbound connections all come in on the same port, and there are no port-related limits there.

The real limits are going to be on hardware. If we want 40M concurrent connections, the server will need a few hundred to a few thousand gigabytes of memory.
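
As a rough back-of-envelope (the per-connection figures below are assumptions, not measurements; a tuned kernel can get socket buffers well under the defaults):

    # Rough memory estimate for 40M mostly-idle connections.
    # 10-50 KiB per connection is an assumed range, not a measurement.
    CONNECTIONS = 40_000_000

    for per_conn_kib in (10, 25, 50):
        total_gib = CONNECTIONS * per_conn_kib / (1024 * 1024)
        print(f"{per_conn_kib:>3} KiB/conn -> ~{total_gib:,.0f} GiB")
    # 10 KiB -> ~381 GiB, 25 KiB -> ~954 GiB, 50 KiB -> ~1,907 GiB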



Unless you're doing multicast or anycast, there's a port-bound handshake that happens. You have your listener (your server port) and the connected TCP/IP client (the client port, as seen on the server machine). You're limited to 64,512 clients (ports 0-65535, minus the first 1,024 reserved ones). If there's a way to get more clients on a single machine, I'm all ears - but that's baked into the IP protocol. TCP/UDP doesn't matter; it's a limitation of IP.

Assume 10% of the 40M users are active at once. That's 4M clients to deal with. Under that per-server limit, you would need about 62 servers (plus probably a few for cache) to handle the load. But those could be cheap, small-core servers.
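
For what it's worth, the arithmetic under this comment's per-server port assumption (which the replies below dispute) works out like this:

    # Server count under the assumption of ~64k usable client ports per
    # server; the replies below explain why the real limit is per client
    # IP, not per server.
    total_users   = 40_000_000
    active        = int(total_users * 0.10)    # assume 10% concurrently active
    ports_per_box = 65_536 - 1_024             # 64,512 usable ports under this assumption

    print(active)                   # 4000000
    print(active / ports_per_box)   # ~62.0, i.e. roughly 62-63 servers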


The uniqueness tuple for IP is (source IP, source port, dest IP, dest port). You're limited to 65,535 connections between the same two IPs on the same destination port, but that's not relevant to XMPP, which uses only one port plus some transient ones for file transfer and the like. At worst, having that many people behind one NAT will be a problem... which at this scale could actually happen, but there are still solutions (multiple ports being the easiest, plus the fact that this cluster will probably sit behind multiple public IPs anyhow).

See https://ats1999.github.io/blog/65535/ or similar pages.
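
An easy way to convince yourself is to print the 4-tuple of each accepted connection; a minimal sketch (5222, the usual XMPP client port, is just a placeholder here):

    # One listener on one port; every accepted connection is distinguished
    # by its (client IP, client port), so the single listening port is not
    # the bottleneck.
    import socket

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 5222))
    srv.listen(1024)

    while True:
        conn, peer = srv.accept()
        # (source IP, source port) + (dest IP, dest port) -- only the combination must be unique
        print("tuple:", peer + conn.getsockname())
        conn.close()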


You should be able to deal with 4 million clients on one server in 2025; we did 2 million on one chat server in 2012 [1], and I can find documentation of 2.8M in Rick Reed's Erlang Factory presentation [2], page 16. That was part of a clustered system, with chat connections on one group of servers and several other groups of servers for account information, offline messages, etc. Also, the connections weren't Noise-encrypted then, and in 2012 a lot more clients lacked platform push, so they would try to stay long-connected all the time... but it's not very hard to manage clients that just need to be pinged once every 10 minutes or so.

But this was with only 96 GB of RAM and dual 6-core Westmere processors. You can put together a small desktop with consumer parts that is way more capable today. And if you run everything on a single machine, you don't have to deal with distributed-systems stuff.

When I left in 2019, we were running on Xeon-D type hardware with maybe 64 GB of RAM and, IIRC, only doing about 300k connections per chat machine. Chat scales horizontally pretty well, so run a couple of machines of whatever, find your bottleneck, and then scale up the machines until you hit the point where it costs more to get 2x of the bottleneck in a single machine than to get 2 machines with 1x of it. I suspect, if you can get quality AM5 servers, that's probably the way to go for chat; otherwise a single-socket server is likely best; dual socket probably doesn't make $/perf sense like it did 10 years ago. If you get fancy NICs, you might be able to offload TLS and save some CPU, but CPU TLS acceleration is pretty good, and there's no RAM-bandwidth saving from NIC TLS like there is in a CDN use case.

IMHO, getting a single server to support 4M clients shouldn't be too hard, and it should be a lot of fun. Most of the stuff that was hard in 2012 should be easy now, between upstream software improvements and the massive difference between CPUs in 2012 and 2025. The hard part is getting 4M clients to want to connect to your server. And setting up a test environment. Elsewhere in this thread, there's a link to the ProcessOne blog post from 2016 where they ran 2M clients on a single server (an m4.10xlarge: 40 vCPU, 160 GiB) with a single same-spec server as the client load generator; the impressive part there is the client load generator --- I've always needed something like a 10:1 ratio of load generators to servers to load down a big server. And they managed to do that with all their NIC interrupts sent to a single CPU (which is not what I would have recommended, but maybe EC2 didn't have multiple NIC queues in 2016?).
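
On the "pinged once every 10 minutes" point, the shape of that is roughly the sketch below (asyncio rather than Erlang, made-up framing, and a single Python process obviously won't reach millions of connections; it's just the pattern):

    # Mostly-idle connection handling: wait for data, send a keepalive on idle.
    # The interval and the one-byte ping payload are placeholders.
    import asyncio

    PING_INTERVAL = 600  # seconds; assumed 10-minute idle ping

    async def handle(reader, writer):
        try:
            while True:
                try:
                    data = await asyncio.wait_for(reader.read(4096), timeout=PING_INTERVAL)
                    if not data:
                        break                   # client closed the connection
                    # ... hand the message to application logic here ...
                except asyncio.TimeoutError:
                    writer.write(b"\x00")       # placeholder keepalive byte
                    await writer.drain()
        finally:
            writer.close()

    async def main():
        server = await asyncio.start_server(handle, "0.0.0.0", 5222)
        async with server:
            await server.serve_forever()

    asyncio.run(main())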

> If there's a way to get more clients on a single machine, I'm all ears - but that's baked into the IP protocol. TCP/UDP doesn't matter; it's a limitation of IP.

As others have said, the limitation is the 5-tuple: {protocol, SrcIp, DstIp, SrcPort, DstPort}. If you're the server, your own IP and port are fixed, and for each client IP you can have ~64k connections. There are a lot of client IPs, so you can host far more connections than you can actually manage on a single system. If your clients are behind an aggressive CGNAT, you can actually run into problems where more than 64k clients want to connect to your single IP and port... but you can listen on multiple ports and address it that way (sketched below), and most aggressive CGNATs are v4-only; if you also listen on v6, you'll usually see the clients' real IPs, or at least a much more diverse group of NAT addresses.

If you listen on all the ports, you can have 4 billion connections between you and any given IP. That's not going to be limiting for a while.
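
For the multiple-ports workaround, a minimal sketch (the port numbers are arbitrary examples):

    # Listen on several ports from one process, so clients squeezed behind
    # one CGNAT IP can spread across more than one destination port.
    import asyncio

    PORTS = [5222, 5223, 5224, 5225]   # arbitrary example ports

    async def handle(reader, writer):
        peer = writer.get_extra_info("peername")
        local = writer.get_extra_info("sockname")
        print("client", peer, "on listener", local)
        writer.close()

    async def main():
        servers = [await asyncio.start_server(handle, "0.0.0.0", p) for p in PORTS]
        await asyncio.gather(*(s.serve_forever() for s in servers))

    asyncio.run(main())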

[1] https://blog.whatsapp.com/1-million-is-so-2011

[2] https://www.erlang-factory.com/upload/presentations/558/efsf...



