Here's a blog post on scalability from the front page (well, sort of, see last paragraph): https://www.process-one.net/blog/ejabberd-massive-scalabilit... This includes not just textual claims about scalability but a lot of hooks you can follow up on, like a reference to the Tsung benchmarking tool. If you're asking about this for serious reasons, at this scale you're obviously going to run your own tests anyhow. You may also want to speak directly to Process One, because it sounds like you're at a scale where you should probably be looking at paid support.
I'm not necessarily endorsing this, just giving it to you as some first stabs at answers and some ways to follow up.
If there's anyone reading this from the ejabberd project, note that the link to this on your front page under "Massively Scalable" is broken; it points at "blog.process-one.net", but that domain is completely dead, so it doesn't even redirect to the link I gave above. (That's also part of why I posted this; my post here is not a "just read these docs," because I had to do non-trivial work to find them. I had to pull that link out of archive.org.) Should probably check for any other links to that domain too.
I'm past my edit window now so I can't remove the claim, but I will verify and agree that it is fixed now. If you (the reader, not Metalhearf) previously visited the home page you may have to manually reload to bust the cache to see the correct link(s) now.
If you want 40M connections on FreeBSD, the maxfd count is physpages / 4; you could edit the kernel to change this, but are you really going to serve 40M connections with only 16 KB of RAM per user? If my math is right, that puts you at around 640 GB of RAM (40M fds * 4 pages/fd * 4 KB/page ≈ 640 GB), which is totally doable on a single server socket. Probably you don't have everyone simultaneously connected, but probably you also need more RAM per connection, so it kind of depends. By the time you find 40M users to connect to your server, you should be in tune with your hardware needs ;)
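Rough math, for anyone who wants to check it (the 4-pages-per-fd factor comes straight from the physpages / 4 rule above; the rest is just unit conversion):

    # Back-of-envelope: FreeBSD's default maxfd is ~physpages / 4, so every
    # file descriptor you want implies roughly 4 pages of physical RAM existing.
    PAGE_SIZE = 4096          # bytes per page (typical)
    PAGES_PER_FD = 4          # from maxfd ~= physpages / 4
    connections = 40_000_000

    ram_bytes = connections * PAGES_PER_FD * PAGE_SIZE
    print(f"per connection: {PAGES_PER_FD * PAGE_SIZE // 1024} KiB")   # 16 KiB
    print(f"total: {ram_bytes / 1e9:.0f} GB")                          # ~655 GB, i.e. the ~640 G ballpark above

That's kernel bookkeeping only, not the memory your server process actually needs per user.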
If you’re running a single machine, you’ll be limited by the number of available ports. It’s a TCP limitation and nothing to do with XMPP. You’d need a cluster of XMPP servers to handle 40M users. Even for just text. Port limits are port limits.
You're thinking of outbound port limits. Inbound connections all come in on the same port, and there are no port-related limits there.
The real limits are going to be on hardware. If we want 40M concurrent connections, the server will need a few hundred to a few thousand gigabytes of memory.
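To put rough numbers on "a few hundred to a few thousand gigabytes": the per-connection budgets below are my guesses (kernel socket buffers plus userspace state), not measurements, but they bracket the plausible range.

    # Rough total memory for 40M concurrent connections at a few assumed
    # per-connection budgets (socket buffers + per-connection server state).
    connections = 40_000_000
    for per_conn_kib in (8, 16, 32, 64):
        total_gib = connections * per_conn_kib * 1024 / 2**30
        print(f"{per_conn_kib:>3} KiB/conn -> {total_gib:,.0f} GiB total")
    # prints roughly 305 / 610 / 1,221 / 2,441 GiB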
Unless you're doing multicast or anycast, there's a port-bound IP handshake that happens. You have your listener (your server port) and the connected TCP/IP client (client port, on the server machine). You're limited to roughly 64,500 clients (ports 0-65535, minus the first ~1000, which are reserved). If there's a way to get more clients on a single machine, I'm all ears - but that's baked into the IP protocol. TCP/UDP doesn't matter. It's a limitation of IP.
Assume 10% of 40M users are active at once. That's 4M clients to deal with. You would need 62 servers at ~65k connections each (and probably a few more for cache) to handle the load. But those could be cheap, small-core servers.
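To show my work on the 62, assuming roughly 64.5k usable client ports per box:

    # Where "62 servers" comes from, assuming ~64.5k connections per box
    # (see the port math above).
    import math
    active_clients = 40_000_000 * 0.10     # 10% of 40M online at once
    conns_per_server = 65_536 - 1_000      # the claimed per-box ceiling
    print(math.ceil(active_clients / conns_per_server))   # -> 62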
The uniqueness tuple for a TCP connection is (source IP, source port, dest IP, dest port). You're limited to 65,535 connections between the same two IPs on the same server port, but that's not really relevant to XMPP, which uses only one port (plus some transient ones for file transfer and the like). At worst, having that many people behind one NAT will be a problem... which at this scale could be an actual problem, but there are still solutions (multiple listening ports being the easiest, plus the fact that this cluster will probably be spread across multiple public IPs anyhow).
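If it helps, here's a minimal sketch of that point: one listening socket on one port accepts several connections, and each one is distinguished by the peer's (IP, port); the server never burns extra ports. (Loopback here just for illustration, but the mechanics are the same over the internet.)

    # One server port, many inbound connections: accept() hands back a socket
    # identified by the peer's (address, port); the listening port is shared.
    import socket, threading, time

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))     # 0 = any free port; think 5222 for XMPP
    srv.listen()
    host, port = srv.getsockname()

    def client():
        c = socket.create_connection((host, port))
        time.sleep(0.5)            # hold the connection open briefly
        c.close()

    threads = [threading.Thread(target=client) for _ in range(5)]
    for t in threads:
        t.start()

    for _ in threads:
        conn, peer = srv.accept()
        # same server port every time, different peer (IP, port) each time
        print("server :%d <- client %s:%d" % (port, peer[0], peer[1]))
        conn.close()

    for t in threads:
        t.join()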
You should be able to deal with 4 million clients on one server in 2025; we did 2 million on one chat server in 2012 [1]. I can find documentation of 2.8M in Rick Reed's Erlang Factory presentation [2], page 16. That was part of a clustered system, with chat connections on one group of servers and several other groups of servers for account information, offline messages, etc. Also, the connections weren't Noise-encrypted then, and in 2012 there were a lot more clients without platform push, so they would try to hold a long-lived connection at all times... it's not very hard to manage clients that just need to be pinged once every 10 minutes or so.
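To put a number on "not very hard": spread those pings out and the steady-state rate is tiny.

    # Average ping rate for millions of mostly-idle connections.
    connections = 2_800_000          # the 2.8M figure above
    ping_interval_s = 10 * 60        # ping each client every ~10 minutes
    print(f"{connections / ping_interval_s:,.0f} pings/second")   # ~4,667/s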
But that was with only 96 GB of RAM and dual Westmere 6-core processors. You can put together a small desktop with consumer parts that will be way more capable today. And if you run everything on a single machine, you don't have to deal with distributed-systems stuff.
When I left in 2019, we were running on Xeon-D type hardware with maybe 64 GB of RAM and, IIRC, only doing about 300k connections per chat machine. Chat scales horizontally pretty well, so run a couple machines of whatever, find your bottlenecks, and then scale up the machines until you hit the point where it costs more to get 2x of the bottleneck resource in a single machine than to get two machines with 1x of it. I suspect that if you can get quality AM5 servers, that's probably the way to go for chat; otherwise a single server socket is likely best; dual socket probably doesn't make $/perf sense like it did 10 years ago. If you get fancy NICs, you might be able to offload TLS and save some CPU, but CPU TLS acceleration is pretty good, and there's no RAM-bandwidth saving from NIC TLS like there is in a CDN use case.
IMHO, getting a single server to support 4M clients shouldn't be too hard, and it should be a lot of fun. Most of the stuff that was hard in 2012 should be easy now, between upstream software improvements and the massive difference between CPUs in 2012 and 2025. The hard part is getting 4M clients to want to connect to your server. And setting up a test environment. Elsewhere on this thread, there's a link to the Process One blog post from 2016 where they ran 2M clients on a single server (m4.10xlarge: 40 vCPU, 160 GiB) with a single same-spec server as the client load generator; the impressive part there is the client load generator. I've always needed something like a 10:1 ratio of load generators to servers to load down a big server. And they managed to get that to work with all their NIC interrupts sent to a single CPU (which is not what I would have recommended, but maybe EC2 didn't have multiple NIC queues in 2016?).
> If there's a way to get more clients on a single machine, I'm all ears - but that's baked into the IP protocol. TCP/UDP doesn't matter. It's a limitation of IP.
As others have said, the limit is the 5-tuple: {protocol, SrcIp, DstIp, SrcPort, DstPort}. If you're the server, you hold SrcIp and SrcPort fixed, and for each dest IP you can have 64k connections. There are a lot of dest IPs, so you can host a lot of connections, far more than you can actually manage on a single system. If your clients are behind an aggressive CGNAT, you can actually run into problems where more than 64k clients want to connect to your single IP and port... but you can listen on multiple ports and address it that way, and most aggressive CGNATs are v4-only; if you listen on v6, you'll usually see the clients' real IPs, or at least a much more diverse set of NAT addresses.
If you listen on all the ports, you can have 4 billion connections between you and any given IP. That's not going to be limiting for a while.
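That 4 billion is just the number of (server port, client source port) pairs:

    # Distinct connections possible between two fixed IPs over one protocol,
    # if the server listens on every port.
    ports = 65_536
    print(f"{ports * ports:,}")      # 4,294,967,296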