I had these two dual-18-core xeon web servers with seemingly identical hardware and software setup but one was doing 1100 req/s and the other 500-600.
After some digging, I've realized that one had 8x8GB ram modules and the slower one had 2x32GB.
I did some benchmarking then and found that it really depends on the workload. The www app was 50% slower. Memcache 400% slower. Blender 5% slower. File compression 20%. Most single-threaded tasks no difference.
The takeaway was that workloads want some bandwidth per core, and shoving more cores into servers doesn't increase performance once you hit memory bandwidth limits.
After some digging, I've realized that one had 8x8GB ram modules and the slower one had 2x32GB.
I did some benchmarking then and found that it really depends on the workload. The www app was 50% slower. Memcache 400% slower. Blender 5% slower. File compression 20%. Most single-threaded tasks no difference.
The takeaway was that workloads want some bandwidth per core, and shoving more cores into servers doesn't increase performance once you hit memory bandwidth limits.