I had two dual-18-core Xeon web servers with seemingly identical hardware and software setups, but one was doing 1100 req/s and the other only 500-600.

After some digging, I realized that the faster one had 8x8GB RAM modules and the slower one had 2x32GB, which presumably left most of its memory channels empty.

I did some benchmarking then and found that it really depends on the workload. The www app was 50% slower. Memcache was 400% slower. Blender was 5% slower. File compression was 20% slower. Most single-threaded tasks showed no difference.

The takeaway was that workloads want some bandwidth per core, and shoving more cores into servers doesn't increase performance once you hit memory bandwidth limits.
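
To put rough numbers on the bandwidth-per-core idea, here is a back-of-the-envelope sketch. Everything in it beyond the core count and DIMM counts is an assumption on my part, not something I measured: DDR4-2400 modules, 8 bytes per transfer, and each DIMM sitting in its own memory channel (so 8 channels populated versus 2). The function name `peak_bandwidth_gbs` is just made up for illustration.

```python
# Theoretical peak memory bandwidth per core for the two DIMM layouts.
# ASSUMPTIONS (not from the original servers): DDR4-2400, 64-bit (8-byte)
# channels, and one DIMM per memory channel.

def peak_bandwidth_gbs(populated_channels, mega_transfers_per_s=2400, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s across all populated channels."""
    return populated_channels * mega_transfers_per_s * bytes_per_transfer / 1000

CORES = 2 * 18  # dual-socket, 18 cores per socket

for label, channels in (("8x8GB", 8), ("2x32GB", 2)):
    total = peak_bandwidth_gbs(channels)
    print(f"{label}: {total:.1f} GB/s peak, {total / CORES:.2f} GB/s per core")
```

Under those assumptions the 2x32GB box has a quarter of the theoretical bandwidth per core. Real-world numbers will be lower than the theoretical peak, and how much that hurts depends on how memory-bound the workload is.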