VMs with JIT like the JVM are only ever really fast/competitive with C in small ...

thu2111 · on Feb 5, 2020

This stuff is really hard to pin down though. I've been reading these sorts of debates forever.

It's true that pointer chasing really hurts in some sorts of program and benchmark. For sure. No argument. That's why Project Valhalla exists.

But it's also my view that modern C++ programming gets away with a lot of slow behaviours that people don't really investigate or talk about because they're smeared over the program and thus don't show up in profilers, whereas actually the JVM fixes them everywhere.

C++ programs tend to rely much more heavily on copying large structures around than pointer-heavy programs. This isn't always or even mostly because "value types are fast". It's usually because C++ doesn't have good memory management so resource management and memory layout gets conflated, e.g. std::vector<BigObject>. You can't measure this because the overheads are spread out over the entire program and inlined everywhere, so don't really show up in profiling. For the same reasons C++ programs rely heavily on over-specialised generics where the specialisation isn't actually a perf win but rather a side effect of the desire for automatic resource management, which leads to notorious problems with code bloat and (especially) compile time bloat.

Another source of normally obscured C++ performance issues is the heap. We know malloc is very slow because people so frequently roll their own allocators that the STL supports this behaviour out of the box. But malloc/new is also completely endemic all over C++ codebases. Custom allocators are rare and restricted to very hot paths in very well optimised programs. On the JVM allocation is always so fast it's nearly free, and if you're not actually saturating every core on the machine 100% of the time, allocation effectively is free because all the work is pushed to the spare cores doing GC.

Yet another source of problems is cases where the C++ programmer doesn't or can't actually ensure all data is laid out in memory together because the needed layouts are dynamically changing. In this case a moving GC like in the JVM can yield big cache hit rate wins because the GC will move objects that refer to each other together, even if they were allocated far apart in time. This effect is measurable in modern JVMs where the GC can be disabled:

https://shipilev.net/jvm/anatomy-quarks/11-moving-gc-localit...

And finally some styles of C++ program involve a lot of virtual methods that aren't always used, because e.g. there is a base class that has multiple implementations but in any given run of the program only one base class is used (unit tests vs prod, selected by command line flag etc). JVM can devirtualise these calls and make them free, but C++ compilers usually don't.

On the other hand all these things can be obscured by the fact that C++ these days tends only to be used in codebases where performance is considered important, so C++ devs write performance tuned code by default (or what they think is tuned at least). Whereas higher level languages get used for every kind of program, including the common kind where performance isn't that big of a deal.

barrkel · on Feb 5, 2020

The costs of languages like C++ also get worse the older a program is.

Without global knowledge of memory lifetimes, maintainers make local decisions to copy rather than share.

scott_s · on Feb 5, 2020

> We know malloc is very slow because people so frequently roll their own allocators that the STL supports this behaviour out of the box. But malloc/new is also completely endemic all over C++ codebases. Custom allocators are rare and restricted to very hot paths in very well optimised programs. On the JVM allocation is always so fast it's nearly free, and if you're not actually saturating every core on the machine 100% of the time, allocation effectively is free because all the work is pushed to the spare cores doing GC.

Allocation in a C++ program is going to be about the same speed as in a Java program. Modern mallocs are doing basically the same thing on the hot-path: bumping the index on a local slab allocator.