Everything I've read indicates that RAM caches work poorly in a GC environment.
The problem is that garbage collectors are optimized for applications that mostly have short-lived objects, and a small number of long-lived objects.
Things like a large in-RAM LRU cache are basically the worst case for a garbage collector, because the mark-and-sweep phase always has to traverse the entire cache, and because you're constantly generating garbage that needs to be collected.
> The problem is that garbage collectors are optimized for applications that mostly have short-lived objects, and a small number of long-lived objects.
I think it's not quite that.
Applications typically have a much larger old generation than young generation, i.e. many more long-lived objects than short-lived objects. So GCs do get optimized to process large heaps of old objects quickly and efficiently, e.g. with concurrent mark and sweep.
However, there is the additional observation that once an application has reached steady state, most newly allocated objects die young (think: the data associated with processing a single HTTP request or user interaction in a UI).
So as an optimization, GCs often split their heap into a young and an old generation, where collecting the young generation earlier and more frequently reduces the overall amount of garbage-collection work (and offsets the effort required to move objects around).
In the case of Go though, the programming language allows "interior pointers", i.e. pointers to fields inside objects. This makes it much harder (or much more costly) to implement a generational, moving garbage collector, so Go does not actually have a young/old generation split, nor the additional optimization for young objects.
Which is why in GC languages that also support value types and off-GC-heap allocations, one makes use of them, instead of throwing the baby out with the bathwater.
A high rate of short-lived allocations is also a bad thing in a compacting GC environment, because every allocation hands you a reference to a memory region that was last touched a very long time ago, which is likely a cache miss. You'd like to use an object pool to avoid this, but then you run into the pitfalls of long-lived objects, so there is really no good way out.
The allocation is going to be close to the last allocation, which was touched recently, no? The first allocation after a compaction will be far from recent allocations, but close to the compacted objects?
Being close to the last allocation doesn't matter. What matters is the memory returned to the application, and that is memory that was touched long ago and is unlikely to be in cache. If your young-generation size is larger than the L3 cache, every time you start on the next 64 bytes they will have to be fetched from main memory. I believe a smart CPU will notice the pattern and prefetch to reduce cache-miss latency, but a high allocation rate will still use a lot of memory bandwidth and thrash the caches.
An extreme case of that problem happens when using GC in an app that gets swapped out. Performance drops to virtually zero then.