Note that the data structure is different: AVL trees vs. BTrees. The interesting part for me was that because of modern cache hierarchies, BTrees appear to be superior to AVL trees even for in-memory data structures, while before I'd thought of them as a special-purpose on-disk data structure for databases.
AVL trees were more competitive when the memory hierarchy was shallower. These days an L1 cache hit is dirt cheap while a RAM fetch is ~200x as expensive, and following a pointer creates a data dependency that stalls the pipeline. B-trees give you more L1 cache hits and fewer pointers to follow. Back in the 1990s, cache hits and RAM fetches were much closer to parity, and there weren't as many pipeline stages to fill, so data dependencies mattered less. Go back even farther and you have completely uniform memory that could be accessed in a fixed number of cycles no matter the address.
There are a lot of good books from those eras, and there are a lot of good books that teach with the uniform memory model. Most algorithms classes don't talk about the hierarchy at all (even though ignoring it hides the real cost of many algorithms).
Agreed. It blew my mind to learn that tiled matrix multiplication is more efficient than untiled, but that you can only show tiled is faster in a hierarchical memory model: tiling gives you arithmetic intensity that scales with the size of your fast memory rather than a fixed constant. Yet in the uniform memory model both are O(n^3).
Makes total sense. The insight is that RAM now acts like the disk used to, and the caches act like the memory. BTrees worked well for spinning disk because they'd nicely load a whole node's array of keys and child pointers in one disk read. The same works for caches: the whole array is loaded into the cache when the first element is accessed.
A silly idea I had: wonder if it would be possible to auto-adjust the B factor of a BTree (the number of keys per node) at runtime based on the cache line size.
Auto-adjusting would be interesting; even better (though harder) would be to tune it experimentally rather than derive it from the cache line size. Cache line size only determines the minimum block size that doesn't waste cache capacity, after all.
Ultimately the point was that Rust discourages intrusive data structures, and not using an intrusive data structure turns out to be a win even though experienced C programmers might believe otherwise. Now, of course, an apples-to-apples comparison would be nice, but I suspect that after some experience with Rust one would end up writing similar code in C, and the difference would be very small.
The point really is that Rust foists better design choices on the programmer.