Hacker News

Single data sets surpassed 2^64 bytes over a decade ago. This creates fun challenges since just the metadata structures can't fit in the RAM of the largest machines we build today.



Virtualization has pushed the need back for a while, but at some point we're going to have to look at pointers larger than 64 bits. It's also not just about the raw size of datasets: we get a lot of utility out of various memory-mapping tricks, so we consume more address space than the strict minimum the dataset requires. And if we move up to 128 bits, a lot more security mitigations become possible.

By virtualization are you referring to virtual memory? We haven't even been able to mmap() the direct-attached storage on some AWS instances for years due to limitations on virtual memory.

With larger virtual memory addresses there is still the issue that the ratio of storage to physical memory in large systems would be so high that cache replacement algorithms don't work for most applications. You can switch to cache admission for locality at scale (strictly better in the limit, albeit much more difficult to implement), but that effectively segments the data model into chunks that won't get close to overflowing 64-bit addressing. 128-bit addresses would be convenient, but a lot of space is saved by keeping them 64-bit.

Space considerations aside, 128-bit addresses would open up a lot of pointer tagging possibilities, e.g. the security features you allude to.


> By virtualization are you referring to virtual memory?

No, I mean k8s-style architecture, where you take physical boxes and slice them into smaller partitions, so the dataset on each partition is smaller than the raw hardware capacity. That reduces the pressure toward the limit.


Ah yeah, that makes sense. With a good enough scheduler that starts to look a lot like a cache admission architecture.

I'd never thought of it that way, and it's an interesting perspective.

Please keep in mind that doubling isn't the only option. There's lots of numbers between 64 and 128.

Well, not so many if you assume some access alignment requirement for high-performance hardware designs...

Maybe. But from what I remember, many ostensibly 64-bit x86 computers actually only implemented 48 bits of memory addressing for a long time?

x86 is a funny example because it has supported unaligned access better than many designs. But ignoring that...

Many CPUs, not just x86, have a "physical bits" length that is less than the address size in the ISA. This saves transistor and power budget, since address buses can be smaller. Of course, it means there is a lower maximum RAM config for that design.

The software would still shuffle around the full ISA word/double word/quad word or whatever. In a typical OS, the MMU and page mapping logic would potentially interpret all the bits to map to the more limited physical address range. It didn't mean storing smaller pointers in your software data structures etc.

I'm not an expert, but I think it varies by ISA whether it's defined how the high-order address bits, above the physical range, are handled. Some may allow applications to set them for address tagging, with the CPU ignoring those bits; others may require them all to be zeroed to get predictable behavior.



