I'm interested in... why? What are you building where loading data from disk is so lopsided vs CPU load from compiling, or network load/latency? (One 200ms "is this the current git repo?" check is a heck of a lot of NVMe latency... and it's going to be closer to 2s than 200ms.)
I'm running the same setup - our larger builders have two 32-core EPYCs with 2TB RAM. We were already doing that type of setup almost two decades ago at a different company, and at this one for over a decade now - back then that was the only option for speed.
Nowadays NVMes might indeed be able to get close - but we'd probably still need to span multiple SSDs (reducing the cost savings), and the developers there are incredibly sensitive to build times. If a 5-minute build suddenly takes 30 seconds more, we have some unhappy developers.
Another reason is that it'd eat SSDs like candy. Current enterprise SSDs have something like a 10000 TBW rating, which we'd exceed in the first month. So we'd either get cheap consumer SSDs and replace them every few days, or enterprise SSDs and replace them every few months - or stick with the RAM setup, which over the life of the build system will be cheaper than constantly buying SSDs.
> Current enterprise SSDs have something like a 10000 TBW rating
Running the numbers to verify: a read-write-mixed enterprise SSD will typically have 3 DWPD (drive writes per day) across its 5-year warranty. At 2TB, that would be 10950 TBW, so that sort of checks out. If endurance were a concern, upgrading to a higher capacity would linearly increase the endurance. For example the Kioxia CD8P-V. https://americas.kioxia.com/en-us/business/ssd/data-center-s...
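A quick sketch of that arithmetic, using the DWPD and warranty figures quoted above rather than any particular datasheet:

    # Endurance estimate: capacity (TB) x drive-writes-per-day x days in warranty.
    # 3 DWPD over 5 years is the figure quoted above, not from a datasheet.
    def tbw(capacity_tb: float, dwpd: float = 3, warranty_years: float = 5) -> float:
        return capacity_tb * dwpd * 365 * warranty_years

    print(tbw(2))     # 2 TB drive    -> 10950 TBW, matching the estimate above
    print(tbw(7.68))  # 7.68 TB drive -> ~42000 TBW; endurance scales with capacity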
Finding it a bit hard to imagine build machines working that hard, but I could believe it!
Same as the one earlier in the thread: Build servers, nicely loaded. A build generates a ridiculous amount of writes for stuff that just gets thrown out after the build.
We actually did try SSDs about 15 years ago, and had a lot of dead SSDs in a very short time. After that we went with estimating data written instead - it's cheaper. While SSD durability has increased a lot since then, everything else got faster as well - so we'd have SSDs last a bit longer now (back then it was a weekly thing), but still nowhere near the point where it'd be a sensible thing to do.
You don't really want that. I'm keeping my sanity there just because my small company is running their CI and testing as a contractor.
They are indeed quite spoiled - and that's not necessarily a good thing. Part of the issue is that our CI was good and fast enough that at some point a lot of the new hires never bothered to figure out how to build the code - so for quite a few the workflow is "commit to a branch, push it, wait for CI, repeat". And as they often work on just a single problem, the "wait" is time lost for them, which leads to unhappiness if we are too slow.
> I'm interested in... why? What are you building where loading data from disk is so lopsided vs CPU load from compiling (...)
This has been the basic pattern for ages, particularly with large C++ projects. C++ builds, especially since the introduction of multi-CPU and multi-core systems, tend to be IO-bound, especially during linking.
Creating RAM disks is one of the most basic, low-effort strategies to speed up builds, and I think it was the main driver for a few commercial RAM drive apps.
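The low-effort version on Linux doesn't even need a dedicated mount, since /dev/shm is already tmpfs. A minimal sketch (the paths and CMake invocation are placeholders, not anything from this thread):

    # Minimal sketch: run an out-of-tree build from a RAM-backed directory.
    # Assumes Linux, where /dev/shm is tmpfs by default; the source path and
    # build commands below are placeholders.
    import shutil, subprocess, tempfile

    src = "/home/me/project"                  # hypothetical source checkout
    build = tempfile.mkdtemp(dir="/dev/shm")  # scratch space backed by RAM

    try:
        subprocess.run(["cmake", "-S", src, "-B", build], check=True)
        subprocess.run(["cmake", "--build", build, "-j"], check=True)
        # Copy out anything worth keeping before the RAM-backed dir goes away.
        shutil.copytree(f"{build}/bin", f"{src}/artifacts", dirs_exist_ok=True)
    finally:
        shutil.rmtree(build, ignore_errors=True)

A dedicated tmpfs mount with a size= limit gives more control over how much RAM the build can grab, but the idea is the same.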
Historical, but also there were a bunch of physical RAM drives - RAMsan, for example, sold DRAM-based appliances (with battery backup) connected by Fibre Channel - they were used for all kinds of tasks, but often as very fast scratch space for databases. Some VAXen had a "RAM disk" card that was IIRC used as an NFS cache on some Unix variants. Etc. etc.
Still odd. The OS should be able to manage the memory and balance performance more efficiently than that. There’s no reason to preallocate memory by hardware.
It was often used to supplement the available memory in a cheaper or more flexible way. For example, many hardware solutions allowed attaching more RAM than could otherwise be accessed over the main bus, or at a lower cost than main memory (for example due to differences in the interfaces required, added battery backup, etc.).
The RAMsan line, for example, started in 2000 with a 64GB DRAM-based SSD with up to 15 1Gbit FC interfaces, providing a shared SAN SSD for multiple hosts (very well utilized by some of the beefier clustered SQL databases like Oracle RAC) - but the company itself had been providing high-speed specialized DRAM-based SSDs since 1978.
The way it makes sense is when you can't add that much memory to the system directly, or when directly attached memory would be significantly more expensive. For this you can get away with much slower memory than you'd attach to the memory bus directly - all it needs to be is faster than the storage bus you're using.
Last time I saw one was with a mainframe, which kind of makes sense if adding cheaper third party memory to the machine would void warranties or breach support contracts. People really depend on company support for those machines.
The main cases I've seen with mainframes involved network-attached RAM disks (actually, even the earliest S/360s could share a disk device between two mainframes, so...)
A fast scratch pad that can be shared between multiple machines can be ideal at times.
Makes sense in a batch environment - you can lock the volume, do your thing, and then free it for another task running on a different partition or host.
Still seems like a kludge - The One Right Way to do it would be to add that memory directly to CPU-addressable space rather than across a SCSI (or channel, or whatever) link. It might as well be added to the RAM in the storage server, letting it manage the memory optimally (with hints from the host).
There was no locking (at least not necessarily); it was a shared resource that programs on multiple computers could use together (also a major use case for RAMsan where I worked with them - it was not about being unable to add memory, it was about a common fast quorum and cache shared between multiple maxed-out database servers).
> Still odd. The OS should be able to manage the memory and balance performance more efficiently than that. There’s no reason to preallocate memory by hardware.
You are arguing hypotheticals, whereas for decades the world had to deal with practicals. I recommend you spend a few minutes looking into how to create RAM drives on, say, Windows, and think through how to achieve that when your build workstation has 8GB of RAM and you need a scratchpad of, say, 16GB.
I know all that - I was there and I saw products like these in person (although they were in the megabyte range back then). I still remember a 5.25-inch hard-drive-shaped box with a lead-acid battery and lots of memory boards with 4164s (IIRC).
These are only for when the OS and the machine itself can't deal with the extra memory and wouldn't know what to do with it, things you buy when you run out of sensible options (such as adding more memory to your machine and/or configuring a RAM disk).
For the ROS ecosystem you're often building dozens or hundreds of small CMake packages, and those configure steps are very IO-bound - it's a ton of "does this file exist", "what's in this file", "compile this tiny test program", etc.
I assume the same would be true for any project that is configure-heavy.
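One quick way to see it (just a sketch, nothing ROS-specific; the source path is a placeholder): time the same CMake configure step on disk and on a tmpfs-backed directory, since the configure stage is almost entirely those small file checks and test compiles.

    # Sketch: compare CMake configure time on disk vs. a tmpfs-backed dir.
    # Assumes Linux with /dev/shm available; "src" is a placeholder path.
    import subprocess, tempfile, time

    src = "/home/me/some_cmake_package"   # hypothetical package source

    def configure(build_dir: str) -> float:
        start = time.monotonic()
        subprocess.run(["cmake", "-S", src, "-B", build_dir],
                       check=True, capture_output=True)
        return time.monotonic() - start

    with tempfile.TemporaryDirectory() as on_disk, \
         tempfile.TemporaryDirectory(dir="/dev/shm") as in_ram:
        print(f"disk:  {configure(on_disk):.2f}s")
        print(f"tmpfs: {configure(in_ram):.2f}s")

With a warm page cache the gap shrinks, so how lopsided this really is depends heavily on the workload, as the earlier comments suggest.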