I've been really curious precisely what changed, and what sort of optimization might have been involved here.
Because offhand, I know you could do things like cute optimizations of redundant data to minimize seek time on optical media, but with HDDs, you get no promises about layout to optimize around...
The only thing I can think of is if it was literally something as inane as checking the "store deduplicated by hash" option in the build, on a tree with copies of assets scattered everywhere, and it was just that nobody had ever checked whether the fear around the option held up in practice.
(I know they said in the original blog post that it was based around fears of client performance impact, but the whole reason I'm staring at that is that if it's just a deduplication table at storage time, the client shouldn't...care? It's not writing to the game data archives, it's just looking stuff up either way...)
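To make that concrete, here's roughly what I mean by "the client shouldn't care" — a hedged sketch, with the archive layout and function names entirely made up, not anything from the actual build tooling. The build step stores each unique blob once, keyed by its hash, and the manifest maps logical asset paths to those hashes; the client's read path is the same lookup whether duplicates were stored once or many times:

    import hashlib, json, os

    def build_dedup_archive(asset_dir, archive_path, manifest_path):
        # Store each unique blob once, keyed by content hash; the manifest
        # maps logical asset paths to hashes and hashes to (offset, size).
        blobs = {}      # content hash -> [offset, size] inside the archive
        assets = {}     # logical asset path -> content hash
        with open(archive_path, "wb") as archive:
            for root, _, files in os.walk(asset_dir):
                for name in files:
                    path = os.path.join(root, name)
                    with open(path, "rb") as f:
                        data = f.read()
                    digest = hashlib.sha256(data).hexdigest()
                    if digest not in blobs:     # duplicate content is stored only once
                        blobs[digest] = [archive.tell(), len(data)]
                        archive.write(data)
                    assets[os.path.relpath(path, asset_dir)] = digest
        with open(manifest_path, "w") as f:
            json.dump({"assets": assets, "blobs": blobs}, f)

    def read_asset(archive_path, manifest, logical_path):
        # Client-side read: the same hash-then-offset lookup either way.
        digest = manifest["assets"][logical_path]
        offset, size = manifest["blobs"][digest]
        with open(archive_path, "rb") as archive:
            archive.seek(offset)
            return archive.read(size)

The only thing the client could notice is that duplicated paths now resolve to the same offset, which if anything should be friendlier to the disk cache.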
I'm not entirely clear what you're trying to say, but my understanding is that they simply put lots of copies of files in lots of places, like games have done for a long time, in the hope it would lower seek times for players on HDDs.
They realised, after a lot of players asked, that it wasn't necessary and probably had less of an impact than they thought.
They removed the duplicates, and drastically cut the install size. I updated last night, and the update alone was larger than the entire game after this deduplication run, so I'll be opting in to the Beta ASAP.
It's been almost a decade since I ran spinning rust in a desktop, and while I admire their efforts to support shitty hardware, who's playing this on a machine good enough to play but can't afford £60 for a basic SSD for their game storage?
HDDs also have a spinning medium and a read head, so the optimization is similar to optical media like CDs.
Let’s say you have UI textures that you always need, common player models and textures, and the battle music, while world geometry and monsters change per stage.
Create an archive file (pak, wad, …) for each stage, duplicating the UI, player and battle music assets into each archive (sketched below). That way you fully utilize HDD pages (a small config file on its own won’t fill a 4 KB filesystem page, let alone the smaller disk sectors), and all the data for one stage gets read into the disk cache in one fell swoop.
On optical media like CDs one would even place some data closer to the middle or towards the outer edge of the disc, because the read speed differs with the linear velocity.
This is an optimization for bandwidth at the cost of size (which often wasn’t a problem because the medium wasn’t filled anyway).
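In code, the idea is roughly this — a toy sketch only, with the shared asset names and the 4 KB alignment invented for illustration, not taken from any real engine:

    import os

    SHARED_ASSETS = ["ui/hud.tex", "player/model.bin", "audio/battle_theme.ogg"]  # always needed

    def pack_stage_archive(stage_name, stage_assets, asset_root, out_dir, align=4096):
        # One archive per stage, with the shared assets duplicated into it,
        # entries padded to the filesystem page size so reads stay aligned.
        archive_path = os.path.join(out_dir, stage_name + ".pak")
        with open(archive_path, "wb") as archive:
            for rel_path in SHARED_ASSETS + stage_assets:
                with open(os.path.join(asset_root, rel_path), "rb") as f:
                    data = f.read()
                archive.write(data)
                archive.write(b"\0" * ((-len(data)) % align))  # pad to the next 4 KB boundary
        return archive_path

    # e.g. pack_stage_archive("stage_03",
    #                         ["stages/03/geometry.bin", "stages/03/monsters.bin"],
    #                         asset_root="assets", out_dir="build")

A real format would also need an index of offsets, but the point is just that loading a stage becomes one contiguous read, paid for in duplicated bytes.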
> HDDs also have a spinning medium and a read head, so the optimization is similar to optical media like CDs.
HDDs also have to deal with fragmentation. I wonder what the odds are that you get to write 150 GB (and then regular updates in the 30 GB range) without breaking it into fragments...
Microsoft has a paper somewhere that shows IO speed only starts to drop when file fragments get below about 64 MB. So you can split that file into a few thousand pieces (150 GB in 64 MB chunks is roughly 2,400 fragments) without much performance loss at all.
The game installer can't control the layout on an HDD without doing some very questionable things like defragging and moving existing user files around the disk. It probably _could_, but the risk of irrecoverable user data loss or of accidentally corrupting a boot partition via a bug would make it completely not worth it.
Even if you pack those, there's no guarantee they don't get fragmented by the filesystem.
CDs are different not because of the medium, but because of who owns the storage layout.
It's less about ensuring a perfect layout than about avoiding an almost guaranteed terrible one. Unless your filesystem is already badly fragmented, it won't intentionally shuffle and split big files without a good reason.
A single large file is still more likely to be mostly sequential than 10,000 tiny files. With a large number of individual files, the filesystem is more likely to opportunistically use the small files to fill previously left holes. Individual files also more or less guarantee multiple syscalls per file just to open and read it, plus potentially more indirection and jumping around on the OS side to read each file's metadata. They also increase the chance of accidentally introducing random seeks through mismatches between the order the updater writes files, the way the filesystem lays them out, and the order in which level description files list and read them.
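A rough way to see just the per-file overhead (a sketch; file names, sizes and counts are arbitrary, and on a fragmented HDD the seek cost would dwarf the syscall cost this shows):

    import os

    def read_many_small(dir_path):
        # One open/read/close per file: at least three syscalls each,
        # plus a metadata lookup, before the disk does any useful work.
        total = 0
        for name in sorted(os.listdir(dir_path)):
            with open(os.path.join(dir_path, name), "rb") as f:
                total += len(f.read())
        return total

    def read_one_big(file_path, chunk=8 * 1024 * 1024):
        # A single open, then large sequential reads the drive can stream.
        total = 0
        with open(file_path, "rb") as f:
            while data := f.read(chunk):
                total += len(data)
        return total

Wrapping each in time.perf_counter() shows the per-file cost even on an SSD, before fragmentation enters the picture.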
I don't disagree that large files are better, or at least simpler. I do think gaming drives are more likely than most to be fragmented (large file sizes, frequent updates plus uninstalling and re-installing). A single large file should make the next read predictable and easy to buffer.
I am a little curious about the performance of reading several small files concurrently versus reading a large file linearly. I could see small files performing better with concurrent reads if they can be spread around the disk and the IO scheduler is clever enough that the disk is reading through nearly the whole rotation. If the disk is fragmented, the small files should theoretically be striped over basically the entire disk.
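If anyone wants to poke at that, something like this would be a starting point — a sketch only; the thread count is a guess, and whether the kernel's IO scheduler actually merges these into near-sequential passes on spinning rust is exactly the open question:

    import os
    from concurrent.futures import ThreadPoolExecutor

    def read_file(path):
        with open(path, "rb") as f:
            return len(f.read())

    def read_small_files_concurrently(dir_path, workers=16):
        # Queue many reads at once so the IO scheduler has a chance to reorder
        # them by on-disk position instead of servicing them one at a time.
        paths = [os.path.join(dir_path, name) for name in os.listdir(dir_path)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return sum(pool.map(read_file, paths))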