Funnily enough, I think the biggest improvement of the PS5 is actually the crazy fast storage. Having no loading screens is a real gamechanger. I would love to get Xbox's instant resume on PlayStation.
The hardware 3D audio acceleration (basically fancy HRTFs) is also really cool, but almost no 3rd party games use it.
I've had issues with Xbox instant resume. Lots of "your save file has changed since the last time you played, so we have to close the game and relaunch" issues. Even when the game was suspended an hour earlier. I assume it's just cloud save time sync issues where the cloud save looks newer because it has a timestamp 2 seconds after the local one. Doesn't fill me with confidence, though.
Pretty sure they licensed a compression codec from RAD and implemented it in hardware, which is why storage is so fast on the PS5. Sounds like they're doing the same thing for GPU transfers now.
Storage on the PS5 isn't really fast. It's just not stupidly slow. At the time of release, the raw SSD speeds for the PS5 were comparable to the high-end consumer SSDs of the time, which Sony achieved by using a controller with more channels than usual so that they didn't have to source the latest NAND flash memory (and so that they could ship with only 0.75 TB capacity). The hardware compression support merely compensates for the PS5 having much less CPU power than a typical gaming desktop PC. For its price, the PS5 has better storage performance than you'd expect from a similarly-priced PC, but it's not particularly innovative and even gaming laptops have surpassed it.
The most important impact by far of the PS5 adopting this storage architecture (and the Xbox Series X doing something similar) is that it gave game developers permission to make games that require SSD performance.
So, you're saying they built a novel storage architecture that competed with state-of-the-art consumer hardware at a lower price point, that five years later laptops are just catching up, and that at the same price point it's faster than what you'd expect from a PC.
The compression codec they licensed was built by some of the best programmers alive [0], and the company behind it was later acquired by Epic [1].
I dunno how you put those together and come up with "isn't really fast" or "not particularly innovative".
Fast doesn't mean 'faster than anything else in existence'. Fast is relative to other existing solutions with similar resource constraints.
Their storage architecture was novel in that they made different tradeoffs than off the shelf SSDs for consumer PCs, but there's absolutely no innovation aspect to copy and pasting four more NAND PHYs that are each individually running at outdated speeds for the time. Sony simply made a short-term decision to build a slightly more expensive SSD controller to enable significant cost savings on the NAND flash itself. That stopped mattering within a year of the PS5 launching, because off the shelf 8-channel drives with higher speeds were no longer in short supply.
"Five years later, laptops are just catching up" is a flat out lie.
"at the same price point, it's faster than what you'd expect from a PC" sounds impressive until you remember that the entire business model of Sony and Microsoft consoles is to sell the console at or below cost and make the real money on games, subscription services, and accessories.
The only interesting or at all innovative part of this story is the hardware decompression stuff (that's in the SoC rather than the SSD controller), but you're overselling it. Microsoft did pretty much the same thing with their console and a different compression codec. (Also, the fact that Kraken is a very good compression method for running on CPUs absolutely does not imply that it's the best choice for implementing in silicon. Sony's decision to implement it in hardware was likely mainly due to the fact that lots of PS4 games used it.) Your own source says that space savings for PS5 games were more due to the deduplication enabled by not having seek latency to worry about, than due to the Kraken compression.
I'm not quite sure why you're so negative about the storage architecture. The idea of having the storage controller decompress data and write it into memory itself, with no CPU in the loop (after the initial request), is a really cool idea, and it clearly pays off.
To my knowledge we haven't really got a version of that in PC land. The closest is an Nvidia-specific thing that lets the GPU read storage directly, but that's still not the same.
> Android apps is rarely a thing that you download once and use for multiple years.
I've been using FOSS apps, Amaze File Manager, Muzei (wallpaper manager), and Lawnchair (Launcher), for well over 8 years across multiple Android devices and versions, with nary an issue.
The situation with Nova (and SimpleMobileTools before it) is that developers are selling their popular projects. This isn't an "Android" thing, but more of an indictment of sustainability of indie FOSS projects. This isn't limited to consumer apps, though (see: Redis).
> The situation with Nova (and SimpleMobileTools before it) is that developers are selling their popular projects. This isn't an "Android" thing, but more of an indictment of sustainability of indie FOSS projects.
Exactly this. I bought Nova 10 years ago (December 2015), and I only paid 10 cents (Google had some absurd sales on apps back then). To this day I'm still using it and receiving updates, and AFAIK I never paid the dev another cent. I don't know how they even financed their business, but even as popular as it was (is?), I doubt they are swimming in money.
I can understand why so many apps end up as subscription traps, or why the whole business gets sold to someone else. But at the same time, it's insane that as a customer it's really hard to find a useful app and have a way to keep supporting it at a low level. For example, I probably wouldn't have a problem with paying Nova a few dollars every time I switched to a new device, to get a new compatible version. But I can't even do that. Barely any mobile app iterates this way through the Android versions. So they all either die at some point or end up in a trap.
I have an old launcher, and even an old version of it. Version 2 was great, version 3 came with small cosmetic changes I didn't like, plus fucking ads. So I restored v2 from backups and stopped allowing auto-updates, to stay on v2.
One day the makers disappeared from the Play Store, voila, no more auto-enshittification, I win!
I've barely touched Go in over a decade, but if I did, I'd probably still use ccache if I didn't need cutting edge (because I think the API is simple), but not if I needed something at huge scale.
When I wrote ccache, there were two specific features that we wanted that weren't readily available:
- Having both a key and a subkey, so that you can delete either by key or by key+subkey (what ccache calls LayeredCache).
- Having cached items that other parts of the system also hold a long-lived reference to, so there's not much point in evicting them (what ccache calls Tracking, and is just a separate ARC mechanism that overrides the eviction logic).
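From memory (I haven't seriously touched this code in a long time), those two features look roughly like this against the v3 API; exact names and signatures may have drifted between versions:

```go
package main

import (
	"fmt"
	"time"

	"github.com/karlseguin/ccache/v3"
)

func main() {
	// LayeredCache: every entry lives under a primary key plus a secondary key,
	// so you can drop a single entry or everything under the primary key at once.
	users := ccache.Layered(ccache.Configure[string]().MaxSize(5000))
	users.Set("user:42", "profile", "alice", time.Minute)
	users.Set("user:42", "settings", "dark-mode", time.Minute)

	if item := users.Get("user:42", "profile"); item != nil {
		fmt.Println(item.Value()) // "alice"
	}
	users.Delete("user:42", "settings") // drop one subkey
	users.DeleteAll("user:42")          // drop everything cached for this user

	// Tracking: hold a reference that the eviction logic won't reclaim until released.
	sessions := ccache.New(ccache.Configure[string]().Track())
	sessions.Set("sess:1", "token", time.Minute)
	item := sessions.TrackingGet("sess:1")
	defer item.Release() // until Release is called, this entry won't be evicted
	fmt.Println(item.Value())
}
```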
It also supports caching based on arbitrary item size (rather than just a count of items), but I don't remember if that was common back then.
I've always thought that this, and a few other smaller features, make it a little bloated. Each cached item carries a lot of information (1). I'm surprised that, in the linked benchmark, the memory usage isn't embarrassing.
I'm not sure that having a single goroutine do a lot of the heavy lifting, to minimize locking, is a great idea. It has a lot of drawbacks, and if I were to start over again, I'd really want to benchmark it to see if it's worth it (I suspect that, under heavy write loads, it might perform worse).
The one feature that I do like, and that I think most LRUs should implement, is having a [configurable] number of gets before an item is promoted. This not only reduces the need for locking, it also adds some frequency bias to evictions.
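The idea is just an atomic counter per item, so only every Nth read has to touch the promotion machinery. A bare sketch (not ccache's actual code):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type entry struct {
	key   string
	value any
	gets  atomic.Int32
}

type cache struct {
	items          sync.Map    // key -> *entry
	promote        chan *entry // consumed by a single worker that maintains the recency list
	getsPerPromote int32
}

func (c *cache) Get(key string) (any, bool) {
	v, ok := c.items.Load(key)
	if !ok {
		return nil, false
	}
	e := v.(*entry)
	// Most reads only bump an atomic counter; only every Nth read is handed to
	// the worker that owns the recency list, so reads rarely contend on it.
	if e.gets.Add(1)%c.getsPerPromote == 0 {
		select {
		case c.promote <- e:
		default: // buffer full: skip this promotion rather than block the reader
		}
	}
	return e.value, true
}

func main() {
	c := &cache{promote: make(chan *entry, 1024), getsPerPromote: 3}
	c.items.Store("a", &entry{key: "a", value: 1})
	for i := 0; i < 5; i++ {
		v, _ := c.Get("a")
		fmt.Println(v, "pending promotions:", len(c.promote))
	}
}
```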
Fun fact: my go-to interview question was to implement a cache. It was always rewarding to see people make the leap from using a single data structure (a dictionary) to using two (dictionary + linked list) to achieve the goal. It's not a way most of us are trained to think about data structures, which I think is a shame.
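For anyone curious, the leap boils down to something like this (a bare-bones, non-thread-safe sketch): the dictionary gives you O(1) lookup, and the doubly-linked list gives you O(1) recency updates and eviction.

```go
package main

import (
	"container/list"
	"fmt"
)

type kv struct {
	key   string
	value any
}

type lru struct {
	cap   int
	items map[string]*list.Element // key -> node in the list
	order *list.List               // front = most recently used, back = least
}

func newLRU(capacity int) *lru {
	return &lru{cap: capacity, items: make(map[string]*list.Element), order: list.New()}
}

func (c *lru) Get(key string) (any, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el) // mark as most recently used
	return el.Value.(*kv).value, true
}

func (c *lru) Set(key string, value any) {
	if el, ok := c.items[key]; ok {
		el.Value.(*kv).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&kv{key, value})
	if c.order.Len() > c.cap {
		oldest := c.order.Back() // the least recently used entry lives at the back
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*kv).key)
	}
}

func main() {
	c := newLRU(2)
	c.Set("a", 1)
	c.Set("b", 2)
	c.Get("a")    // touch "a" so "b" becomes the eviction candidate
	c.Set("c", 3) // evicts "b"
	_, ok := c.Get("b")
	fmt.Println("b still cached?", ok) // false
}
```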
Putting aside performance metrics (latency, throughput, hit rate, memory usage), here's what I don't like:
1. I don't really see how the API is simpler. ccache has tons of methods like `GetsPerPromote`, `PercentToPrune`, `Buckets`, `PromoteBuffer`, `DeleteBuffer`. How is a user supposed to know what values to set here? Honestly, even with all the time I've spent digging through cache implementations, I don't fully understand what should be configured there. Otter simply doesn't need any of these - you just specify the maximum size and the cache works.
2. Numerous methods like `tracking` and `promote` are again unnecessary for otter. Just `getIfPresent` and `set`/`setIfAbsent` and you're good to go.
3. The lack of loading and refreshing features seems like a significant drawback, as they typically provide major benefits for slow data sources.
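To show what loading buys you: on a miss, concurrent callers for the same key share a single call to the slow source instead of stampeding it. A rough sketch of the idea (not otter's actual implementation), leaning on golang.org/x/sync/singleflight:

```go
package main

import (
	"fmt"
	"sync"
	"time"

	"golang.org/x/sync/singleflight"
)

// loadingCache sketches the "loading" pattern: on a miss, concurrent callers
// for the same key share one call to the (slow) loader instead of stampeding it.
type loadingCache struct {
	mu    sync.RWMutex
	items map[string]string
	group singleflight.Group
	load  func(key string) (string, error)
}

func (c *loadingCache) Get(key string) (string, error) {
	c.mu.RLock()
	v, ok := c.items[key]
	c.mu.RUnlock()
	if ok {
		return v, nil
	}
	// Only one goroutine per key runs the loader; the rest wait for its result.
	res, err, _ := c.group.Do(key, func() (any, error) {
		v, err := c.load(key)
		if err != nil {
			return "", err
		}
		c.mu.Lock()
		c.items[key] = v
		c.mu.Unlock()
		return v, nil
	})
	if err != nil {
		return "", err
	}
	return res.(string), nil
}

func main() {
	c := &loadingCache{
		items: map[string]string{},
		load: func(key string) (string, error) {
			time.Sleep(50 * time.Millisecond) // pretend this is a slow database
			return "value-for-" + key, nil
		},
	}
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			v, _ := c.Get("k") // all five goroutines share one loader call
			fmt.Println(v)
		}()
	}
	wg.Wait()
}
```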
I don't disagree. It's like 13 years old. `GetWithoutPromote` was added in 2022, I assume someone asked for it, so I added it. That kind of stuff happens, especially when you stop building it for your own needs.
For the most part, you use a default config and use Get/Fetch/Set. Besides the excuse of its age, and not being seriously worked on for a long time (a decade?), I do think we both have a bias towards what's more familiar. What are the `ExpiryCalculator`, `Weigher`, etc... configuration options of Otter? (or `GetEntryQuietly`, `SetRefreshableAfter` ...)
I believe `ExpiryCalculator` is fairly self-explanatory. For example, `ExpiryWriting` returns an `ExpiryCalculator` that specifies the entry should be automatically evicted from the cache after the given duration from either its creation or value update. The expiration time isn't refreshed on reads.
`Weigher` is also likely clear from its doc. Many developers are at least familiar with this concept from other languages or libraries like ristretto and ttlcache.
`GetEntryQuietly` retrieves the cache entry for a key without any side effects - it doesn't update statistics or influence eviction policies (unlike `GetEntry`). I genuinely think this is reasonably clear.
I'm completely baffled why `SetRefreshableAfter` made this list. If you understand refreshing, it's obviously just `SetTTL` but for the refresh policy.
Honestly, I mostly disagree about the options being unclear. I suspect `Executor` is the only one that might confuse users after reading the docs, and it's mainly for testing anyway. My core complaint is the first point in my comment - tuning the cache requires deep understanding of its internals. Take ristretto's `NumCounters` parameter: users don't understand it and often just set it to `maxCost * 10` like the README example. But this completely breaks when using custom per-entry costs (like byte sizes).
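To make the `NumCounters` footgun concrete, here's the kind of config people copy (field names from the older, non-generic ristretto API, from memory). With byte-sized costs, blindly doing `maxCost * 10` would ask for billions of counters; what you actually want is roughly 10x the number of entries you expect to hold:

```go
package main

import (
	"fmt"

	"github.com/dgraph-io/ristretto"
)

func main() {
	const maxBytes = 1 << 30 // cost budget: 1 GiB worth of cached values

	// The README-style rule of thumb is NumCounters ~= 10x expected items.
	// If you treat MaxCost as bytes and still do MaxCost*10, you're asking for
	// ~10 billion counters for a cache that might only hold ~100k entries.
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 100_000 * 10, // 10x the number of entries you expect, NOT 10x MaxCost
		MaxCost:     maxBytes,     // per-entry cost is the value's size in bytes
		BufferItems: 64,
	})
	if err != nil {
		panic(err)
	}

	value := []byte("some payload")
	cache.Set("key", value, int64(len(value))) // cost = byte size of the entry
	cache.Wait()                               // Set is async; wait so Get can see it
	if v, ok := cache.Get("key"); ok {
		fmt.Println(string(v.([]byte)))
	}
}
```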
But as I mentioned when reviewing sturdyc, it probably comes down to personal preference.
I benchmarked ccache for throughput [1], memory consumption [2], and hit rate [3]. For hit rate simulations, I used golang-lru's LRU implementation, though I doubt a correct LRU implementation would show meaningful hit rate differences.
Note that otter's simulator results were repeatedly compared against both W-TinyLFU's (Caffeine) and S3-FIFO's (Libcachesim) simulators, showing nearly identical results with differences within hundredths of a percent.
The vibe of AI discussions was so different back then compared to today's (or at least compared to the past 2.5 years).
It's quite surreal how fast things have moved.
I would say compute and storage separation is the way to go, especially for hyperscaler offerings a la Aurora DB/Cosmos/AlloyDB. And later, more open-source alternatives will catch up.
Most analytics workloads are bandwidth-bound if you are optimizing them at all. The major issue with disaggregated storage is that the storage bandwidth is terrible in the cloud. I can buy a server from Dell with 10x the usable storage bandwidth of the fastest environments in AWS and that will be reflected in workload performance. The lack of usable bandwidth even on huge instance types means most of that compute and memory is not doing much — you are forced to buy compute you don’t need to access mediocre bandwidth of which there is never enough. The economics are poor as a result.
This is an architectural decision of the cloud providers to some extent. Linux can drive well over 1 Tbps of direct-attached storage bandwidth on a modern server but that bandwidth is largely beyond the limits of cheap off-the-shelf networking that disaggregated storage is often running over.
Object storage does scale out to that performance (via replication), but you do need to use multiple compute instances, since you only get, say, 100 Gbit/s on each, which is low. You can also do some of the filtering in the API, which helps too.