
What you describe is best practice for older versions of Cassandra with older versions of the Oracle JVM on spinning disks. And at that time Apple already had a massive number of Cassandra nodes, back when 1TB disks were what we had started buying for our servers. Cassandra was designed to run on large numbers of cheap x86 boxes, unlike most other DBs, where people had to spend hundreds of thousands or millions of dollars on mainframes and storage arrays to scale to the size they needed.

Half a TB per node was the practical limit, and during regular compaction that could temporarily double. If you went over, your CPU and disk spent so much time on overhead such as JVM garbage collection that your compaction processes backlogged, your node got slower and slower, your disk eventually filled up, and the node fell over. Later things got better, and you could use bigger nodes if you knew what you were doing and didn't trip over any of the hidden bottlenecks in your workload. It may even be fixed in the last few versions of Cassandra 3.x and 4.0.
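
To put rough numbers on it, the rule of thumb amounted to leaving enough free disk for compaction to rewrite everything it touches. A minimal sketch of that headroom check, assuming the 2x worst case described above (the function and thresholds are illustrative, not anything from Cassandra itself):

    # Hypothetical headroom check: can this node survive a worst-case
    # compaction that temporarily keeps old and new SSTables on disk at once?
    def compaction_headroom_ok(live_data_gb: float, disk_gb: float,
                               worst_case_factor: float = 2.0) -> bool:
        peak_gb = live_data_gb * worst_case_factor   # old + rewritten SSTables
        return peak_gb <= disk_gb

    # With ~1 TB disks, this is why ~0.5 TB of live data per node was the cap.
    print(compaction_headroom_ok(live_data_gb=500, disk_gb=1000))   # True, just barely
    print(compaction_headroom_ok(live_data_gb=700, disk_gb=1000))   # False: compaction backlogs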



What psaux mentioned makes more sense. A node == one Cassandra agent instead of a server.

Past 100k servers you start needing really intense automation just to keep the fleet up with enough spares.

If you’ve got, say, 10k servers, it’s much more manageable.

The fun thing is Cassandra was born at FB, but they don’t run any Cassandra clusters there anymore. You can use lots of cheap boxes, but at some point the failure rate of using so many boxes ends up killing the savings and the teams.


Yes, you can run multiple nodes on a single physical server. However, you then have the additional headache of ensuring that only one copy of the data gets stored on that physical server, or else you can lose data if that server dies. It's similar to having multiple nodes backed by the same storage system, where you need to ensure that losing a disk or volume doesn't lose two or more copies of the data. Cassandra lets you organize your replicas into 'data centers', with some control inside a DC by allocating nodes to 'racks' (with some major gotchas when resizing, so not recommended!). Translating that into VMs running on physical servers with shared disk is (was?) not documented.
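
For context, the 'data center' and 'rack' placement is declared per keyspace with NetworkTopologyStrategy, and each node reports its own DC/rack via its snitch configuration (e.g. cassandra-rackdc.properties with GossipingPropertyFileSnitch). A rough sketch using the DataStax Python driver; the host, keyspace, and DC names are made up:

    from cassandra.cluster import Cluster

    # Any reachable node works as a contact point (placeholder address).
    session = Cluster(["10.0.0.1"]).connect()

    # NetworkTopologyStrategy puts the requested number of replicas in each
    # named DC and tries to spread them across distinct racks within a DC.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS demo
        WITH replication = {
            'class': 'NetworkTopologyStrategy',
            'dc_east': 3,
            'dc_west': 3
        }
    """)

The gotcha the parent describes is that none of this knows two 'nodes' are really VMs on the same physical box; if you run multiple nodes per server, you have to encode that yourself in the rack assignments.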


> The fun thing is Cassandra was born at FB but they don’t run any Cassandra clusters there anymore.

Isn't Instagram mostly Cassandra?

https://instagram-engineering.com/open-sourcing-a-10x-reduct...


It wasn’t when I last saw it. Rocksandra ended up being a stepping stone to FB's most common distributed DB, ZippyDB: https://engineering.fb.com/2021/08/06/core-data/zippydb/

ZippyDB is honestly one of the best parts of FB infra. It lets you select levels of consistency vs. latency.


> ZippyDB is honestly one of the best parts of FB infra. It lets you select levels of consistency vs. latency.

How is that different from Cassandra's tunable consistency model?

https://cassandra.apache.org/doc/4.1/cassandra/architecture/...
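
For reference, Cassandra's knob is per request: each read or write carries a consistency level (ONE, LOCAL_QUORUM, QUORUM, ALL, ...). A small sketch with the Python driver; the host, keyspace, and table are placeholders:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(["10.0.0.1"]).connect("demo")

    # Low-latency write: a single replica's ack is enough.
    fast_write = SimpleStatement(
        "INSERT INTO events (id, payload) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.ONE)
    session.execute(fast_write, (42, "hello"))

    # Stronger read: wait for a majority of replicas before answering.
    safe_read = SimpleStatement(
        "SELECT payload FROM events WHERE id = %s",
        consistency_level=ConsistencyLevel.QUORUM)
    row = session.execute(safe_read, (42,)).one()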


Seagate introduced 2TB drives no later than 2010.


Interestingly, using the highest-capacity drives available at any point in time would have worked even worse, since they spun slower and had lower sequential write speed. And that's if you could get them from your preferred vendor at all, which for us seemed to be several years after introduction!



