
What you describe is best practice for older versions of Cassandra with older versions of the Oracle JVM on spinning disks. And at that time Apple already had a massive number of Cassandra nodes, back when 1TB disks were what we had started buying for our servers. Cassandra was designed to run on large numbers of cheap x86 boxes, unlike most other DBs, where people had to spend hundreds of thousands or millions of dollars on mainframes and storage arrays to scale to the size they needed.

Half a TB per node was the practical limit, and during regular compaction that could temporarily double. If you went over, your CPU and disk spent so much time on overhead such as JVM garbage collection that your compaction processes backlogged, your node got slower and slower, your disk eventually filled up, and the node fell over. Later things got better, and you could use bigger nodes if you knew what you were doing and didn't trip over any of the hidden bottlenecks in your workload. It may even be fixed in the last few versions of Cassandra 3.x and 4.0.
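
To put rough numbers on it, the rule of thumb amounted to leaving enough free disk for compaction to rewrite everything it touches. A minimal sketch of that headroom check, assuming the 2x worst case described above (the function and thresholds are illustrative, not anything from Cassandra itself):

    # Hypothetical headroom check: can this node survive a worst-case
    # compaction that temporarily keeps old and new SSTables on disk at once?
    def compaction_headroom_ok(live_data_gb: float, disk_gb: float,
                               worst_case_factor: float = 2.0) -> bool:
        peak_gb = live_data_gb * worst_case_factor   # old + rewritten SSTables
        return peak_gb <= disk_gb

    # With ~1 TB disks, this is why ~0.5 TB of live data per node was the cap.
    print(compaction_headroom_ok(live_data_gb=500, disk_gb=1000))   # True, just barely
    print(compaction_headroom_ok(live_data_gb=700, disk_gb=1000))   # False: compaction backlogs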



What psaux mentioned makes more sense. A node == one Cassandra agent instead of a server.

Past 100k servers you start needing really intense automation just to keep the fleet up with enough spares.

If you’ve got, say, 10k servers, it’s much more manageable.

The fun thing is Cassandra was born at FB, but they don’t run any Cassandra clusters there anymore. You can use lots of cheap boxes, but at some point the failure rate of using so many boxes ends up killing the savings and the teams.


Yes, you can run multiple nodes on a single physical server. However, you then have the additional headache of ensuring that only one copy of the data gets stored on that physical server, or else you can lose data if that server dies. It's similar to having multiple nodes backed by the same storage system, where you need to ensure that losing a disk or volume doesn't lose two or more copies of the data. Cassandra lets you organize your replicas into 'data centers', with some control inside a DC by allocating nodes to 'racks' (with some major gotchas when resizing, so not recommended!). Translating that into VMs running on physical servers with shared disk is (was?) not documented.
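
For context, the 'data center' and 'rack' placement is declared per keyspace with NetworkTopologyStrategy, and each node reports its own DC/rack via its snitch configuration (e.g. cassandra-rackdc.properties with GossipingPropertyFileSnitch). A rough sketch using the DataStax Python driver; the host, keyspace, and DC names are made up:

    from cassandra.cluster import Cluster

    # Any reachable node works as a contact point (placeholder address).
    session = Cluster(["10.0.0.1"]).connect()

    # NetworkTopologyStrategy puts the requested number of replicas in each
    # named DC and tries to spread them across distinct racks within a DC.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS demo
        WITH replication = {
            'class': 'NetworkTopologyStrategy',
            'dc_east': 3,
            'dc_west': 3
        }
    """)

The gotcha the parent describes is that none of this knows two 'nodes' are really VMs on the same physical box; if you run multiple nodes per server, you have to encode that yourself in the rack assignments.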


> The fun thing is Cassandra was born at FB but they don’t run any Cassandra clusters there anymore.

Isn't Instagram mostly Cassandra?

https://instagram-engineering.com/open-sourcing-a-10x-reduct...


It wasn’t when I last saw it. Rocksandra ended up being a stepping stone to FB's most common distributed DB, ZippyDB: https://engineering.fb.com/2021/08/06/core-data/zippydb/

ZippyDB is honestly one of the best parts of FB infra. It lets you select levels of consistency vs. latency.


> ZippyDB is honestly one of the best parts of FB infra. It lets you select levels of consistency vs. latency.

How is that different from Cassandra's tunable consistency model?

https://cassandra.apache.org/doc/4.1/cassandra/architecture/...
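
For reference, Cassandra's knob is per request: each read or write carries a consistency level (ONE, LOCAL_QUORUM, QUORUM, ALL, ...). A small sketch with the Python driver; the host, keyspace, and table are placeholders:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(["10.0.0.1"]).connect("demo")

    # Low-latency write: a single replica's ack is enough.
    fast_write = SimpleStatement(
        "INSERT INTO events (id, payload) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.ONE)
    session.execute(fast_write, (42, "hello"))

    # Stronger read: wait for a majority of replicas before answering.
    safe_read = SimpleStatement(
        "SELECT payload FROM events WHERE id = %s",
        consistency_level=ConsistencyLevel.QUORUM)
    row = session.execute(safe_read, (42,)).one()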


Seagate introduced 2TB drives no later than 2010.


Interestingly, using the highest-capacity drives available at any point in time would have worked even worse, since they spun slower and had lower sequential write speed. And that's if you could get them from your preferred vendor at all, which for us seemed to be several years after introduction!



