
3 hours of downtime per year works out to roughly 99.96% uptime.

In what world is that a lot of downtime?
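
For reference, the back-of-the-envelope math (a quick Python sketch, assuming a non-leap year of 8760 hours):

    # Convert a yearly downtime budget into an uptime percentage
    HOURS_PER_YEAR = 365 * 24        # 8760, ignoring leap years
    downtime_hours = 3
    uptime = 1 - downtime_hours / HOURS_PER_YEAR
    print(f"{uptime:.4%}")           # 99.9658%, i.e. roughly 99.96-99.97%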



Reliability is weird; you're only as reliable as the product of all your critical components' availabilities.

Usually you strive for "five 9's" in infrastructure; obviously there's a lot of wiggle room depending on the business case. But reliability for individual components gets exponentially harder with each 9 after the first two.
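
To put numbers on how much each extra 9 buys you, here's a rough sketch (illustrative Python, nothing company-specific):

    # Yearly downtime budget for each "number of nines" of availability
    MINUTES_PER_YEAR = 365 * 24 * 60
    for nines in range(1, 6):
        availability = 1 - 10 ** -nines       # 0.9, 0.99, ..., 0.99999
        budget = MINUTES_PER_YEAR * (1 - availability)
        print(f"{availability:.3%}: {budget:.1f} minutes of downtime per year")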

99.96% uptime for a datacenter is shockingly low once you take connection issues into account (i.e. the number of successful inbound packets vs. unsuccessful ones, not just served requests). For context, my company has around 15 datacenters around the world which routinely hit five 9's, with only a few incidents of a datacenter being down for 2-3 minutes during a particularly bad ISP outage.

The overwhelming majority of degradations are related to bad code being deployed. But since overall reliability is the product of all components' availabilities, it follows that permitting more infrastructure outages is worse, especially since they affect all, or at least the majority of, components in a given region.
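
To make the product-of-components point concrete, a small illustrative sketch (the component count and per-component availability are made up):

    import math

    # If a request has to traverse every component, end-to-end availability
    # is the product of the individual availabilities.
    components = [0.9999] * 10            # ten components, each at "four 9's"
    end_to_end = math.prod(components)
    print(f"{end_to_end:.4%}")            # ~99.90%: a whole 9 lost across the chain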


In a world where you have SLAs with your customers that commit you to something better?


Damn, these ships must really be run tightly.

In every company I have worked for, the downtime caused by bugs and other post-deployment issues alone was already above that number.



