
3 hours of downtime per year works out to roughly 99.96% uptime.

In what world is that a lot of downtime?
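
For reference, the back-of-the-envelope math (a quick Python sketch, assuming a non-leap year of 8760 hours):

    # Convert a yearly downtime budget into an uptime percentage
    HOURS_PER_YEAR = 365 * 24        # 8760, ignoring leap years
    downtime_hours = 3
    uptime = 1 - downtime_hours / HOURS_PER_YEAR
    print(f"{uptime:.4%}")           # 99.9658%, i.e. roughly 99.96-99.97%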



Reliability is weird; you're only as reliable as the product of all your critical components' availabilities.

Usually you strive for "five 9's" in infrastructure; obviously there's a lot of wiggle room depending on the business case. But reliability for individual components gets exponentially harder with each 9 after the first two.
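
To put numbers on how much each extra 9 buys you, here's a rough sketch (illustrative Python, nothing company-specific):

    # Yearly downtime budget for each "number of nines" of availability
    MINUTES_PER_YEAR = 365 * 24 * 60
    for nines in range(1, 6):
        availability = 1 - 10 ** -nines       # 0.9, 0.99, ..., 0.99999
        budget = MINUTES_PER_YEAR * (1 - availability)
        print(f"{availability:.3%}: {budget:.1f} minutes of downtime per year")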

99.96% uptime for a datacenter is shockingly low once you take connection issues into account (i.e. the number of successful inbound packets vs. unsuccessful ones, not just served requests). For context, my company has around 15 datacenters around the world which routinely hit five 9's, with only a few incidents of a datacenter being down for 2-3 minutes during a particularly bad ISP outage.

The overwhelming majority of degradations are related to bad code being deployed. But since overall reliability is the product of all components' availabilities, it follows that permitting more infrastructure outages is worse, especially since they affect all, or at least the majority of, components in a given region.
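
To make the product-of-components point concrete, a small illustrative sketch (the component count and per-component availability are made up):

    import math

    # If a request has to traverse every component, end-to-end availability
    # is the product of the individual availabilities.
    components = [0.9999] * 10            # ten components, each at "four 9's"
    end_to_end = math.prod(components)
    print(f"{end_to_end:.4%}")            # ~99.90%: a whole 9 lost across the chain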


In a world where you have SLAs with your customers that commit you to something better?


Damn, these ships must really be run tightly.

In every company I have worked for, the downtime caused by bugs and other post-deployment issues alone was already above that number.



