Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Mega-Caps suffer from the following problem:

1. There are more engineers making more divergent architectural solutions such that there is never a single place where you can make changes across the group.

2. Failures keep happening, so process is instituted with many checkboxes for engineers to work through.

3. Engineers on the small scale stuff get stack ranked against the engineers on the big scale stuff. Everyone needs to show that they can do the work and are "fungible". This leads to small internal systems having the same operational standard as large public facing systems.



I don't see what that's replying to. Nothing in that list would justify demanding that the app's team have knowledge or preference about which PCR zones to pick and which will just have to be corrected when they inevitably pick the wrong one.


The point is that every team gets to set their own failure modes. I know of multiple tier-1 services which diverge from at least one best practice.

Think of the scenario where a cloud provider needs to evacuate an az. There is no API which would allow the compute team to force migrate tens of thousands of apps and guarantee that they both are not effected and maintain their redundancy guarantees.

Internal services at google are in the same boat. However google knows about the hard edges and forces everyone to deal with all of that complexity - there is no api which the serving team could plug into which will avoid this overhead.


That still at no point requires the application's team to make decisions about which two PCR zones to pick and which cells within it to pick, which [decision] can still be cleanly abstracted away, and would still be a mixing of unrelated concerns, and so your comments are still orthogonal to the point I was bringing up here.

Edit: It might help to check out my comment here, where I clarify what a dev should vs shouldn't have to worry about: https://news.ycombinator.com/item?id=29085638


While what you say is true, I think GP is ultimately correct. You can have a system define a convention and allow bypassing it, instead of forcing everyone to start from scratch. In fact, this is the approach that pretty much any modern service at Google will use.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: