In my organisation, everyone writes alert references (handbooks describing how to resolve an alert during on-call) in a different way, which frustrates whoever is on call. I am looking for a good template for alert references. If you have any thoughts on this, please share them.
Problems:
- They grow large as the architecture becomes more complex.
- They are mostly vague because no one knows all the possible causes of an alert.