I've gone from SWE to SRE and back to SWE. I was always happy to do production support and diagnose and fix production problems, so I naturally moved to SRE, which is always looking for people.
It was a real mistake: SRE is hugely stressful and really unrewarding compared to SWE. Yes, you learn some skills and get some occasional glory, but year after year of fighting fires really didn't build any long-lasting career.
After switching back to SWE I've finally got promotions and pay rises again, as well as a good night's sleep and much less stress.
And for anyone still considering SRE: when interviewing, ask how incentives/bonuses/promotions work for software engineers and how that compares for SREs. A lot of promotable activities for SWE (shipping new things constantly) have negative value for an SRE role, since continually changing infrastructure ensures on-call never develops mastery of those systems.
Back to the OP, I raise a glass to your sabbatical. Most SREs end up needing a healing period from repetitive stress injury (AKA burnout).
If I may offer some completely unsolicited advice, don't put too much pressure on yourself in the next few months. People who gravitate towards SRE work tend to thrive under short-term ambiguity and emergency/urgency. However, long/medium term ambiguity without a clear productive goal can quickly feel like a crisis. OP mentions this in closing, so I'm rooting for them to rest and "sit still" for a bit.
I've been working as an SRE for the last decade, running critical stuff like authentication (which almost all services depend on). I'm a software engineer.
I cannot disagree more: our team is healthy, oncall is quite a fine activity to do (and compensated, of course), we have plenty of engineering work to do.
I've had five promotions (and tripled my salary), working on plenty of rewarding activities over time. I've done everything from deployment automation, to capacity planning, distributed system design, large data migrations, designing ietf standards for auth protocols, wrote client sdks, now we even do AI for different things (including model development).
I'd recommend to not generalize from "I didn't like it / the experience wasn't a match for me" to "the role is shitty".
> oncall is quite a fine activity to do (and compensated, of course),
Overnight on call is never compensated. I know some tech companies pay but I've never seen it.
> deployment automation, to capacity planning, distributed system design, large data migrations, designing ietf standards for auth protocols, wrote client sdks, now we even do AI for different things (including model development).
To me that is mostly SWE work (capacity planning and migrations are perhaps SRE). In regulated environments SREs are explicitly forbidden from making changes to the code base.
> I'd recommend to not generalize from "I didn't like it / the experience wasn't a match for me" to "the role is shitty".
I started in basic IT, worked into SRE, and once I had established an understanding of it, immediately did whatever I could to transition into full time SWE. I went in hoping it would be cool coding to automate infrastructure, but it felt mostly like being tier 4 support.
I miss working in software that also utilizes my skills in infrastructure, but I do not miss the constant escalations, terrible on-call schedules, and only about 20% of my time being spent on the rewarding parts of the job.
Thanks for recognizing what SRE/SysAdmins have to deal with all the time.
I think in general that type of experience would help Developers be more empathetic to the operations side of things. Those fires often come from trade-offs made in development.
Yeah, if you're an SRE or admin, good luck. I think some companies are much better at looking after you than others, so make sure you find somewhere good.
I think it made me a better developer because I've seen a lot of what can go wrong. It probably reduces my productivity, but ultimately my stuff is more likely to work.
Yeah, that's the essential mechanic of it. When "stuff is more likely to work" then everyone can build upon it.
Though at some point(s) that stable foundation needs updating too, so new stuff can be built upon it. That's where choosing the right balance for the right pieces needs figuring out.
Maybe you should clarify what company you worked in, because that's clearly not Google. In Google, SREs do 12 hour shifts. Night shifts (or rather 24 hour shifts) are (ironically enough) usually done by SWE teams. Almost every SWE team in Cloud has a 24 hour shift, and I agree, they are quite terrible.
I've had 12 hour shifts at Google with 200 pages over a week (shitty monitoring, shitty capacity constraints, shitty management). That was quite terrible.
SRE is an anti pattern that Google is unwilling to admit and is selling books on. Just like there should not be QA, release engineering, continuing engineering or DBA as separate departments/job titles, because these critical parts of software development should not be considered optional and thrown over the wall to take care of by someone with no stake in developing the product.
I've been on the other side of this (i.e. companies that have no SREs or QA, or in one case a company that had QA and got rid of it) and it has always been an unmitigated disaster.
The root cause of this disaster is that, when writing software, interruptions are the death of productivity. Having a software engineer wear too many different hats at one time, especially when some of those hats are largely real-time interrupt driven, can absolutely kill productivity.
To emphasize, I'm not at all in favor of "throwing things over the wall". Software engineers are responsible, for example, for making software that is easy to test and has good observability in place for when production problems show up. But just because you listed a bunch of things that are "critical for software development" doesn't mean that one person or role should be responsible for all of these things.
At the very least, e.g. for smaller teams I recommend that there is a rotating role so devs working on feature development aren't constantly interrupted by production issues, and instead each dev just gets a week-long stint where all they're expected to do is work on support issues and production tooling improvements.
I agree very much with you that interruptions are the death of productivity. Your suggestions for weekly rotations are great.
However, I argue that if the engineers are interrupted by QA issues, they will be motivated to find ways to not have those QA issues. In absence of that, we end up with the familiar “feature complete, let QA find bugs” situation.
> However, I argue that if the engineers are interrupted by QA issues, they will be motivated to find ways to not have those QA issues.
There are institutional limitations that engineers cannot overcome, no matter how zealous or motivated. Moreover, companies also ought to remember that engineers can "find ways to not have those QA issues" by seeking employment elsewhere!
This is such a crazy take to me. Any profession that matures eventually specializes. I wouldn’t expect the same person to pour my foundation, install the plumbing, and wire the building. Yet in an ever-expanding field we expect someone to be able to do it all. Also, saying people who don’t code have no stake is so egocentric.
Pouring a foundation, installing plumbing, and wiring a building are specialized out of physical necessity: these activities cannot be redone after a mistake except at great cost. That justifies specialization. Unlike building a bridge, compiling software is essentially free. QA, release engineering and database design can and should be repeated and iterated on by software engineers, because they are a necessary part of development, and removing them from the expected work distorts incentives.
Regardless of these fields being separate departments/job titles: people are not getting promoted for doing QA, release engineering, continuing engineering, or DBA work. It's a huge cultural problem in tech.
Having only full-stack engineers take care of everything works, but only up to a certain org size (like in a small startup). Once the org gets larger and the systems get more complex, you usually need specialisation. It’s natural, and Google didn’t really invent anything here.