Recently, two people asked me to help them prepare for the transition from developer to SRE. Here is the list of resources I recommended to them:

  1. The System Design Primer - I suggest going through the README, starting with the section “System Design topics start here”, and then moving on to the examples. That way, you’ll know which components can be used as building blocks and what their tradeoffs are.

  2. The “Monitoring Distributed Systems” and “Service Level Objectives” chapters of the Site Reliability Engineering book

  3. Crack the System Design Interview

  4. Back of the Envelope Calculation for System Design Interviews

  5. Non-Abstract Large System Design from the SRE Workbook - a very detailed example that thoughtfully explains the approach an experienced SRE follows during the system design process

Working through these five links carefully should be enough to build a decent foundation in system design in general and to prepare for the interview.
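To give a feel for the kind of back-of-the-envelope estimate practiced in resource 4, here is a minimal sketch in Python. It is not taken from any of the linked materials: the function name `estimate_load`, its parameters, the peak factor of 2, and all input numbers are hypothetical assumptions chosen only to keep the arithmetic easy to follow.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def estimate_load(daily_active_users, reads_per_user, writes_per_user,
                  bytes_per_write, peak_factor=2):
    """Return rough traffic and storage estimates from per-user daily activity.

    All parameters are illustrative assumptions, not measured values.
    """
    # Total requests per day across all users (reads plus writes).
    requests_per_day = daily_active_users * (reads_per_user + writes_per_user)
    # Average queries per second, spread evenly over the day.
    average_qps = requests_per_day / SECONDS_PER_DAY
    # Only writes add new data to storage.
    storage_per_day = daily_active_users * writes_per_user * bytes_per_write
    return {
        "average_qps": average_qps,
        # Assumed rule of thumb: peak traffic is a fixed multiple of the average.
        "peak_qps": average_qps * peak_factor,
        "storage_bytes_per_day": storage_per_day,
        "storage_bytes_per_year": storage_per_day * 365,
    }


# Hypothetical inputs: 10M daily active users, 18 reads and 2 writes
# per user per day, 1 KB (1,000 bytes) stored per write.
estimate = estimate_load(10_000_000, 18, 2, 1_000)
for name, value in estimate.items():
    print(f"{name}: {value:,.0f}")
```

With these made-up inputs, the result is roughly 2,300 requests per second on average, about 20 GB of new data per day, and about 7.3 TB per year. In an interview, the point is less the exact figures and more showing which assumptions drive the design.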

Consistent Hashing Sample Illustration
WikiLinuz, CC BY-SA 4.0, via Wikimedia Commons

Troubleshooting

Here is a solid cheatsheet on troubleshooting based on a Meta (Facebook) interview for a Production Engineer position.

SRE checklist

mxssl/sre-interview-prep-guide is a complete checklist of everything you should know as an SRE, except for the coding part. If you learn something about each point in that (pretty extensive) list, you should be well prepared to interview for an SRE role at any Big Tech company and to pass the hiring bar for technical interviews at FAANG.