Software Reliability Engineering (SRE)

Created in the Google context, SRE teams are responsible for product reliability engineering, including performance, availability, monitoring, and emergency response [1]. Although SRE teams can use 50% of their time on operational tasks, SRE engineering is expected to reduce toil work to almost zero. Another SRE competence is developing tools for enhancing the autonomy of product teams in providing reliable systems.

In some companies, people from operations staff are called SREs, just as other organizations call them DevOps.


[1] Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy. 2016. Site Reliability Engineering: How Google Runs Production Systems. O’Reilly Media.