Tag: sre

Episode 548: Alex Hidalgo on Implementing Service-Level Objectives

Filed in Episodes on January 25, 2023

Alex Hidalgo, principal reliability advocate at Nobl9 and author of Implementing Service Level Objectives, joins SE Radio’s Robert Blumen for a discussion of service-level objectives (SLOs) and error budgets. The conversation covers the meaning of a service level; service levels and product ownership; the pervasive nature of imperfection; and why trying to be perfect is […]


Episode 544: Ganesh Datta on DevOps vs Site Reliability Engineering

Filed in Episodes on December 28, 2022

Ganesh Datta, CTO and cofounder of Cortex, joins SE Radio’s Priyanka Raghavan to discuss site reliability engineering (SRE) vs DevOps. They examine the similarities and differences and how to use the two approaches together to build better software platforms. The show starts with a review of basic terms; definitions of roles, similarities and differences; skillsets […]


Episode 534: Andy Dang on AI / ML Observability

Filed in Episodes on October 20, 2022

Andy Dang, Head of Engineering at WhyLabs, discusses observability and data ops for AI/ML applications and how that differs from traditional observability. SE Radio host Akshay Manchale speaks with Andy about running an AI/ML model in production and how observability is an important tool in detecting and diagnosing various failures in the application. They explore […]


Episode 415: Berkay Mollamustafaoglu on Incident Management

Filed in Episodes on June 30, 2020

Berkay Mollamustafaoglu, founder of OpsGenie, discusses the keys to an effective incident management process. Many aspects of incident management are counterintuitive. Why does increasing the rate of change increase uptime? Why is culture the most important thing to get right? Why is having zero incidents not a goal to aim for? SE Radio host […]


SE-Radio Episode 276: Björn Rabenstein on Site Reliability Engineering

Filed in Episodes on December 6, 2016

Björn Rabenstein discusses the field of Site Reliability Engineering (SRE) with host Robert Blumen. The term SRE has recently emerged to mean Google’s approach to DevOps. The publication of Google’s book on SRE has brought many of their practices into more public discussion. The interview covers: what is distinct about SRE versus DevOps; the SRE […]
