Tag: fault tolerance

SE-Radio Episode 264: James Phillips on Service Discovery

Filed in Episodes on August 3, 2016 (1 Comment)

Charles Anderson talks with James Phillips about service discovery and Consul, an open-source service discovery tool. The discussion begins by defining what service discovery is, what data is stored in a service discovery tool, and some scenarios in which it’s used. Then they dive into some details about the components of a service discovery tool […]

SE-Radio Episode 246: John Wilkes on Borg and Kubernetes

Filed in Episodes on January 7, 2016 (3 Comments)

John Wilkes from Google talks with Charles Anderson about managing large clusters of machines. The discussion starts with Borg, Google’s internal cluster management program. John discusses what Borg does and what it provides to programmers and system administrators. He also describes Kubernetes, an open-source cluster management system recently developed by Google using lessons learned from […]

Episode 222: Nathan Marz on Real-Time Processing with Apache Storm

Filed in Episodes on March 6, 2015 (3 Comments)

Nathan Marz is the creator of Apache Storm, a real-time stream processing system. Storm does for stream processing what Hadoop does for batch processing. The project began when Nathan was working on aggregating Twitter data using a queue-and-worker system he had designed. Companies that use Storm include Spotify, Yelp, and WebMD, among many others. Jeff and Nathan […]

Episode 203: Leslie Lamport on Distributed Systems

Filed in Episodes on April 29, 2014 (3 Comments)

Leslie Lamport won the Turing Award in 2013 for his work in distributed and concurrent systems. He also designed the document preparation system LaTeX. Leslie works at Microsoft Research and has recently been working with TLA+, a language for specifying concurrent systems at a high level. The interview begins with a […]

Episode 134: Release It with Michael Nygard

Filed in Episodes on May 6, 2009 (5 Comments)

This episode is a discussion with Michael Nygard about his book “Release It!”, which covers aspects of software architecture that are often overlooked when you first start building a system. Topics discussed include capacity planning, recovery, and making the system suitable for operation in a data center.
