Tag: fault tolerance

Episode 203: Leslie Lamport on Distributed Systems

Filed in Episodes on April 29, 2014. 1 Comment

Leslie Lamport won the 2013 Turing Award for his work on distributed and concurrent systems. He also designed the document preparation system LaTeX. Leslie works at Microsoft Research and has recently been working with TLA+, a language for specifying concurrent systems at a high level. The interview begins with a […]


Episode 134: Release It with Michael Nygard

Filed in Episodes on May 6, 2009. 5 Comments

This episode is a discussion with Michael Nygard about his book “Release It”, which covers aspects of software architecture that are often overlooked when you start building a system. The points we discussed include capacity planning, recovery, and making the system suitable for operation in a data center.


Episode 78: Fault Tolerance with Bob Hanmer Pt. 2

Filed in Episodes on November 23, 2007. 3 Comments

This is the second part of the discussion on fault tolerance with Bob Hanmer. If you didn’t listen to Episode 77, which contains part one, please go back and listen now; this episode builds on it.

We start by discussing a set of error detection patterns, among them well-known approaches such as checksums and voting. We then look at error recovery patterns, including restart, rollback, and roll forward. The next section looks at error mitigation patterns, which include shedding load and doing fresh work before stale. The last patterns section looks at fault treatment patterns.

We conclude the episode with a short discussion of how to design systems using these and other patterns, and with some thoughts on why Bob wrote the book.


Episode 77: Fault Tolerance with Bob Hanmer Pt. 1

Filed in Episodes on November 13, 2007. 3 Comments

In this episode we discuss fault tolerance based on the new book by Bob Hanmer. This is actually the first part of the discussion; the remainder will be published in the next episode of SE Radio.

We start with some of the context for fault-tolerant systems and the imperfect world assumption, then define a number of terms we will need when discussing the fault tolerance patterns. Next we cover the fault tolerance mindset and connect fault tolerance to related subject areas such as software quality. We close this part with the shared context for the patterns that follow, including the important observation that fault tolerance does not come for free!

Finally, we provide an overview of the different sections covered in the book and start the detailed discussion of the patterns by looking at the Architectural Patterns section.

The next episode will discuss the remaining patterns in the book.
