Episode 78: Fault Tolerance with Bob Hanmer Pt. 2

Filed in Episodes on November 23, 2007

Guest(s): Robert S. Hanmer

Host(s): Markus
This is the second part of the discussion on fault tolerance with Bob Hanmer. If you haven't listened to Episode 77, which contains part one, please go back and listen to it first; this episode builds on it.

We start by discussing a set of error detection patterns. Among them are well-known approaches such as checksums and voting. We then look at error recovery patterns, including restart, rollback, and roll forward. The next section looks at error mitigation patterns, which include shedding load and doing fresh work before stale. The last patterns section then looks at fault treatment patterns.
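
To make the two error detection approaches named above a little more concrete, here is a minimal Python sketch. It is not taken from the episode or from Bob's book; the function names (`with_checksum`, `is_intact`, `majority_vote`) are made up for illustration, and CRC-32 is just one possible checksum.

```python
import zlib
from collections import Counter


def with_checksum(payload: bytes) -> tuple[bytes, int]:
    """Attach a CRC-32 checksum to a payload (hypothetical helper)."""
    return payload, zlib.crc32(payload)


def is_intact(payload: bytes, checksum: int) -> bool:
    """Detect corruption by recomputing the checksum and comparing."""
    return zlib.crc32(payload) == checksum


def majority_vote(results):
    """Return the value a strict majority of replicas agree on.

    Raises ValueError when there are no results or no strict majority,
    so the caller can treat that as a detected error.
    """
    if not results:
        raise ValueError("no results to vote on")
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise ValueError("no majority among replicas")
    return value
```

In this sketch a receiver would call `is_intact` before trusting a message, and a caller running the same computation on three replicas would pass the three answers to `majority_vote`, so that a single faulty replica is outvoted.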

We conclude the episode with a short discussion about how to design systems using these and other patterns, and with some thoughts on why Bob actually wrote the book.



Comments (3)


Sites That Link to this Post

  1. Fault Tolerance | Fragmented Thoughts | February 29, 2012
  2. omega tau » 100 – System Health Management | August 3, 2012
  1. ChrisKey says:

    Aimed at the novice as well as the experienced practitioner, the book's ultimate goal is to provide you with proven techniques – in the form of patterns – to make programs less failure-prone when executing. (excerpt from a site describing his book)
    Has anyone read his book? It sounds like a pretty incredible achievement.

    Thanks for the cast.
