Episode 78: Fault Tolerance with Bob Hanmer Pt. 2
Recording Venue:
Guest(s): Robert S. Hanmer
Host(s): Markus
This is the second part of the discussion on fault tolerance with Bob Hanmer. If you haven't listened to Episode 77, which contains part one, please go back and listen to it first; this episode builds on it.
We start by discussing a set of error detection patterns. Among them are well-known approaches such as checksums and voting. We then look at error recovery patterns, including restart, rollback, and roll forward. The next section looks at error mitigation patterns, which include shedding load and doing fresh work before stale. The last patterns section then looks at fault treatment patterns.
We conclude the episode with a short discussion about how to design systems using these and other patterns, and with some thoughts on why Bob actually wrote the book.
Links:
- “Dependability and Its Threats: A Taxonomy” by Algirdas Avizienis, Jean-Claude Laprie and Brian Randell
- A NASA tutorial on Software Fault Tolerance
- Bob’s Book at Amazon
- Telecom I/O Patterns
- Bob’s Book at Wiley
- “Computers in Spaceflight: The NASA Experience” by James E. Tomayko
Tags: embedded systems, fault tolerance, Interview, patterns