From Anonymity to Ubiquity:
A Study of Our Increasing Reliance on Fault Tolerant Computing
Elwin C. Ong
Massachusetts Institute of Technology
NASA Goddard, Office of Logic Design
December 9, 2003
As the world becomes more dependent on computerized systems, fault tolerance and system reliability become essential to the efficiency and safety of modern society. Because of their unique performance and reliability requirements, spacecraft were the first to make extensive use of fault tolerant computerized systems. Gemini, Apollo, Skylab, Pioneer, and Mariner were all early examples of safety-critical computing. Aircraft began the transition from stick and cable controls to digital fly-by-wire avionics in the 1970s. The F-8 digital fly-by-wire program conducted at NASA Dryden borrowed technology from the Apollo program and revolutionized aircraft control. The ability to fly an aircraft now rested solely on its electronics and computers, and this reliance on computers for safe flight made fault tolerance far more important.
Today, fault tolerance plays a major role in guaranteeing system performance and reliability in most military aircraft as well as some commercial aircraft, such as Boeing's 777 and Airbus's A320/330/340 families. More recently, there has been a major push by the automotive industry to adopt drive-by-wire technology. Computerized systems are already present in most cars today, in ABS, engine control, transmissions, navigation, and entertainment systems. The introduction of drive-by-wire control (steering, braking, and acceleration), however, will make safety-critical computing ubiquitous, to the point that every person on the road will depend on computers for their safety.
This presentation will introduce the role of fault tolerance in major computing systems. It will begin with a literature review outlining some fundamental elements of the field, followed by a comparison and discussion of how fault tolerance is applied in the three safety-critical domains described above: spacecraft, aircraft, and automobiles. Aerospace systems to be discussed, in addition to those already mentioned, include the Space Shuttle, Hubble Space Telescope, Galileo, Voyager, Landsat 7, FAST, TRIANA, New Horizons, Cassini, and C-17. There will also be a short overview of the time-triggered protocols TTP and FlexRay, which are to be used in automotive drive-by-wire systems.
About the Author:
Elwin Ong is a Ph.D. student at the Massachusetts Institute of Technology working under Dr. Nancy Leveson. His research interests include safety-critical, fault tolerant, real-time, and distributed systems. Mr. Ong spent the last five months at NASA Goddard's Office of Logic Design conducting research in spacecraft fault tolerance. Prior to MIT, Mr. Ong spent four years at Boeing Satellite Systems working on a number of programs including Galaxy 11, TDRS, and DirecTV. He has an S.M. in Aeronautics and Astronautics from MIT and a B.S. in Aerospace Engineering from UCLA.
NASA Office of Logic Design
Last Revised: February 03, 2010