ON-BOARD DEVICE AND SYSTEM ARCHITECTURES WITH
THE VERSION-THRESHOLD ADAPTATION TO HARDWARE AND SOFTWARE FAULTS
V.S. Kharchenko(1), V.V. Sklyar(2)
1) Department of Computer Systems and Networks, National Aerospace University named after N.E. Zhukovsky "Kharkiv Aviation Institute", Ukraine
Address: 32 apart, 35-b Astronomicheskaya str., Kharkiv, Ukraine, 61085
e-mail: khaks@skynet.kharkov.com
2) Kharkiv Military University, Ukraine
Keywords: on-board device and system reliability, multiversity approach, hardware and software faults, fault-tolerant architectures, version-threshold adaptation.
1. Introduction. The reliability of real-time on-board devices and systems (BRDS) is ensured by their ability to perform specified functions under hardware and software faults. This is possible if a multiversity approach is used, based on the introduction of software and/or hardware redundancy and of means of checking, diagnosis and reconfiguration [1-4]. The analysis and synthesis of multiversion BRDS (MBRDS) is connected with research on architecture adaptation algorithms [5,6]. Therefore, in order to assess the reliability of MBRDSs and to choose the optimal architecture, the following tasks, in our opinion the most important ones, must be solved.
1. A method of MBRDS adaptation to hardware and software faults should be proposed.
2. Adaptation algorithms and MBRDS architectures should be developed and systematized.
3. The procedure of MBRDS reliability assessment should be improved, various fault-tolerant MBRDS architectures should be investigated, and recommendations for choosing among them should be formulated.
The goal of the paper is to develop adaptive MBRDS models and to investigate the reliability of architectures with version-threshold adaptation (VTA). The study is limited to majority MBRDSs, since adaptation methods can be represented in them most fully. Majority multiversion architectures are implemented in computer control systems of aerospace complexes, nuclear power plants and other critical applications [1,3,5,6].
2. Methods of adaptation of multiversion real-time systems. Threshold adaptation (TA), i.e. adaptation by changing the response threshold of the majority element (the majority element passes from the «2 out of 3» scheme to the «1 out of 3» scheme), is used in one-version real-time systems (OBRDS). A system with threshold adaptation can be either partially adaptive or fully adaptive; hereinafter a fully adaptive system is called simply «adaptive». In an adaptive system, if one of the channels fails, the system continues to operate in the two-channel mode until one of the two remaining channels fails; after the second channel failure the system passes into the one-channel mode. Under partial adaptation, the system passes into the one-channel mode immediately after the first channel failure, so the two-channel mode is only an intermediate state. A more complex variant of adaptation, version-threshold adaptation, can be used in multiversion systems. Its adaptation algorithm is more complex because failures of both hardware and software must be taken into account in every channel. Three types of adaptation are possible: VTA1 (adaptation by both hardware and software), VTA2 (software adaptation only) and VTA3 (hardware adaptation only).
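For the one-version case, the difference between the three operating schemes can be illustrated with textbook closed-form reliabilities. The sketch below is ours, not the authors' model: it assumes identical, independent channels with a constant failure rate lam, an ideal majority element, and ideal failure detection and switching. All function names are hypothetical.

```python
import math

# Illustrative sketch (assumptions: identical independent channels with a
# constant failure rate, ideal majority element, ideal detection/switching).

def channel_reliability(lam: float, t: float) -> float:
    """Reliability p(t) = exp(-lam * t) of a single channel."""
    return math.exp(-lam * t)

def r_no_adaptation(p: float) -> float:
    """Fixed «2 out of 3» voting: at least two of three channels up."""
    return 3 * p**2 - 2 * p**3

def r_full_ta(p: float) -> float:
    """Full TA (3 -> 2 -> 1 channels): up while any channel is up."""
    return 1 - (1 - p) ** 3

def r_partial_ta(p: float) -> float:
    """Partial TA (3 -> 1 channel at the first failure).

    Up if all three channels survive, or if the first failure occurs at
    some s < t and the single channel selected then survives t - s.
    For exponential channels this integrates to 1.5 p - 0.5 p**3.
    """
    return 1.5 * p - 0.5 * p**3

if __name__ == "__main__":
    p = channel_reliability(lam=1e-4, t=1000.0)  # illustrative values
    schemes = [("2 out of 3", r_no_adaptation),
               ("partial TA", r_partial_ta),
               ("full TA", r_full_ta)]
    for name, rule in schemes:
        print(f"{name:10s} {rule(p):.6f}")
```

Under these assumptions, full TA minus partial TA and partial TA minus fixed voting both equal 1.5 p (1 - p)^2. The ordering therefore holds for every p in (0, 1).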
2.1. Method VTA1. VTA1 requires storing all three software versions in the primary hardware component of each of the three channels, and a unified operating system for all three channels. Three variants are possible in the case of partial VTA1:
1) partial adaptation by both software versions and hardware channels: when one channel fails, the system goes into the one-channel mode of operation;
2) hybrid adaptation, partial by one component and full by the other:
2a) partial adaptation by hardware channels and full adaptation by software versions (partial VTA3, full VTA2). Software version failures are handled by the full version adaptation algorithm (VTA2); hardware channel failures are handled by the partial channel adaptation algorithm (partial VTA3);
2b) full adaptation by hardware channels and partial adaptation by software versions (full VTA3, partial VTA2). When one of the software versions or one of the hardware channels fails, the partial version adaptation algorithm (partial VTA2) is executed and the system passes into the one-version mode of operation; in this mode, further hardware channel failures are handled by full adaptation (full VTA3).
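To make the storage requirement of VTA1 concrete, its full-adaptation case can be written as an operability predicate over the up/down states of the three hardware channels and the three software versions. This is a minimal sketch under our own reading, not the authors' model. Because every channel stores all versions, any working version is assumed to run on any working channel. A design fault disables a version on every channel. The partial variants 1), 2a) and 2b) depend on which channel and version were selected after the first failure, so they are not modelled here. Function names are hypothetical.

```python
from itertools import product

# Sketch of FULL VTA1 under an assumed reading: all three versions are
# stored in every channel, so the system stays up while at least one
# hardware channel and at least one software version remain.

def full_vta1_operable(hw_up: tuple[bool, ...], sw_up: tuple[bool, ...]) -> bool:
    """Up while at least one channel and one version are up."""
    return any(hw_up) and any(sw_up)

def count_operable_states(predicate) -> int:
    """Count the 2**6 hardware/software up-down combinations accepted."""
    return sum(
        predicate(hw, sw)
        for hw in product((True, False), repeat=3)
        for sw in product((True, False), repeat=3)
    )

if __name__ == "__main__":
    # Two channels and two versions lost: still operable under full VTA1.
    print(full_vta1_operable((True, False, False), (False, True, False)))
    print(count_operable_states(full_vta1_operable))  # 7 * 7 combinations
```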
2.2. Method VTA2. To realize this type of adaptation, all three software versions must be stored in the primary hardware component of each channel, and a unified operating system for all three channels is required. A channel failure is caused by either a hardware or a software failure. The system tolerates only a single hardware channel failure. If the hardware and software of one of the channels fail, a system with partial VTA2 passes into the one-version mode of operation.
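One possible formalization of full VTA2 follows. It is a sketch that rests on our own interpretation of the text above, and the authors' model may differ. Software versions are assumed to adapt 3 -> 2 -> 1, because every channel stores all versions. The hardware is assumed not to be threshold-adapted and to tolerate only one channel failure, so at least two of three channels must remain. Names are hypothetical.

```python
from itertools import product

# Sketch of FULL VTA2 under an ASSUMED reading of Section 2.2:
# versions adapt 3 -> 2 -> 1, hardware tolerates a single channel failure.

def full_vta2_operable(hw_up: tuple[bool, ...], sw_up: tuple[bool, ...]) -> bool:
    """Up while at least 2 of 3 channels and at least 1 version are up."""
    return sum(hw_up) >= 2 and any(sw_up)

def operable_states() -> list[tuple[tuple[bool, ...], tuple[bool, ...]]]:
    """All hardware/software up-down combinations accepted by the predicate."""
    return [
        (hw, sw)
        for hw in product((True, False), repeat=3)
        for sw in product((True, False), repeat=3)
        if full_vta2_operable(hw, sw)
    ]

if __name__ == "__main__":
    print(len(operable_states()), "of 64 combinations are up")
```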
2.3. Method VTA3. If a channel fails, its software version is no longer used.
Under partial adaptation, the system operates in either the three-channel or the one-channel mode. Thus, the system passes into the one-channel mode after either a hardware or a software failure of any channel.
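Under VTA3 each version is tied to its channel, so the position of a failure matters, not only the number of failures. The sketch below assumes that version i runs only on channel i. A channel is lost after either a hardware or a software failure, and a lost channel's version is never reused. Full VTA3 is then taken to stay up while at least one intact channel remains. This reading and the names are our assumptions.

```python
# Sketch of FULL VTA3 under an assumed reading: version i is bound to
# channel i, and the system is up while any intact channel remains.

def intact_channels(hw_up: tuple[bool, ...], sw_up: tuple[bool, ...]) -> list[int]:
    """Indices of channels whose hardware and bound version are both up."""
    return [i for i, (h, s) in enumerate(zip(hw_up, sw_up)) if h and s]

def full_vta3_operable(hw_up: tuple[bool, ...], sw_up: tuple[bool, ...]) -> bool:
    return len(intact_channels(hw_up, sw_up)) >= 1

if __name__ == "__main__":
    # One hardware and one software failure on the SAME channel.
    print(intact_channels((False, True, True), (False, True, True)))
    # The same failure counts on DIFFERENT channels.
    print(intact_channels((False, True, True), (True, False, True)))
```

The two calls print [1, 2] and [2]. With the same number of failures, the system is left with two channels in the first case and one in the second.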
3. The graph-event method of MBRDS reliability assessment. The method includes the following stages.
3.1. Analysis of the adaptation algorithm and construction of the graph of transitions between system statuses. The operation algorithm of the fault-tolerant real-time system is represented as a directed graph. Graph nodes correspond to system statuses, and graph edges correspond to possible transitions between these statuses. The graph has one initial node, corresponding to the initial status in which all software versions and hardware channels are up. A failure of one of the software versions or of one of the hardware channels may occur at any moment of operation; therefore, each node corresponding to an up status has two outgoing edges. The system fails after a certain number of software version and hardware channel failures; deadlock nodes of the graph correspond to such invalid statuses. In addition, partially adaptive systems have intermediate statuses in which the system passes into the one-channel (one-version) mode of operation according to the implemented algorithm.
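The graph construction of this stage can be sketched as a breadth-first search from the initial node. As a simplifying assumption, a node here is the pair (failed hardware channels, failed software versions). Each up node gets two outgoing edges, and nodes that are not up become deadlocks. The up/invalid rule is the full-VTA1 reading (up while one channel and one version remain); it is our assumption, not the authors' published rule. Names are hypothetical.

```python
from collections import deque

N = 3  # number of hardware channels = number of software versions

Node = tuple[int, int]  # (failed hardware channels, failed software versions)

def is_up(node: Node) -> bool:
    """Assumed full-VTA1 rule: up while one channel and one version remain."""
    failed_hw, failed_sw = node
    return failed_hw < N and failed_sw < N

def build_graph(start: Node = (0, 0)):
    """Breadth-first construction of the status graph from the initial node."""
    edges: dict[Node, list[Node]] = {}
    up_nodes: set[Node] = set()
    deadlocks: set[Node] = set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if not is_up(node):
            deadlocks.add(node)  # invalid status: no outgoing edges
            continue
        up_nodes.add(node)
        hw, sw = node
        # Two edges per up node: one more hardware or one more software failure.
        edges[node] = [(hw + 1, sw), (hw, sw + 1)]
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return edges, up_nodes, deadlocks

if __name__ == "__main__":
    edges, up_nodes, deadlocks = build_graph()
    print(len(up_nodes), "up statuses,", len(deadlocks), "deadlock statuses")
```

A partially adaptive system would need extra intermediate nodes for the switch into the one-channel mode. Architectures where the failure position matters, such as VTA3, need the finer event model of the next stage.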
3.2. Construction of an event model of transitions between system statuses. In the standard combinatorial probability model, all statuses of a majority redundant system have a multiplicity of three. In an MBRDS, hardware and software component failures are asymmetric with respect to each other, i.e. graph nodes split, and the same node can contain both up and invalid states. For such nodes, all possible combinations of hardware and software component failures must be considered. System statuses and the transitions between them correspond to the nodes and edges of the graph. Symmetrical statuses need no detailing, so they are represented in a one-event diagram. For asymmetric statuses, all possible combinations of software version and hardware channel failures must be enumerated, so they are represented in a three-event diagram.
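Node splitting can be shown by enumerating every combination of which channels and which versions failed within one node. The sketch assumes that version i runs on channel i and that the system is up when at least min_channels channels are intact. Under these assumptions, min_channels = 2 corresponds to fixed «2 out of 3» voting and min_channels = 1 to full VTA3. The function name and the binding assumption are ours.

```python
from itertools import combinations

N = 3  # hardware channels = software versions

def split_node(k_hw: int, k_sw: int, min_channels: int) -> tuple[int, int]:
    """Return (up, invalid) counts of failure combinations in node (k_hw, k_sw).

    Assumes version i runs on channel i, so channel i is intact only if
    neither hardware i nor version i has failed.
    """
    up = invalid = 0
    for hw_failed in combinations(range(N), k_hw):
        for sw_failed in combinations(range(N), k_sw):
            lost = set(hw_failed) | set(sw_failed)
            if N - len(lost) >= min_channels:
                up += 1
            else:
                invalid += 1
    return up, invalid

if __name__ == "__main__":
    print(split_node(1, 0, 2))  # symmetrical node: all combinations alike
    print(split_node(1, 1, 2))  # fixed 2 out of 3: node splits
    print(split_node(1, 2, 1))  # full VTA3 reading: node splits
```

The three calls give (3, 0), (3, 6) and (6, 3). The first node is symmetrical. In the other two, the same failure counts lead to an up or an invalid status depending on whether the failures hit the same channel.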
3.3. Identification of the probabilities of the system being in up statuses. Each up status is assigned the probability that the system is in that status. For symmetrical statuses the multiplier is equal to three; for the other statuses it is determined from the event diagrams.
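Each up status contributes a multiplier times a product of component reliabilities. The sketch below assumes independent components, with reliability p_hw for every hardware channel and p_sw for every software version. For a status with failures of one kind only, the multiplier is the binomial coefficient C(3, k), which equals three for one or two failures. For a split node it must be read from the event diagram. The illustrative count of 3 below comes from the «2 out of 3» example of the previous stage, not from the paper. Names are hypothetical.

```python
import math

N = 3  # hardware channels = software versions

def status_probability(k_hw: int, k_sw: int, multiplier: int,
                       p_hw: float, p_sw: float) -> float:
    """Probability of an up status with k_hw failed channels, k_sw failed versions.

    `multiplier` is the number of up failure combinations in the status.
    """
    return (multiplier
            * p_hw ** (N - k_hw) * (1 - p_hw) ** k_hw
            * p_sw ** (N - k_sw) * (1 - p_sw) ** k_sw)

if __name__ == "__main__":
    p_hw, p_sw = 0.99, 0.95  # illustrative values only
    # Symmetrical status: one hardware channel failed, all versions up.
    print(status_probability(1, 0, math.comb(N, 1), p_hw, p_sw))
    # Split node (1, 1) with versions bound to channels and fixed 2 out of 3
    # voting: the event diagram gives 3 up combinations.
    print(status_probability(1, 1, 3, p_hw, p_sw))
```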
3.4. Calculation of the probability of non-failure (PNF) of the system. The PNF of an MBRDS equals the sum of the probabilities of the system being in up statuses. Therefore, all probabilities obtained at the third stage are summed, and the formula is transformed into a form convenient for calculation. It is then transformed again, taking into account the complete component model of the MBRDS. Each reliability block diagram based on this component model can be divided into three parts:
1) part SW, including separate redundancy of the functional software modules (SWFi);
2) part HW, including common redundancy either of the hardware components (operating system storage HW1S, SWFi storage HW1Fi and processing units HW2) or of the hardware components (HW1S, HW1Fi, HW2) together with the operating system versions (SWS);
3) part ME, including the non-redundant majority element, consisting either of the components SWM, HW1M, HW2M and the operating system SWS, or of the ME components only.
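The summation of this stage can be checked by brute force. The sketch enumerates all 64 up/down combinations of three hardware channels and three software versions, sums the probabilities of the up combinations, and multiplies the sum by the reliability of the non-redundant majority element. Treating part ME as an element in series is our assumption. Both operability rules (full VTA1, and fixed «2 out of 3» voting with versions bound to channels) are assumed readings, and all names and numeric values are illustrative.

```python
from itertools import product
from typing import Callable, Sequence

Predicate = Callable[[Sequence[bool], Sequence[bool]], bool]

def pnf(operable: Predicate, p_hw: float, p_sw: float, p_me: float = 1.0) -> float:
    """PNF = P(majority element up) * sum of probabilities of all up states.

    Assumes independent channels (p_hw), independent versions (p_sw) and a
    non-redundant majority element (p_me) in series with them.
    """
    total = 0.0
    for hw in product((True, False), repeat=3):
        for sw in product((True, False), repeat=3):
            if not operable(hw, sw):
                continue
            prob = 1.0
            for up in hw:
                prob *= p_hw if up else 1.0 - p_hw
            for up in sw:
                prob *= p_sw if up else 1.0 - p_sw
            total += prob
    return p_me * total

def full_vta1(hw: Sequence[bool], sw: Sequence[bool]) -> bool:
    # Assumed reading: any surviving version can run on any surviving channel.
    return any(hw) and any(sw)

def fixed_2oo3(hw: Sequence[bool], sw: Sequence[bool]) -> bool:
    # Assumed baseline: version i bound to channel i, no adaptation.
    return sum(1 for h, s in zip(hw, sw) if h and s) >= 2

if __name__ == "__main__":
    # Illustrative component reliabilities, not values from the paper.
    for name, rule in [("2 out of 3", fixed_2oo3), ("full VTA1", full_vta1)]:
        print(name, round(pnf(rule, p_hw=0.99, p_sw=0.95, p_me=0.999), 6))
```

For these two rules the sums reduce to known closed forms. Fixed voting gives 3q^2 - 2q^3 with q = p_hw p_sw, and full VTA1 gives (1 - (1 - p_hw)^3)(1 - (1 - p_sw)^3), which makes the enumeration easy to cross-check.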
4. Conclusion. Results of the analysis of architectures with VTA and their practical application. The analysis of MBRDS reliability is hampered by the fact that reliability block diagrams have to be specified for each variant of component failures. To overcome these difficulties, a graph-event method of MBRDS reliability assessment has been developed. Calculation of the PNF of MBRDSs with various types of adaptation showed that systems with VTA1 have the best reliability characteristics. It should be noted that, while the selected hardware failure rates are realistic, achieving a failure rate of about 10^-5 1/hour for software components is on the verge of (and sometimes beyond) the capabilities of modern programming technologies.
The set of considered OBRDS and MBRDS architectures with version-threshold adaptation should be supplemented with reconfiguration algorithms for software and hardware components, similar to the algorithms for multilevel majority systems [5,6]. The version-threshold adaptation methods and the technique for assessing and choosing fault-tolerant system architectures were used in the development of aerospace processing systems, which made it possible to reduce the probability of system failure appreciably. The proposed architectures, adaptation methods and the software reliability models database [7] form the basis for developing a computer-aided design system for fault-tolerant MBRDSs.
References
1. Laprie J.-C. Dependability Handbook, LAAS Report No. 98-346, 1998, 365 p.
2. Choi C.Y., Johnson B.W., Profeta III J.A. Analysis of Dependable Architectures, IEEE Transactions on Reliability, vol. 46, no. 3, 1997, pp. 316-322.
3. Kharchenko V.S. Methods of an Estimation of Multiversion Safety Systems, Proceedings of the 17th International System Safety Conference, Orlando, FL, Aug. 16-21, 1999, pp. 347-352.
4. Kharchenko V.S., Tarasenko V.V. Multiversion Design Technologies of On-board Fault-tolerant FPGA Devices // Proceedings of the MAPLD Conference, 13-15 September 2001, Maryland, USA, http://www.klabs.org.
5. Kharchenko V.S. Models and Algorithms of Adaptive Multilevel Majority Computer-Based Control Systems Reconfiguration // Automatics and Telemechanics. 2000. No. 12. Pp. 89-101. (in Russian)
6. Kharchenko V.S., Zenin A.P., Sklyar V.V. Methods of Multi-Parameter Adaptation Control Systems with Majority Reservation // Space Science and Technologies. 1999. V. 5. No. 5/6. Pp. 81-91. (in Ukraine)
7. Kharchenko V.S., Sklyar V.V., Vilkomir S.A. Choice of Software Reliability Models for Safety-Critical Systems // Control Systems and Computers. 2000. No. 3. Pp. 28-39. (in Ukraine)