Autonomic Computing via Dynamic Self-Repair of Hardware Faults
The goal of this research project is to develop computer systems that use dynamic self-repair to enable autonomic operation in the presence of permanent hardware faults. Autonomic operation is crucial for reliability when mission-critical computer systems are deployed for long periods with little or no opportunity for human repair. Existing solutions for tolerating general classes of hard faults, such as triple modular redundancy, are very expensive in terms of power and hardware cost. In this research project, we seek to develop lightweight hardware techniques for self-repair that achieve high reliability and robust performance without consuming vast amounts of hardware or power. Our first step in this direction, called Self-Repairing Array Structures (SRAS), masks hard faults in microprocessor array structures. Our experimental evaluation of SRAS shows that it achieves high reliability with far less power consumption than processor-level redundancy. Our current research is exploring Hierarchical Modular Redundancy (HMR), a self-repair scheme for tolerating faults in more general structures than just arrays. Ongoing and future research will further develop SRAS and especially HMR, including efficient hardware implementations and thorough experimental evaluations. We will also explore a range of computing systems beyond microprocessors, including embedded processors and network processors. As part of this project, we will develop significant simulation infrastructure for evaluating these research ideas and for modeling the effects of emerging fault models.