NASA - National Aeronautics and Space Administration
Computational Science
Started: 06/10/2002
Reports
Report: 9/5/2007
Report: 12/22/2003
2005 Workshop
PI: Michael Warren
Los Alamos National Laboratory

Software Technology to Enable Reliable High-Performance Distributed Disk Arrays

The advent of commodity microprocessors with adequate floating-point performance and low-priced fast Ethernet switches contributed to the emergence of Beowulf clusters in the mid-1990s. We are now poised for a similar advance in distributed disk arrays (DDAs), driven by the dramatic decline in the price of commodity disk drives. The implementation of reliable DDAs will revolutionize data storage and retrieval in practically all areas of NASA information science. The cost of disk storage is currently less than $2.00 per Gbyte. Several groups (including ours) have demonstrated fault-tolerant terabyte servers for a total cost of under $4000. Used in a parallel cluster environment, multi-terabyte disk arrays can achieve read/write bandwidths that greatly exceed what available Gigabit local-area and wide-area networking technology can carry. Additionally, the greater CPU-to-storage ratio in a DDA enables techniques that are not possible in traditional RAID arrays.

While projects such as the Parallel Virtual File System have demonstrated clear utility, they lack the fault tolerance that could be obtained via the efficient calculation and storage of parity or mirroring information between nodes (analogous to RAID techniques within a node). This additional functionality would improve the reliability of mass storage on clustered systems by orders of magnitude. Also, while disk areal density has been improving at about 60% per year, disk latency has been improving at only about 10% per year, so disks are becoming increasingly unbalanced in terms of capacity and latency. By intelligently replicating and caching data in a DDA, it is possible to reduce the latency of access to terabytes or more of data by an order of magnitude.
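
The parity idea mentioned above can be illustrated with a minimal sketch. It assumes the simplest scheme, a single XOR parity block computed across equally sized data blocks held on different nodes (as in RAID-4/5 within a node); the function names and sample data are hypothetical and are not taken from the project's software.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Hypothetical data: one block per storage node.
node_blocks = [b"node-0 data!", b"node-1 data!", b"node-2 data!"]
parity = xor_blocks(node_blocks)  # would be stored on a separate parity node

lost = 1  # pretend node 1 has failed
survivors = [b for i, b in enumerate(node_blocks) if i != lost]
recovered = reconstruct(survivors, parity)
```

A single parity block of this kind tolerates the loss of any one node; surviving multiple simultaneous failures would require mirroring or a stronger erasure code.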

Our objectives are to:

  • Build upon software techniques such as the RAID-x architecture at USC and the xFS project at Berkeley to create high-performance, fault-tolerant DDAs.
  • Demonstrate the ability to trade capacity for reduced latency in DDAs using software-only disk head prediction mechanisms that work on a wide range of off-the-shelf hard drives.
  • Demonstrate techniques for cheaply and efficiently duplicating and transporting multi-terabyte datasets.
  • Address limitations in the Transmission Control Protocol (TCP) that limit bulk data transfers.
  • Validate the reliability and performance of our tools by applying them to a number of ongoing astrophysics and data science projects at LANL.
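
As a rough illustration of the second objective, the sketch below shows one way capacity could be traded for latency. It assumes a block is stored in several replicas spaced evenly around a track and that software can predict the head's angular position; both are illustrative assumptions rather than a description of the project's mechanism, and all names are hypothetical. Angles are expressed as fractions of a revolution in [0, 1).

```python
def expected_rotational_delay_ms(rpm, replicas):
    """Mean wait for the nearest of `replicas` evenly spaced copies.

    With one copy the mean wait is half a rotation; with k evenly
    spaced copies it drops to 1/(2k) of a rotation.
    """
    rotation_ms = 60_000.0 / rpm
    return rotation_ms / (2 * replicas)


def nearest_replica(head_angle, replica_angles):
    """Pick the replica that will pass under the head soonest.

    The delay until a replica at angle `a` arrives is (a - head_angle)
    modulo one full revolution.
    """
    return min(replica_angles, key=lambda a: (a - head_angle) % 1.0)


# Hypothetical 7200 rpm drive: one copy versus two copies per block.
single = expected_rotational_delay_ms(7200, 1)
double = expected_rotational_delay_ms(7200, 2)

# Head predicted at 0.3 of a revolution; copies at 0.0 and 0.5.
choice = nearest_replica(0.3, [0.0, 0.5])
```

In this simplified model, doubling the space used per block halves the mean rotational delay; seek time and the accuracy of the head prediction are ignored.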

Bibliography
The Space Simulator: Modeling the Universe from Supernovae to Cosmology (2003)

NASA Official: Joseph H. Bredekamp
Last Updated: 01/18/2005