NASA - National Aeronautics and Space Administration
Universe
Started: 08/23/2007
Last Report: 08/19/2008
2008 Workshop Presentation
PI: Hillol Kargupta
University of Maryland Baltimore County

Distributed and Peer-to-Peer Data Mining for Scalable Analysis of Data from Virtual Observatories
The design, implementation, and archiving of very large sky surveys play a critical role in today's astronomy research. However, astronomers will be unable to tap the riches of this collection of gigabyte, terabyte, and (eventually) petabyte catalogs without a computational backbone that supports queries and data mining across distributed virtual tables of decentralized, joined, and integrated sky survey catalogs. Moreover, local data management systems such as MyDB, MySpace in AstroGrid, and Grid Bricks are becoming increasingly popular for storing and managing users' local data, which opens up the possibility of constructing a Peer-to-Peer (P2P) network for data sharing and mining.

This document proposes research and development for a new generation of scalable data analytic services for the National Virtual Observatory (NVO), based on advanced distributed and P2P data mining capabilities across multiple data repositories. The research will develop technology for supporting web services within the NVO that allow astronomy researchers to analyze data from multiple surveys using fundamentally distributed algorithms. It will also develop several distributed data mining (DDM) algorithms for analyzing distributed astronomy catalogs without requiring the data to be downloaded and centralized.

Specific objectives include the following:
(1) The project will design and implement distributed algorithms for computing statistical primitives, principal component analysis, and outlier detection from distributed astronomy catalogs and their partial images stored in users' local data management systems. These algorithms will be able to analyze data without requiring source catalogs to be downloaded and centralized.
(2) The project will develop a prototype system offering a rich collection of web services based on various DDM algorithms. This service is a novel augmentation of the existing NVO environment, and it will support a rich variety of data mining tasks that work in a distributed fashion.
(3) The developed system will be tested on specific astronomical research problems. In particular, we will explore the multi-dimensional, multi-wavelength parameter space of astrophysical properties of starbursting galaxies. We will search for unusual correlations, outliers, sub-clusters, and fundamental planes within the multi-dimensional parameter space presented by several large surveys.

To carry out this research, we will set up a simulated distributed astronomy catalog environment in the lab using data from publicly available source catalogs. Our system will be benchmarked on speed, communication cost, and accuracy. Accuracy will be validated within the context of the astronomy problem described above.

This research is directly relevant to the AISR program for several reasons. First, it enables increased productivity of NASA's Science Mission Directorate (SMD) research endeavors through rapid multi-mission correlative analysis, such as distributed and P2P mining of the large survey catalogs from GALEX, Spitzer, 2MASS, and eventually WISE. Second, this research involves an interdisciplinary team of researchers in Astrophysics (Borne), Database Technologies (Kargupta, Giannella), Distributed Systems (Kargupta), and Distributed Data Mining (all). Dr. Borne is a senior member of the NVO project. We also have strong support from NASA's Space Science Data Operations Office (see attached supporting letter) for this proposed collaborative project and its transition to practice. Finally, this project explicitly demonstrates the relevance, applicability, and potential impact of emerging information technologies to SMD missions and programs.
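
To make the idea of a "statistical primitive" computed without centralizing catalogs more concrete, here is a minimal sketch in Python. It is not the project's algorithm, which the abstract does not describe. It assumes each site can summarize one numeric catalog column as three numbers (count, mean, and sum of squared deviations) and that a coordinator merges those summaries with the standard pairwise update for mean and variance. The names `Summary`, `summarize`, `merge`, and `local_outliers` are hypothetical, and the z-score rule is only a simple stand-in for outlier detection.

```python
from dataclasses import dataclass
import math


@dataclass
class Summary:
    """Mergeable summary of one numeric column (hypothetical helper type)."""
    n: int = 0          # number of values seen
    mean: float = 0.0   # running mean
    m2: float = 0.0     # sum of squared deviations from the mean


def summarize(values):
    """Run at each site: one pass over the local values (Welford's update)."""
    s = Summary()
    for x in values:
        s.n += 1
        delta = x - s.mean
        s.mean += delta / s.n
        s.m2 += delta * (x - s.mean)
    return s


def merge(a, b):
    """Combine two summaries without access to the raw values."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Summary(n, mean, m2)


def local_outliers(values, global_summary, threshold=3.0):
    """Run at each site: flag local values far from the global mean.

    Uses the population standard deviation derived from the merged summary.
    """
    std = math.sqrt(global_summary.m2 / global_summary.n)
    if std == 0.0:
        return []
    return [x for x in values
            if abs(x - global_summary.mean) > threshold * std]


if __name__ == "__main__":
    # Made-up values standing in for one column held at three sites.
    sites = {
        "site_a": [1.0, 2.0, 3.0],
        "site_b": [4.0, 5.0],
        "site_c": [6.0, 7.0, 8.0, 9.0],
    }
    total = Summary()
    for values in sites.values():
        total = merge(total, summarize(values))  # only 3 numbers per site move
    print(total.n, total.mean, total.m2 / total.n)
```

In this sketch each site transmits three numbers regardless of catalog size, which is the kind of trade-off the communication-cost benchmark above would measure; the merged result matches what a centralized computation would give, up to floating-point rounding.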

NASA Official: Joseph H. Bredekamp
Last Updated: 01/18/2005