Development of a Data Federation Facility for the National Virtual Observatory
Astronomy is entering a new, information-rich era as multiple large-area digital sky surveys, including many NASA missions, are in production or soon will be. The resulting datasets are remarkable in their own right; however, a revolutionary step comes from aggregating complementary multi-wavelength datasets (i.e., the cross-identification of billions of sources). Before any advanced data exploration or mining tools can be employed, the data of interest must be federated. This data federation service is therefore one of the primary requirements for the National Virtual Observatory (NVO). Federating these different datasets, however, is a challenging task. In this proposal, we focus on identifying solutions to the problems inherent in the dynamic, multi-wavelength cross-identification of large numbers of astronomical sources.
These problems arise both from the computational science side (data volumes, algorithmic complexities, persistence mechanisms) and from the astronomical side (variations in physical phenomena between observations at multiple wavelengths). We present our plan for developing a robust data federation service for the forthcoming NVO. The plan comprises three separate sub-projects, each of which has utility beyond the combined goal. First, we propose to develop AstroForge, a distributed collaborative development facility for scientific computing software projects, based on the extremely popular SourceForge.net open source software development website. Second, we will develop a cross-identification toolkit that uses probabilistic associations to federate disparate astronomical datasets; it will be hosted on AstroForge to facilitate open discussion and development. Finally, we will integrate the data federation service with a data-mining service on the separately funded NVO framework as a demonstration facility. Different integration mechanisms will be explored, including the Web Service model, the peer-to-peer model (e.g., using JavaSpaces or JXTA), and direct database integration (e.g., using SQL and stored procedures). All of the results from this project, including technical evaluations, will be made publicly available to the community.
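As an illustration of what a probabilistic association between two catalogs could look like, the following is a minimal sketch and not the toolkit's design. It assumes each source carries a position in degrees and a circular Gaussian positional uncertainty in arcseconds. The function names, the dictionary keys (`id`, `ra`, `dec`, `sigma`), and the `min_score` cutoff are all hypothetical, chosen only for this example. The score is an unnormalized Gaussian weight on the angular separation, and the per-source normalization gives relative weights among candidates rather than calibrated probabilities.

```python
import math

ARCSEC_PER_DEG = 3600.0


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * ARCSEC_PER_DEG


def match_score(sep_arcsec, sigma1, sigma2):
    """Unnormalized Gaussian weight for a separation, given both 1-sigma
    positional uncertainties in arcseconds (assumed circular, independent)."""
    sigma_sq = sigma1 ** 2 + sigma2 ** 2
    return math.exp(-sep_arcsec ** 2 / (2 * sigma_sq))


def cross_identify(catalog_a, catalog_b, min_score=0.01):
    """Return (id_a, id_b, weight) tuples. For each source in catalog_a,
    the weights of its surviving candidates in catalog_b sum to 1.
    Hypothetical cutoff: candidates scoring below min_score are dropped."""
    matches = []
    for a in catalog_a:
        scored = []
        for b in catalog_b:
            sep = angular_separation_arcsec(a["ra"], a["dec"], b["ra"], b["dec"])
            score = match_score(sep, a["sigma"], b["sigma"])
            if score >= min_score:
                scored.append((b["id"], score))
        total = sum(score for _, score in scored)
        for id_b, score in scored:
            matches.append((a["id"], id_b, score / total))
    return matches
```

This brute-force loop compares every pair of sources, so it is only suitable for small samples; at the scale of billions of sources, the candidate search would need a spatial index or a database-side join, which is part of what the proposed evaluation of integration mechanisms would address.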