Advanced Astrophysical Algorithms to Novel Supercomputing Hardware
| Astronomy has become a data-rich field, and data volumes will only increase. NASA has a long and distinguished track record of funding scientifically important missions, which contribute in large part to the quantity and quality of data being analyzed to gain fundamental insights into everything from our own planet to the origins of our universe. Our ability as scientists to keep pace with this data flood, however, is not sufficient. In response, many researchers, some in proposals to this AISR program, have developed innovative, cutting-edge statistical and computer science techniques and tools to improve our ability to analyze the data in hand. Even these efforts have not been adequate. We must find mechanisms for improving the performance and robustness of our applications; otherwise we will not only fail to capitalize on the richness of the data we currently have in hand, but will also fall further behind with the imminent arrival of petabyte datasets. This problem is not tied to one particular branch of NASA's operating strategy, but crosses all scientific and engineering disciplines.
In this proposal, we outline an interdisciplinary collaboration that we feel has the potential to revolutionize the way in which complex algorithms are applied to astrophysical datasets. Our collaboration of domain specialists has identified a means by which the analysis of large datasets can be accelerated by a factor of 100 or more over that provided by the traditional use of supercomputers alone. We will combine the expertise of the Laboratory for Cosmological Data Mining, the NCSA Innovative Systems Laboratory (ISL), and the NCSA Automated Learning Group (ALG) to analyze terascale datasets, such as that from the Sloan Digital Sky Survey (SDSS). As an initial proof of concept, we will use instance-based learning to produce the largest and most accurately classified catalog of objects to date, and subsequently use this classified catalog to calculate cosmological statistics such as N-point correlation functions. Both of these analyses are otherwise intractable when using the most accurate statistical approaches. The ISL has available cutting-edge hardware in the form of high-performance reconfigurable systems, which enable the speedup described. Crucially, given the experimental nature of current non-embedded applications of this technology, we also have the expertise and proven results to make this possible. Tests have shown that instance-based learning, in particular when combined with the results from other algorithms, yields superior object classification, but the method remains relatively unexplored in astronomy due to its intractability.
The PI and two Co-Is have joint appointments between the Department of Astronomy and NCSA at the University of Illinois, Urbana-Champaign, and have previously found that technological advances often benefit from the proximity of these institutions. The cyberinfrastructure to produce the catalogs is already in place, in the form of the Data-to-Knowledge Toolkit developed by the ALG. Indeed, it has already been used to classify the 141 million objects in the SDSS Data Release 3, a 70 GB dataset, but realizing the full potential of the available facilities and expertise requires the interdisciplinary collaboration outlined herein. The ISL is keen to gain partners, in industry or academia, who have real-world applications that may benefit from reconfigurable computing. This work is relevant to NASA and AISR goals because it advances computer hardware and software in the fields of high-performance computing, cyberinfrastructure, and machine learning. The use of FPGAs/reconfigurable computing will make tractable other data-intensive problems in astronomy, such as Fast Fourier Transforms (e.g., for the analysis of Planck mission data) or object classification and statistical analysis within the anticipated petascale datasets of missions such as the LSST. |