NASA - National Aeronautics and Space Administration
Universe
Started:06/19/2002
Reports
Report:4/4/2006
Report:12/15/2005
Report:12/15/2004
Report:9/15/2004
Report:6/16/2004
Report:3/11/2004
Report:12/18/2003
Latest Quad:12/19/2006
2005 Workshop
PI: David Bazell
Eureka Scientific

Novel Approaches to Supervised and Unsupervised Data Exploration
In this proposal we examine two novel approaches to the exploration and understanding of astronomical data: the use of unlabeled data for supervised classification, and semi-supervised clustering. Because of the large and increasing volume of data from astronomical satellites and ground-based telescopes, researchers can no longer hope to examine the data by hand. Automated techniques are essential lest important discoveries be lost. Current automated classification methods rely on supervised learning algorithms, such as neural networks and decision tree inducers, that require a training set containing large amounts of previously classified, or labeled, data. While unlabeled data is often cheap and plentiful, having a human classify the data is tedious, time-consuming and expensive. We will develop methods whereby supervised classification techniques can use large volumes of cheaply available unlabeled data to substantially improve their ability to classify objects.
If the target classes are unknown, unsupervised clustering is a standard method for exploring data and partitioning it into useful groups. We will also implement and explore several semi-supervised clustering methods, which use limited labeled information, such as a few labeled examples, to guide the clustering. Finally, while classifier learning based on mixed labeled/unlabeled data and semi-supervised clustering can be viewed as separate problems, we will develop and evaluate a unified framework in which the learned models directly provide clustering or classification solutions, or both, depending on the needs of the user. This approach allows astronomical data to be assigned to predefined classes and will facilitate the discovery of new object classes.
To demonstrate the utility of these methods, we will apply them to several test problems of current astronomical interest. Our primary scientific scenario is the identification of galaxy mergers in a variety of large (unlabeled) catalogs using labeled data from both simulations and observations.
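The abstract does not name the semi-supervised clustering methods to be studied. As a minimal illustration only, the Python sketch below implements one well-known variant, constrained k-means (Basu, Banerjee and Mooney, 2002): a few labeled examples seed one centroid per known class and keep their labels throughout, while unlabeled points are repeatedly assigned to the nearest centroid. The function and variable names are hypothetical, and each class is assumed to have at least one labeled example.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def constrained_kmeans(
    labeled: Dict[int, List[Point]],
    unlabeled: List[Point],
    max_iters: int = 20,
) -> Tuple[Dict[int, Point], List[int]]:
    """Hypothetical sketch of constrained k-means (seeded clustering).

    labeled maps each known class to its labeled examples (at least one each).
    Labeled examples seed the centroids and never change cluster; only the
    unlabeled points are reassigned. Returns the final centroids and the
    cluster chosen for each unlabeled point.
    """
    # Seed one centroid per known class from its labeled examples.
    centroids = {k: mean(pts) for k, pts in labeled.items()}
    assign: List[int] = []
    for _ in range(max_iters):
        # Assign each unlabeled point to its nearest centroid.
        new_assign = [
            min(centroids, key=lambda k: dist2(p, centroids[k])) for p in unlabeled
        ]
        if new_assign == assign:
            break  # assignments are stable, so the clustering has converged
        assign = new_assign
        # Recompute each centroid from its fixed seeds plus its assigned points.
        for k in centroids:
            members = labeled[k] + [p for p, a in zip(unlabeled, assign) if a == k]
            centroids[k] = mean(members)
    return centroids, assign
```

Because the number of clusters here is fixed by the labeled classes, this variant alone cannot surface a new object class; that would require allowing additional, unseeded clusters alongside the seeded ones.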
We will also examine morphological galaxy classification using data from existing galaxy catalogs and the Hubble Medium Deep Survey. These data sets contain different types of features that can be used for classification; both structural (e.g., shape and texture) and photometric features will be tested. The availability of different types of features is directly relevant to the co-training method, one of the methods we will investigate for building classifiers from labeled and unlabeled data, because co-training trains a separate classifier on each of two distinct feature sets ("views") of the same objects and lets each classifier supply labels for the other.
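Co-training (Blum and Mitchell, 1998) assumes that each object is described by two feature sets, or views, which here stand in for structural and photometric features. A classifier is trained on each view; in turn, each one labels the unlabeled objects it is most confident about, and those labels join the shared training set used by both. The sketch below is a minimal illustration under stated assumptions, not the project's implementation: the `NearestCentroid` classifier, the margin-based confidence score and the `co_train` parameters are all hypothetical choices.

```python
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, ...]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Hypothetical per-view base classifier: predict the closest class mean."""

    def fit(self, X: List[Vec], y: List[int]) -> "NearestCentroid":
        self.centroids: Dict[int, Vec] = {}
        for c in set(y):
            pts = [x for x, t in zip(X, y) if t == c]
            self.centroids[c] = tuple(sum(col) / len(pts) for col in zip(*pts))
        return self

    def predict_with_margin(self, x: Vec) -> Tuple[int, float]:
        # Confidence = gap between the nearest and second-nearest centroid.
        ranked = sorted((dist2(x, m), c) for c, m in self.centroids.items())
        margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
        return ranked[0][1], margin


def co_train(
    view_a: List[Vec],
    view_b: List[Vec],
    labels: Dict[int, int],
    unlabeled: List[int],
    rounds: int = 5,
    per_round: int = 1,
) -> Dict[int, int]:
    """Co-training sketch: view_a[i] and view_b[i] describe the same object i.

    labels maps the indices of labeled objects to their classes. Each round,
    the classifier for each view labels its `per_round` most confident
    unlabeled objects, and those labels are shared with the other view.
    """
    labels = dict(labels)  # do not mutate the caller's dictionary
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (view_a, view_b):
            if not pool:
                return labels
            idx = sorted(labels)
            clf = NearestCentroid().fit([view[i] for i in idx], [labels[i] for i in idx])
            # Rank remaining objects by this view's confidence, highest first.
            scored = sorted(
                ((clf.predict_with_margin(view[i]), i) for i in pool),
                key=lambda s: -s[0][1],
            )
            for (pred, _margin), i in scored[:per_round]:
                labels[i] = pred  # pseudo-label becomes training data for both views
                pool.remove(i)
    return labels
```

The nearest-centroid classifier keeps the example short; the co-training loop itself only needs each per-view classifier to return a prediction and a confidence score, so any classifier with that interface could be substituted.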

NASA Curator: AISRP Curator
NASA Official: Joseph H. Bredekamp
Last Updated: 01/18/2005