New Public Data Set: Sloan Digital Sky Survey DR6 Subset

The Sloan Digital Sky Survey, or SDSS, is now available as a Public Data Set.

The SDSS is the most ambitious astronomical survey ever undertaken, and this data set weighs in at 180 GB. The researchers used a 2.5-meter telescope with a 120-megapixel camera at Apache Point Observatory in New Mexico to image more than a quarter of the sky, cataloging about 230 million celestial objects. They have also created 3-dimensional maps containing more than 930,000 galaxies and 120,000 quasars.

This new public data set (which is a subset of the entire SDSS) will be of interest to students, educators, amateur astronomers, and researchers. From a standing start, you can launch an EC2 instance, create an Elastic Block Store (EBS) volume containing this data, attach the volume to the instance, and start examining and processing the data in less than ten minutes.
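The volume steps above can be sketched with the AWS SDK for Python (boto3). This is a minimal sketch under stated assumptions, not an official procedure: it assumes you already have a running Windows instance, and the snapshot ID, instance ID, Availability Zone, and device name are hypothetical placeholders (the data set's actual snapshot ID has to be looked up separately). The EC2 client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
# Minimal sketch: create an EBS volume from a public data set snapshot and
# attach it to a running EC2 instance. All IDs below are hypothetical.


def restore_and_attach(ec2, snapshot_id, instance_id, availability_zone, device="xvdf"):
    """Create a volume from snapshot_id and attach it to instance_id.

    ec2 is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    The volume must be created in the same Availability Zone as the instance.
    Returns the new volume ID.
    """
    # Create the volume; with no Size given, it matches the snapshot size.
    response = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )
    volume_id = response["VolumeId"]

    # Block until EC2 reports the new volume as "available".
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach it; "xvdf" is one of the device names EC2 accepts for Windows.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id


# Example usage (needs boto3 and AWS credentials; IDs are hypothetical):
# import boto3
# ec2 = boto3.client("ec2", region_name="us-east-1")
# restore_and_attach(ec2, "snap-xxxxxxxx", "i-xxxxxxxx", "us-east-1a")
```

Once attached, the volume shows up as a new disk on the Windows instance; depending on the Windows version, you may need to bring it online in Disk Management before the data files are visible.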

The data set takes the form of a Microsoft SQL Server MDF file. Once you have created your EBS volume, attached it to your Windows EC2 instance, and attached the MDF file as a database in SQL Server, you can access the data using SQL Server Enterprise Manager or SQL Server Management Studio. The SDSS makes use of stored procedures, user-defined functions, and a spatial indexing library, so porting it to another database would be a fairly complex undertaking.
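Attaching the MDF file can also be scripted instead of done through the GUI. The sketch below assumes Python with the third-party pyodbc package and a SQL Server instance on the same machine; the database name, file path, and connection string are hypothetical. It builds a standard `CREATE DATABASE ... FOR ATTACH` statement; if the data set ships without its transaction log file, `FOR ATTACH_REBUILD_LOG` may be needed instead.

```python
# Minimal sketch: attach the SDSS MDF file to a local SQL Server instance.
# The database name, file path, and connection string are hypothetical.


def attach_sql(db_name, mdf_path):
    """Return a T-SQL statement that attaches an existing MDF file as a database."""
    # Double any ] in the identifier and any ' in the path so the
    # bracketed name and the string literal stay well-formed.
    safe_name = db_name.replace("]", "]]")
    safe_path = mdf_path.replace("'", "''")
    return f"CREATE DATABASE [{safe_name}] ON (FILENAME = N'{safe_path}') FOR ATTACH;"


def attach_database(conn_str, db_name, mdf_path):
    """Run the attach statement through pyodbc (third-party package)."""
    import pyodbc

    # CREATE DATABASE cannot run inside a user transaction, so enable autocommit.
    conn = pyodbc.connect(conn_str, autocommit=True)
    try:
        conn.cursor().execute(attach_sql(db_name, mdf_path))
    finally:
        conn.close()


# Example usage (hypothetical values; adjust the drive letter and server):
# attach_database(
#     "DRIVER={SQL Server};SERVER=localhost;Trusted_Connection=yes;",
#     "BestDR6",
#     r"D:\sdss\BestDR6.mdf",
# )
```

Attaching keeps the database's stored procedures and user-defined functions in place, so after the statement succeeds the database appears in Management Studio like any other.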

I know from experience that storage space is always at a premium in academic settings, due in part to large-scale data sets like this one. My son Andy is studying astronomy at the University of Washington and is always showing me the “please delete your unnecessary files” emails from the department’s administrator. The combination of EC2, EBS, this public data set, and our AWS in Education program should enable students and educators to analyze, process, display, and study the universe in revolutionary ways.

