AWS Public Sector Blog

Downscaled CMIP5, 1950 US Census, and open genomics data for Galaxy: The latest open data on AWS

The AWS Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on Amazon Web Services (AWS). We work with data providers to: democratize access to data by making it available to the public for analysis on AWS; develop new cloud-native techniques, formats, and tools that lower the cost of working with data; and encourage the development of communities that benefit from access to shared datasets.

Our full list of publicly available datasets is on the Registry of Open Data on AWS. This quarter, we released 13 new or updated datasets, including CMIP5, the 1950 US Decennial Census, and open genomics data for Galaxy. Read on for some highlights among the new datasets:

CMIP5 UWPD Dataset
The National Oceanic and Atmospheric Administration (NOAA) released the Coupled Model Intercomparison Project Phase 5 (CMIP5) University of Wisconsin-Madison Probabilistic Downscaling (UWPD) Dataset. As described in the CMIP5 UWPD documentation, statistically downscaling this dataset makes it easier to use by “weeding out” less probable forecasts based on CMIP5 climate models, allowing users to visualize weather events like local and regional storm fronts more easily. UWPD adds daily precipitation, as well as maximum and minimum temperature, to the dataset. Learn more about the CMIP5 UWPD dataset.

1950 US Decennial Census
On April 1, 2022, the US National Archives and Records Administration (NARA) made the complete 1950 Census available to the public via AWS. Kept confidential for 72 years, the 1950 Census contains information about individuals living in the United States during the pivotal post-WWII period. Details for accessing the full dataset can be found on the 1950 Census Registry of Open Data page. Read more about the 1950 Census on the AWS Public Sector Blog and at the National Archives website.

Open bioinformatics reference data for Galaxy from Galaxy and Bioconductor Projects
Galaxy is an open-source platform that enables users to apply diverse bioinformatics tools through a user-friendly graphical web interface. So that many diverse tools can be used in concert, Galaxy seamlessly provides its users with the references and indexes these tools require. With the onboarding of these valuable references to the Registry of Open Data on AWS, these data can now be readily consumed by any Galaxy server with high availability and scalability. In addition, in collaboration with Bioconductor Projects, the Galaxy resource in the Registry of Open Data also contains data experiment packages that include sample datasets and experimental outcomes for analyses ranging from single-cell genomics to RNA sequencing. Learn more about the open bioinformatics reference data for Galaxy dataset.

Find these and other recently released datasets in the latest What’s New.

We’re excited to see how you can put these great datasets to work. If you have examples of tutorials, applications, tools, or publications that use these datasets, make sure to list them on the Registry of Open Data on AWS so the community can find them. Learn how to propose your dataset to the AWS Open Data Sponsorship Program and learn more about open data on AWS.

Joe Flasher

Joe Flasher is the open data lead at Amazon Web Services (AWS), helping organizations most effectively make data available for analysis in the cloud. The AWS Open Data program has democratized access to petabytes of data, including satellite imagery, climate & weather data, genomic data, and data used for natural language processing. He has been working with geospatial data and open source projects for the past decade, both as a contributor and maintainer. He has been a member of the Landsat Advisory Group and has worked on projects ranging from building GIS software to making the space shuttle fly. His background is in astrophysics, but he kindly requests you don’t ask him any questions about constellations.