Announcing the NOAA Big Data Project
I am happy to announce that we have entered into a research agreement with the US National Oceanic and Atmospheric Administration (NOAA) to explore sustainable models for making more of NOAA's open data available via the cloud.
The AWS Public Data Sets program hosts large collections of public data that anyone can access for free. We started this program to foster the development of communities and tools around data sets, with the expectation that this would create new businesses, accelerate research, and improve lives.
Most recently, we have worked with USGS, NASA, NIH, and other organizations to make popular data such as Landsat, 1000 Genomes Project, NASA NEX, and many others available for analysis and processing within AWS using Amazon Elastic Compute Cloud (EC2), Amazon EMR, AWS Elastic Beanstalk, and other AWS compute and analysis services.
In the course of running the Public Data Sets program, we have discovered a set of ingredients that, when combined, can make these collaborations a success. The ingredients include selecting data based on demand from users, making sure that the data is of high quality and well-documented, and providing tools and training to promote the use of the data. We have found that it is also important to provide some context around the data in order to show how it can be used to solve real-world problems.
New Research Agreement
Under the terms of the new research agreement, AWS and collaborators including The Weather Company, Esri, Planet OS, and others will look for ways to push more of NOAA’s data to the cloud, with a focus on spurring innovation and building a healthy and vibrant ecosystem around the data.
The data NOAA already makes available to the public drives critical research efforts and multi-billion-dollar industries. We anticipate that making more of NOAA’s data widely available will drive even more economic value and social good. This data can be used to build applications that protect life, health, and property by keeping our oceans healthy, our coastal communities safe and resilient, and more (see the Societal Impacts of NOAA page for more information).
We’ll be using what we have already learned from our experience with other Public Data Sets as we continue our research into the best way to make this data available and accessible to as many users as possible. As I mentioned earlier, providing data in response to specific requests from users is key and we will make sure to provide you with ways to express your needs and your interests.
The response to our efforts to make interesting and valuable data available has been great. For example, we recently announced our intent to host up to one petabyte of Landsat data on S3 and to make it freely available for anyone to use. Although this project is still getting underway, we have already seen exciting new applications from Esri, MATLAB, Development Seed, Mapbox, and Planet Labs. Less than 48 hours after the initial release, the data was used at a mining industry hackathon in Australia. There, a team developed a proof-of-concept application that used machine learning to identify possible deposits of titanium. Also, a team of novice programming students from Code Fellows was able to create a sophisticated tool to analyze Landsat imagery just weeks after learning how to code! The tool is called Snapsat; it allows you to create Landsat composites from your browser.
Over 125,000 Landsat scenes are already available on S3 and we’re adding hundreds more every day. Visit our Landsat on AWS page to learn how to work with this data.
If you would like to learn more about the NOAA Big Data Project, or to engage with our new “Data Alliance” as a data user or as a value-added service provider, sign up here and we’ll keep you up to date!