Providing equitable access to NASA’s Earth science data archive

NASA’s Earth Science Data and Information System (ESDIS), at Goddard Space Flight Center, manages a vast archive of over 170 petabytes (PB) of Earth science data. Researchers worldwide use this data: ESDIS distributes more than 450 terabytes (TB) daily and serves over 8 million users each year.

We have experienced rapidly increasing data volumes, largely due to new, high-data-volume Earth observation missions such as Surface Water and Ocean Topography (SWOT) and NASA/ISRO Synthetic Aperture Radar (NISAR). To effectively ingest, process, and store this growing archive, ESDIS requires a data-management architecture that is cost-effective, flexible, and scalable. To meet these needs, most components of ESDIS are currently operating within the Amazon Web Services (AWS) commercial cloud, with the goal of migrating all of NASA’s Earth science data to the AWS Cloud by the end of 2026.

The shift of data and services to the cloud wasn’t primarily driven by the sheer volume or cost of storing that data. NASA Earth science data has been openly and freely accessible since ESDIS began operations, in accordance with NASA’s full and open data policy. This policy allows all NASA mission data to be available to anyone, anywhere in the world, without restriction.

However, upholding this open data policy requires addressing the challenges posed by modern data volume and processing needs. We needed to address the problem that Ryan Abernathey of EarthMover termed the “data fortress.” The capability to download and store massive amounts of data (terabyte-scale) and apply high-performance computing to that data typically requires resources only available to government agencies, universities, and businesses. Because not all of our users belong to these entities, we recognized the need to provide this capability to everyone.

Our challenges were as follows:

How can we establish an environment that provides equitable access to our data?
How will we fund this environment given the near-exponential growth of our archive?
How do we successfully integrate with the new cloud environment and its diverse user base?

Achieving equitable access

A commercial cloud environment provides equitable access by allowing all our users into the data fortress. A commercial cloud environment containing our archive, rather than a government-run solution, also mitigates risk for NASA. Although we do provide open and free data, we don’t provide unlimited “free science.” We can’t provide unlimited free compute to our users, and we aren’t in a position to manage charging them for it.

A commercial cloud environment reduces those risks while providing the scalable compute resources that we need. A cloud environment also means that users can rent rather than own resources. As a result, individual researchers can afford far more compute power than their budgets would allow if they had to own their compute.

Users can spin up the necessary resources adjacent to the data, perform their analysis directly on the data in the cloud, and then spin those resources down. This eliminates the need for data downloads and the burden of permanent, costly ownership. To summarize, by migrating the Earth data archive to AWS, we can provide equitable access to Earth science data adjacent to high-performance computing resources.

Funding this environment with near-exponential data growth

By migrating to a commercial cloud environment, specifically AWS, we can enhance scientific discovery, streamline operations, and realize significant cost efficiencies across our infrastructure.

Data storage

Our latest missions generate petabytes of data, requiring a highly scalable and flexible storage solution. AWS offers on-demand scalability with minimal overhead, and we’ve implemented several AWS services to achieve substantial cost savings in this area:

Optimized rates – Negotiated bulk storage rates reduce overall purchasing costs.
Intelligent tiering – The Amazon Simple Storage Service (Amazon S3) Intelligent-Tiering storage class automatically moves data between storage classes based on usage patterns. This has realized an estimated 60% reduction in our storage costs.
Data protection – Amazon S3 versioning serves as a lazy-copy solution to mitigate the risk of accidental or malicious data deletion, offering resilience without the need for expensive, comprehensive, full-scale backups.
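As an illustration of the storage settings listed above, the following is a minimal sketch of how a bucket could be configured for Intelligent-Tiering and versioning with the AWS SDK for Python (Boto3). It is not a description of the ESDIS configuration: the bucket name, prefix, and rule ID are hypothetical, and the sketch assumes that AWS credentials with permission to change bucket settings are already available.

```python
import json


def intelligent_tiering_rule(prefix: str = "") -> dict:
    """Build a lifecycle rule that moves objects to S3 Intelligent-Tiering.

    The rule ID and prefix are hypothetical placeholders.
    """
    return {
        "ID": "move-to-intelligent-tiering",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        # Days=0 asks S3 to transition objects as soon as the rule applies.
        "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
    }


def versioning_configuration() -> dict:
    """Build the setting that keeps prior object versions after overwrite or delete."""
    return {"Status": "Enabled"}


def apply_storage_settings(bucket: str, prefix: str = "") -> None:
    """Apply both settings to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders above run without AWS access

    s3 = boto3.client("s3")
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration=versioning_configuration(),
    )
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [intelligent_tiering_rule(prefix)]},
    )


if __name__ == "__main__":
    # Print the rule only; call apply_storage_settings("example-bucket") to apply it.
    print(json.dumps(intelligent_tiering_rule("granules/"), indent=2))
```

With versioning enabled, an overwritten or deleted object remains recoverable as a noncurrent version, which is the behavior the data protection item relies on.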

Compute power

ESDIS and our users can bring processing algorithms and software directly to the data in the cloud. This approach eliminates complexity related to hardware support and procurement, thereby accelerating the pace of scientific discovery. Our compute cost efficiencies are achieved through:

Savings plans – Purchasing compute capacity in advance and in bulk through Compute Savings Plans.
Serverless architecture – Using serverless services (such as AWS Lambda, AWS Fargate, and Amazon API Gateway) means that we only pay for the resources actively in use, avoiding the cost of idle capacity.
On-demand processing – Using Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances for on-demand, asynchronous data processing at significantly lower rates than standard pricing.
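To make the Spot Instance item more concrete, here is a minimal sketch of requesting a single Spot Instance for an asynchronous processing job with Boto3. The image ID and instance type are hypothetical values chosen for illustration, not details of the ESDIS processing pipeline, and the sketch assumes credentials that allow launching EC2 instances.

```python
def spot_launch_parameters(image_id: str, instance_type: str) -> dict:
    """Build run_instances arguments that request one Spot Instance."""
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {
                # A one-time request is not re-submitted after an interruption.
                "SpotInstanceType": "one-time",
                "InstanceInterruptionBehavior": "terminate",
            },
        },
    }


def launch_spot_worker(image_id: str, instance_type: str,
                       region: str = "us-west-2") -> str:
    """Launch the worker and return its instance ID (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builder above runs without AWS access

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**spot_launch_parameters(image_id, instance_type))
    return response["Instances"][0]["InstanceId"]


if __name__ == "__main__":
    # Hypothetical values; replace them before calling launch_spot_worker.
    print(spot_launch_parameters("ami-EXAMPLE", "c6i.large"))
```

Because Spot capacity can be reclaimed, this pattern suits jobs that can be retried or resumed, which fits the asynchronous processing described above.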

Infrastructure and operations

The adoption of a common infrastructure based on cloud-native services has reduced tool redundancy, facilitated data and service sharing, and promoted the use of uniform community standards, policies, and processes. Our general infrastructure cost reductions are a result of:

Economies of scale – The volume of resources that AWS acquires lets it purchase in bulk at lower prices, which lowers what we pay for services.
Reduced labor – The use of managed services offloads operational tasks, resulting in reduced labor costs.

Integrating with the diverse user base of AWS

How can we effectively introduce ourselves to this new community and publicize the more than 170 petabytes of freely available, open Earth science data? The Registry of Open Data on AWS is the primary channel for sharing data on AWS, and we now use it.

On May 4, 2026, NASA advanced its open science goals by making over 6,000 Earth science collections visible in the Registry of Open Data on AWS. We organized our collections by mission or project into roughly 300 distinct Open Data Registry entries, which you can access using the NASA Space Act Agreement subcatalog in the Registry.

This addition establishes a vital connection for AWS users, who can now seamlessly discover and analyze NASA data. By enhancing public access, this effort promotes collaboration and accelerates the development of advanced Earth science applications and services within AWS offerings.

At the time of this writing, AWS users can access dataset metadata and essential access information through the Registry of Open Data on AWS. To further enrich the experience, we plan to add tutorials and notebooks that guide users in working with NASA Earth science data within AWS.

Figure 1: NASA/AWS group photo commemorating the 300 NASA datasets that put the Registry of Open Data over 1,000 datasets. Pictured from left to right: Chris Stoner/AWS, Doug Newman/NASA, David Appel/AWS, Andrew Mitchell/NASA, Kevin Murphy/NASA, and Jamie Baker/AWS.

What NASA ESDIS has done and what is next

NASA Earthdata currently archives over 170 petabytes of data. We have over 90% of our archive in Amazon S3, with a goal to migrate all of it by the end of 2026. AWS users can find the data through the Registry of Open Data and access it within the US West (Oregon) Region (us-west-2) through Amazon S3, the AWS Command Line Interface (AWS CLI), or AWS SDKs such as AWS SDK for Python (Boto3), or from any location using HTTPS. We will continue to bolster our datasets in the Registry of Open Data on AWS, adding new datasets and augmenting existing ones with elements such as tutorials and notebooks to increase their usability for the AWS user community.
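For readers who want to try in-Region access, the following is a minimal sketch of listing objects under an S3 location with Boto3 from compute running in us-west-2. The example URI is a hypothetical placeholder; take the real bucket and prefix from the dataset's entry in the Registry of Open Data on AWS. The sketch also assumes that any credentials the dataset requires are already configured in the environment, because this post doesn't describe how they are obtained.

```python
from typing import List, Tuple
from urllib.parse import urlparse

ARCHIVE_REGION = "us-west-2"  # Region named in this post for direct S3 access


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_keys(uri: str, max_keys: int = 10) -> List[str]:
    """Return up to max_keys object keys under the URI (requires boto3 and credentials)."""
    import boto3  # imported lazily so split_s3_uri runs without AWS access

    bucket, prefix = split_s3_uri(uri)
    s3 = boto3.client("s3", region_name=ARCHIVE_REGION)
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=max_keys)
    # "Contents" is absent when nothing matches the prefix.
    return [obj["Key"] for obj in response.get("Contents", [])]


if __name__ == "__main__":
    # Hypothetical location, shown only to demonstrate the parsing step.
    print(split_s3_uri("s3://example-bucket/example-collection/"))
```

Running the analysis in the same Region as the data is what lets users work on it in place instead of downloading it first.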

Want to learn more?

Learn more about open data on AWS. To learn about using open datasets on AWS, visit the open data topic in the AWS Public Sector Blog.
