AWS Public Sector Blog

How the University of São Paulo is transforming how researchers access greenhouse gas data for the Amazon rainforest with AWS

Climate researchers worldwide face the critical challenge of accessing and analyzing the massive volumes of environmental data needed to understand and prevent irreversible climate change. The Amazon basin plays a vital role in global climate regulation, yet monitoring greenhouse gas emissions across this vast geographic area has historically been complex, expensive, and time-consuming.

Researchers in the greenhouse gas (GHG) program at the University of São Paulo Research Center in Greenhouse Gas Innovation (RCGI) recognized this challenge and saw an opportunity to build a system for closely monitoring the forest using cloud-based data systems and data spaces. They created Digital Amazon, an open-access distributed data space network that integrates CO2 and other greenhouse gas emissions data collected by the university with other data sources to support timely climate action and intervention in the Amazon rainforest. By using Amazon Web Services (AWS), the team transformed how GHG data is collected and how researchers access critical environmental data.

Making climate data accessible

The team behind Digital Amazon was inspired by Global Forest Watch’s vision to enable close monitoring of all forests in the world to avoid reaching the point of no return in climate change.

However, their challenge was multifaceted. Environmental datasets were scattered across institutional silos, with satellite imagery, ground tower measurements, and field sensor data stored in separate archives across organizations, including Brazil’s National Institute for Space Research (INPE), MapBiomas Brazil, and the United States Oak Ridge National Laboratory. Researchers had no way to discover what existed, let alone access it. Searching for relevant data and preparing it for analysis could take a researcher up to a week, a significant barrier to rapid climate research.

The team also needed to serve stakeholders with fundamentally different needs for using the same underlying data: atmospheric physicists required raw datasets for hypothesis testing, policymakers needed curated visualizations to drive policy decisions, and collaborating research institutions required governed pathways for cross-border data exchanges. Balancing open access with data integrity, scaling to handle terabytes of environmental data, and maintaining scientific rigor across all these user communities demanded a new approach.

“The complexity of managing environmental data from multiple sources while maintaining accessibility for a global research community required a robust, scalable infrastructure,” explains Professor José Reinaldo Silva, director of the GHG program at RCGI. “We needed a solution that could grow with our data needs while reducing the time researchers spent on data preparation.”

Building collaborative architecture in the cloud

Through collaborative design sessions, the AWS team worked closely with the researchers to understand their specific workflows and optimize the architecture for research computing needs. Together, the teams set out to build a research data environment that could:

  • Unify discovery – Single searchable environment spanning all data sources, with metadata classification by region, time period, and monitoring source
  • Enable secure collaboration – Tiered permission model governing how each stakeholder community interacts with, curates, and shares data across institutional boundaries
  • Scale with research demand – Elastic compute and storage, with cloud bursting for computationally intensive workloads
  • Prepare data for AI – Metadata standards and FAIR-aligned data practices creating a foundation for AI-powered search, analysis, and pattern recognition

AWS provided continuous technical guidance throughout the project, with a goal to enable scientific outcomes rather than only providing infrastructure.
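The metadata classification named in the first goal (region, time period, and monitoring source) can be pictured with a small sketch. This is not the Digital Amazon implementation: the `DatasetRecord` fields, the `search` function, and the catalog entries are all hypothetical names and made-up values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalog entry; field names are illustrative assumptions."""
    dataset_id: str
    region: str
    start: date   # first day covered by the dataset
    end: date     # last day covered by the dataset
    source: str   # monitoring source, e.g. "satellite", "tower", "drone"


def search(
    catalog: Iterable[DatasetRecord],
    region: Optional[str] = None,
    source: Optional[str] = None,
    period: Optional[Tuple[date, date]] = None,
) -> List[DatasetRecord]:
    """Return records matching every filter that was supplied."""
    results = []
    for record in catalog:
        if region is not None and record.region != region:
            continue
        if source is not None and record.source != source:
            continue
        if period is not None:
            window_start, window_end = period
            # Keep only datasets whose coverage overlaps the requested window.
            if record.end < window_start or record.start > window_end:
                continue
        results.append(record)
    return results


# Made-up catalog entries, for illustration only.
CATALOG = [
    DatasetRecord("tower-001", "central-amazon", date(2020, 1, 1), date(2022, 12, 31), "tower"),
    DatasetRecord("sat-001", "central-amazon", date(2019, 1, 1), date(2019, 12, 31), "satellite"),
    DatasetRecord("tower-002", "southern-amazon", date(2021, 6, 1), date(2023, 5, 31), "tower"),
]

matches = search(
    CATALOG,
    region="central-amazon",
    period=(date(2021, 1, 1), date(2021, 12, 31)),
)
print([m.dataset_id for m in matches])  # prints ['tower-001']
```

Because every filter is optional, the same function can serve a broad query (all tower data) and a narrow one (one region in one year), which is the kind of single searchable environment the first goal describes.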

Building a scalable data platform with AWS

Digital Amazon integrates 17 AWS services into a single solution for environmental data management and access. The platform currently manages 7 terabytes of data on an architecture designed to scale to 50 terabytes, demonstrating how cloud infrastructure can support scientific research.

Amazon Simple Storage Service (Amazon S3) provides scalable storage for massive environmental datasets, offering data durability and accessibility. Amazon Quick Sight enables interactive data visualization so researchers can explore complex atmospheric data through intuitive dashboards.

The platform’s data flow architecture ingests information from multiple sources and applies automated processing to it:

  • Satellite data from NASA and European space agencies
  • Ground tower measurements across the Amazon basin
  • Emerging drone data streams with a target latency of approximately 20 milliseconds
  • Automated quality control and data curation processes

RCGI implemented a seven-tier permission model using AWS Identity and Access Management (IAM) and AWS Lake Formation, enabling controlled sharing with the global research community while maintaining data quality and scientific integrity. The underlying database remains consistent across all users, and the tiers govern how different stakeholders search and visualize the data, depending on their role and use case. Researchers in atmospheric physics now access raw datasets classified by metadata (region, time period, and monitoring source) for hypothesis testing and trend analysis. Government administrators interact with the same data through curated visualizations tailored to support decision-making. The public can access simplified views for transparency and education. Automated quality control tools and data versioning capabilities maintain the platform’s high standards for scientific accuracy.
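The idea that one consistent dataset is presented differently by role can be sketched as follows. This is a toy illustration only: it does not use IAM or AWS Lake Formation, it models just the three audiences named above rather than all seven tiers (which are not enumerated here), and the `Role` enum, `render` function, and measurement values are hypothetical.

```python
from enum import Enum
from statistics import mean


class Role(Enum):
    """Three of the audiences described in the text; names are assumptions."""
    RESEARCHER = "researcher"
    GOVERNMENT = "government"
    PUBLIC = "public"


# Shared underlying data as (region, CO2 ppm) readings. Values are made up.
MEASUREMENTS = [("north", 412.0), ("north", 418.0), ("south", 405.0)]


def render(role, rows):
    """Return a role-appropriate view of the same underlying rows."""
    if role is Role.RESEARCHER:
        # Researchers see the raw readings unchanged.
        return list(rows)

    # Group readings by region for the aggregated views.
    by_region = {}
    for region, value in rows:
        by_region.setdefault(region, []).append(value)

    if role is Role.GOVERNMENT:
        # Curated view: one average per region.
        return {region: mean(values) for region, values in by_region.items()}

    # Public view: a single rounded headline figure.
    return {"overall_mean_ppm": round(mean(value for _, value in rows), 1)}


print(render(Role.GOVERNMENT, MEASUREMENTS))  # prints {'north': 415.0, 'south': 405.0}
```

The point of the sketch is that `MEASUREMENTS` is never copied or altered per audience; only the presentation changes, which mirrors the description of a database that stays consistent across all users.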

The following diagram illustrates the solution architecture.

Diagram of a research data-sharing architecture showing how datasets are transferred from an on-premises data center to AWS for processing and distribution to researchers. Database exports and datasets from the USP data center connect to an AWS DataSync agent, which transfers data over TLS to AWS DataSync in the cloud. DataSync routes files to S3 buckets for metadata and datasets, which trigger Lambda functions (Import data, Update) that populate an Amazon RDS PostgreSQL database. A Step Functions workflow orchestrates Lambda functions (Retrieve, Compact, Share) to process datasets into exported files and send notifications through Amazon SNS. Researchers access exported datasets through Amazon CloudFront, authenticated through Lambda@Edge and Amazon Cognito user pool, with API Gateway routing requests to an Application Lambda that starts the Step Functions workflow. Amazon Route 53 provides DNS resolution, AWS Certificate Manager handles TLS certificates, and a separate sandbox account enables data analysis via Amazon Athena and Amazon QuickSight. Both accounts are managed through AWS Organizations.

Figure 1: Solution architecture
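One step in the diagram, an S3 bucket triggering an import Lambda function, can be sketched in a few lines. This is an assumption-laden illustration rather than the project's code: the `handler` and `extract_objects` names, the bucket name, and the object key are hypothetical, and the actual database write is left out. The event layout follows the standard Amazon S3 notification format, in which object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def extract_objects(event):
    """Collect (bucket, key) pairs from an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Keys in S3 notifications are URL-encoded, with spaces sent as "+".
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Hypothetical import function: lists the objects it would import."""
    objects = extract_objects(event)
    # A real import step would read each object and write rows to the
    # database; that part is omitted from this sketch.
    return {"imported": [f"s3://{bucket}/{key}" for bucket, key in objects]}


# Made-up event with the minimal fields the sketch reads.
SAMPLE_EVENT = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-datasets"},
                "object": {"key": "towers/2024/site+a.csv"},
            }
        }
    ]
}

print(handler(SAMPLE_EVENT, None))
# prints {'imported': ['s3://example-datasets/towers/2024/site a.csv']}
```

Keeping the event parsing in its own function makes it possible to test locally with a sample event before wiring up any cloud resources.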

Transforming research efficiency

Digital Amazon has delivered real improvements to researcher productivity and efficiency. Professor Silva reports that tasks that previously took 1 week, such as searching for relevant data and preparing it for analysis, now take as little as 1 hour. Measured against a 40-hour work week, that is a fortyfold improvement in research efficiency, which means that scientists can iterate on hypotheses faster, respond more rapidly to environmental events, and dedicate more time to analysis rather than data preparation.

The platform’s open access model facilitates global participation in Amazon climate research. Researchers from institutions worldwide can now access comprehensive environmental data without the traditional barriers of data requests, transfers, and format conversions. Democratizing data access supports cross-institutional collaboration and helps train the next generation of climate scientists.

Moreover, the educational value extends beyond data access. PhD students and postdocs gain hands-on experience with real-world climate data and cloud-enabled research computing, preparing them for careers in environmental science and data-intensive research. What began as a platform for atmospheric physicists now serves a spectrum of stakeholders with a range of technical fluency, each accessing the same authoritative datasets through role-appropriate search and visualization interfaces.

What’s next

The GHG program team continues to innovate with an ambitious drone monitoring initiative that complements satellite and tower-based measurements with automated greenhouse gas data collection. The drones capture depth data that satellites can’t, transmitting it directly to the cloud with a target latency of approximately 20 milliseconds and at one-fifth the cost of previous approaches, with per-unit targets dropping from approximately $1 million to $200,000. Beyond greenhouse gas monitoring, the applications include fire detection (reducing detection time from 20 days with satellite monitoring to near real time), agricultural emissions tracking, illegal mining detection, and smarter carbon credit investments.

The economics of drone-based monitoring are driving RCGI to extend coverage beyond the Amazon basin to the Cerrado savanna, Atlantic Forest, offshore systems, and urban environments. This expansion represents a fundamental shift from a centralized data repository to a federated, AI-enabled data space that can demonstrate climate linkages between ecosystems and connect deforestation patterns to urban weather impacts.

As the volume, velocity, and variety of data grow across biomes and collection methods, RCGI is evolving the platform’s architecture to match: investing in visualization with the Amazon Institute of People and the Environment (Imazon) and environmental policy bodies, integrating Amazon Bedrock for AI-powered semantic search, adding graph databases for mapping relationships between emissions sources and ecosystems, and aligning with International Data Spaces Association standards for federated data governance.

Conclusion

The Digital Amazon project demonstrates how AWS cloud services can transform scientific research by making critical environmental data accessible to researchers worldwide. By reducing data preparation time from 1 week to 1 hour and creating a scalable platform for global collaboration, the University of São Paulo has established a model for open science and climate research. As the platform continues to expand with drone monitoring and additional biomes, it will play an increasingly important role in understanding and addressing climate change.

Learn more about how AWS supports open data and data sharing by visiting Open Data on AWS. Explore open datasets available for research and innovation at the Registry of Open Data on AWS.

Maryclaire Abowd

Maryclaire is based in Washington, DC, and works with education and research customers as a global business development manager at AWS. Maryclaire has over twelve years of experience working in the IT and services industry supporting government and education customers around the world. Maryclaire is a graduate of Boston College and the London School of Economics.

Antonio Carlos Daud Filho

Antonio Carlos Daud Filho is an aeronautical engineer with a PhD in Mechanical Engineering from USP-São Carlos. He is a postdoctoral researcher at RCGI working on unmanned systems and aircraft for automated GHG collection, supervised by Prof. Emilio Carlos Nelli Silva.

Elinilson Vital

Elinilson Vital holds a degree in applied mathematics and computer science and a master's in mechatronics. He is a doctoral candidate in mechatronics engineering, working with federated cloud service systems and data spaces under the supervision of Prof. José Reinaldo Silva.

Emilio Carlos Nelli Silva

Emilio Carlos Nelli Silva holds a PhD in Topological Optimization Systems from the University of Michigan, Ann Arbor. He is the Scientific Vice-Director of RCGI and Director of the Greenhouse Gas Program. He is a professor and head of the Department of Mechatronics Engineering, Polytechnic School, USP.

Glauco Caurin

Glauco Caurin holds a PhD in Engineering and Robotics from ETH Zurich. He is a professor and current head of the Department of Aeronautical Engineering at the São Carlos School of Engineering, USP. He is a researcher at the USP Robotics Center.

José Reinaldo Silva

José Reinaldo Silva is a professor in the Department of Mechatronics Engineering, Polytechnic School, USP. He holds a PhD in Automated Systems Design from USP and an MA in Interdisciplinary Computer Science and Artificial Intelligence from Mills College, USA. He is the Coordinator of the Design Lab and a researcher at RCGI.

Paulo Artaxo Netto

Paulo Artaxo Netto holds a PhD in Atmospheric Physics from USP. He worked at NASA and the Max Planck Institute and recently retired as a full professor in the Department of Atmospheric Physics at the USP Institute of Physics. Prof. Artaxo was the general coordinator of the project in which Digital Amazon was housed.

Rubem Paulo Saldanha

Rubem Paulo Saldanha is a Business Development Manager for Research at Amazon Web Services (AWS), based in São Paulo on the Public Sector Brazil team. He leads digital transformation initiatives connecting cloud computing, generative AI, and institutional innovation to national programs, helping researchers accelerate projects and achieve faster results. He holds degrees from Universidade Federal de Mato Grosso and Pontifícia Universidade Católica de São Paulo and has 20 years of experience bridging academia, government, and technology.