AWS Machine Learning Blog

Build an agronomic data platform with Amazon SageMaker geospatial capabilities

The world is at increasing risk of global food shortage as a consequence of geopolitical conflict, supply chain disruptions, and climate change. Simultaneously, there’s an increase in overall demand from population growth and shifting diets that focus on nutrient- and protein-rich food. To meet the excess demand, farmers need to maximize crop yield and effectively manage operations at scale, using precision farming technology to stay ahead.

Historically, farmers have relied on inherited knowledge, trial and error, and non-prescriptive agronomic advice to make decisions. Key decisions include what crops to plant, how much fertilizer to apply, how to control pests, and when to harvest. However, with an increasing demand for food and the need to maximize harvest yield, farmers need more information in addition to inherited knowledge. Innovative technologies like remote sensing, IoT, and robotics have the potential to help farmers move past legacy decision-making. Data-driven decisions fueled by near-real-time insights can enable farmers to close the gap on increased food demand.

Although farmers have traditionally collected data manually from their operations by recording equipment and yield data or taking notes of field observations, builders of agronomic data platforms on AWS help farmers and their trusted agronomic advisors use that data at scale. On small fields and operations, a farmer can more easily see the entire field and look for issues affecting the crop. However, frequently scouting every field is not feasible for large fields and farms, and successful risk mitigation requires an integrated agronomic data platform that can bring insights at scale. These platforms help farmers make sense of their data by integrating information from multiple sources for use in visualization and analytics applications. Geospatial data, including satellite imagery, soil data, weather, and topography data, is layered together with data collected by agricultural equipment during planting, nutrient application, and harvest operations. By unlocking insights through enhanced geospatial data analytics, advanced data visualizations, and automation of workflows via AWS technology, farmers can identify specific areas of their fields and crops that are experiencing an issue and take action to protect their crops and operations. These timely insights help farmers work better with their trusted agronomists to produce more, reduce their environmental footprint, improve their profitability, and keep their land productive for generations to come.

In this post, we look at how you can integrate the predictions generated from Amazon SageMaker geospatial capabilities into the user interface of an agronomic data platform. Furthermore, we discuss how software development teams are adding advanced machine learning (ML)-driven insights, including remote sensing algorithms, cloud masking (automatically detecting clouds within satellite imagery), and automated image processing pipelines, to their agronomic data platforms. Together, these additions help agronomists, software developers, ML engineers, data scientists, and remote sensing teams provide scalable, valuable decision-making support systems to farmers. This post also provides an example end-to-end notebook and GitHub repository that demonstrate SageMaker geospatial capabilities, including ML-based farm field segmentation and pre-trained geospatial models for agriculture.

Adding geospatial insights and predictions into agronomic data platforms

Established mathematical and agronomic models combined with satellite imagery enable visualization of the health and status of a crop by satellite image, pixel by pixel, over time. However, these established models require access to satellite imagery that is not obstructed by clouds or other atmospheric interference that reduces the quality of the image. Without identifying and removing clouds from each processed image, predictions and insights will have significant inaccuracies and agronomic data platforms will lose the trust of the farmer. Because agronomic data platform providers commonly serve customers comprising thousands of farm fields across varying geographies, agronomic data platforms require computer vision and an automated system to analyze, identify, and filter out clouds or other atmospheric issues within each satellite image before further processing or providing analytics to customers.

Developing, testing, and improving ML computer vision models that detect clouds and atmospheric issues in satellite imagery presents challenges for builders of agronomic data platforms. First, building data pipelines to ingest satellite imagery requires time, software development resources, and IT infrastructure. Satellite imagery providers can differ greatly from one another. Satellites frequently collect imagery at different spatial resolutions; resolutions can range from many meters per pixel to very high-resolution imagery measured in centimeters per pixel. Additionally, each satellite may collect imagery with different multi-spectral bands. Some bands have been thoroughly tested and show strong correlation with plant development and health indicators, and other bands can be irrelevant for agriculture. Satellite constellations also revisit the same spot on earth at different rates. Small constellations may revisit a field once a week or less often, and larger constellations may revisit the same area multiple times per day. These differences in satellite images and frequencies also lead to differences in API capabilities and features. Combined, these differences mean agronomic data platforms may need to maintain multiple data pipelines with complex ingestion methodologies.

Second, after the imagery is ingested and made available to remote sensing teams, data scientists, and agronomists, these teams must engage in a time-consuming process of accessing, processing, and labeling the cloudy regions within each image. With thousands of fields spread across varying geographies, and multiple satellite images per field, the labeling process can take a significant amount of time, and the resulting models must be continually retrained to account for business expansion, new customer fields, or new sources of imagery.

Integrated access to Sentinel satellite imagery and data for ML

By using SageMaker geospatial capabilities for remote sensing ML model development, and by consuming satellite imagery from a conveniently available public Amazon Simple Storage Service (Amazon S3) bucket through AWS Data Exchange, builders of agronomic data platforms on AWS can achieve their goals faster and more easily. The S3 bucket always has the most up-to-date satellite imagery from Sentinel-1 and Sentinel-2 because Open Data Exchange and the Amazon Sustainability Data Initiative provide you with automated, built-in access to satellite imagery.

The following diagram illustrates this architecture.

SageMaker geospatial capabilities include built-in pre-trained deep neural network models such as land use classification and cloud masking, with an integrated catalog of geospatial data sources including satellite imagery, maps, and location data from AWS and third parties. With an integrated geospatial data catalog, SageMaker geospatial customers have easier access to satellite imagery and other geospatial datasets, which removes the burden of developing complex data ingestion pipelines. This integrated data catalog can accelerate your own model building and the processing and enrichment of large-scale geospatial datasets with purpose-built operations such as time statistics, resampling, mosaicing, and reverse geocoding. Because you can easily ingest imagery from Amazon S3 and use SageMaker geospatial pre-trained ML models that automatically identify clouds and score each Sentinel-2 satellite image, you no longer need to engage remote sensing, agronomy, and data science teams to ingest, process, and manually label thousands of satellite images with cloudy regions.

SageMaker geospatial capabilities support the ability to define an area of interest (AOI) and a time of interest (TOI), search within the Open Data Exchange S3 bucket archive for images with a geospatial intersect that meets the request, and return true color images, Normalized Difference Vegetation Index (NDVI), cloud detection and scores, and land cover. NDVI is a common index used with satellite imagery to understand the health of crops by visualizing measurements of the amount of chlorophyll and photosynthetic activity via a newly processed and color-coded image.
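To make the AOI and TOI idea concrete, the following minimal sketch filters a small list of scene records by bounding-box intersection, acquisition date, and cloud score. It illustrates the concept only and is not the SageMaker geospatial API: the `Scene` record, its field names, the `search_scenes` helper, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical scene metadata record; a real catalog exposes richer metadata.
@dataclass(frozen=True)
class Scene:
    scene_id: str
    bbox: tuple  # (min_lon, min_lat, max_lon, max_lat)
    acquired: datetime
    cloud_cover_pct: float


def bboxes_intersect(a, b):
    """Return True when two (min_lon, min_lat, max_lon, max_lat) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search_scenes(scenes, aoi_bbox, toi_start, toi_end, max_cloud_pct=100.0):
    """Keep scenes that intersect the AOI, fall inside the TOI, and meet the cloud limit."""
    return [
        s for s in scenes
        if bboxes_intersect(s.bbox, aoi_bbox)
        and toi_start <= s.acquired <= toi_end
        and s.cloud_cover_pct <= max_cloud_pct
    ]


# Illustrative data only; these are not real scene identifiers or footprints.
SCENES = [
    Scene("scene-a", (-94.0, 41.0, -93.0, 42.0), datetime(2020, 10, 3), 12.0),
    Scene("scene-b", (-94.0, 41.0, -93.0, 42.0), datetime(2020, 10, 8), 85.0),
    Scene("scene-c", (10.0, 50.0, 11.0, 51.0), datetime(2020, 10, 5), 5.0),
]

matches = search_scenes(
    SCENES,
    aoi_bbox=(-93.6, 41.5, -93.5, 41.6),
    toi_start=datetime(2020, 10, 1),
    toi_end=datetime(2020, 10, 31),
    max_cloud_pct=20.0,
)
print([s.scene_id for s in matches])  # ['scene-a']
```

Only `scene-a` is returned here: `scene-b` exceeds the cloud limit and `scene-c` does not intersect the AOI.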

Users of SageMaker geospatial capabilities can use the pre-built NDVI index or develop their own. SageMaker geospatial capabilities make it easier for data scientists and ML engineers to build, train, and deploy ML models faster and at scale using geospatial data and with less effort than before.
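For readers who want to see what an NDVI calculation involves, here is a minimal sketch of the standard formula, NDVI = (NIR - Red) / (NIR + Red), applied pixel by pixel. For Sentinel-2, the near-infrared and red inputs typically come from bands B08 and B04. The small reflectance grids below are made-up values, and the pre-built SageMaker geospatial index may handle edge cases differently.

```python
def ndvi(nir, red):
    """NDVI for one pixel: (NIR - Red) / (NIR + Red), in the range [-1, 1]."""
    total = nir + red
    if total == 0:
        return None  # undefined when both reflectances are zero
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply NDVI pixel by pixel to two equally sized 2D grids."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Made-up surface reflectance values for a 2 x 2 patch.
nir_band = [[0.50, 0.40], [0.30, 0.0]]
red_band = [[0.10, 0.10], [0.30, 0.0]]

grid = ndvi_grid(nir_band, red_band)
for row in grid:
    print([None if v is None else round(v, 2) for v in row])
```

Dense, healthy vegetation reflects strongly in the near-infrared and absorbs red light, so higher values generally indicate more photosynthetic activity.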

Farmers and agronomists need fast access to insights in the field and at home

Promptly delivering processed imagery and insights to farmers and stakeholders is important for agribusinesses and decision-making at the field. Identifying areas of poor crop health across each field during critical windows of time allows the farmer to mitigate risks by applying fertilizers, herbicides, and pesticides where needed, and even identify areas of potential crop insurance claims. It is common for agronomic data platforms to comprise a suite of applications, including web applications and mobile applications. These applications provide intuitive user interfaces that help farmers and their trusted stakeholders securely review each of their fields and images while at home, in the office, or standing in the field itself. These web and mobile applications, however, need to consume and quickly display processed imagery and agronomic insights via APIs.

Amazon API Gateway makes it easy for developers to create, publish, maintain, monitor, and secure RESTful and WebSocket APIs at scale. With API Gateway, API access and authorization are integrated with AWS Identity and Access Management (IAM), and API Gateway offers native OIDC and OAuth2 support, as well as integration with Amazon Cognito. Amazon Cognito is a cost-effective customer identity and access management (CIAM) service supporting a secure identity store with federation options that can scale to millions of users.

Raw, unprocessed satellite imagery can be very large, in some instances hundreds of megabytes or even gigabytes per image. Because many agricultural areas of the world have poor or no cellular connectivity, it’s important to process and serve imagery and insights in smaller formats and in ways that limit required bandwidth. Therefore, by using AWS Lambda to deploy a tile server, smaller sized GeoTIFFs, JPEGs, or other imagery formats can be returned based on the current map view being displayed to a user, as opposed to much larger file sizes and types that decrease performance. By combining a tile server deployed through Lambda functions with API Gateway to manage requests for web and mobile applications, farmers and their trusted stakeholders can consume imagery and geospatial data from one or hundreds of fields at once, with reduced latency, and achieve an optimal user experience.
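The following sketch shows one way a tile request could be handled, assuming a common z/x/y (Web Mercator "slippy map") tiling scheme and an API Gateway Lambda proxy integration that passes `z`, `x`, and `y` as path parameters. The route shape and the `describe_tile` placeholder are assumptions made for illustration; a real tile server would crop and encode imagery for the requested bounds instead of returning a JSON description.

```python
import json
import math


def tile_bounds(z, x, y):
    """Return (west, south, east, north) in degrees for a Web Mercator z/x/y tile."""
    n = 2 ** z
    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    north = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    south = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * (y + 1) / n))))
    return west, south, east, north


def describe_tile(z, x, y):
    """Hypothetical placeholder for rendering: reports only the tile's extent."""
    west, south, east, north = tile_bounds(z, x, y)
    return {"z": z, "x": x, "y": y, "bounds": [west, south, east, north]}


def error(status, message):
    """Build a proxy-integration error response."""
    return {"statusCode": status, "body": json.dumps({"error": message})}


def lambda_handler(event, context):
    """Handle a proxy-integration request for a single tile."""
    params = event.get("pathParameters") or {}
    try:
        z, x, y = int(params["z"]), int(params["x"]), int(params["y"])
    except (KeyError, TypeError, ValueError):
        return error(400, "expected integer path parameters z, x, y")
    if not (0 <= z <= 30 and 0 <= x < 2 ** z and 0 <= y < 2 ** z):
        return error(404, "tile is outside the valid range for this zoom level")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(describe_tile(z, x, y)),
    }


response = lambda_handler({"pathParameters": {"z": "1", "x": "0", "y": "0"}}, None)
print(response["statusCode"], response["body"])
```

Because each request covers only the tile in the current map view, the payload stays small regardless of how large the underlying source image is.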

SageMaker geospatial capabilities can be accessed via an intuitive user interface that enables you to gain easy access to a rich catalog of geospatial data, transform and enrich data, train or use purpose-built models, deploy models for predictions, and visualize and explore data on integrated maps and satellite images. To read more about the SageMaker geospatial user experience, refer to How Xarvio accelerated pipelines of spatial data for digital farming with Amazon SageMaker geospatial capabilities.

Agronomic data platforms provide several layers of data and insights at scale

The following example user interface demonstrates how a builder of agronomic data platforms may integrate insights delivered by SageMaker geospatial capabilities.

This example user interface depicts common geospatial data overlays consumed by farmers and agricultural stakeholders. Here, the consumer has selected three separate data overlays. The first is the underlying Sentinel-2 natural color satellite image taken in October 2020 and made available via the integrated SageMaker geospatial data catalog. This image was filtered using the SageMaker geospatial pre-trained model that identifies cloud cover. The second data overlay is a set of field boundaries, depicted with a white outline. A field boundary is commonly a polygon of latitude and longitude coordinates that reflects the natural topography of a farm field, or an operational boundary differentiating between crop plans. The third data overlay is processed imagery data in the form of Normalized Difference Vegetation Index (NDVI). The NDVI imagery is overlaid on the respective field boundary, and an NDVI color classification chart is depicted on the left side of the page.
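As a small illustration of how a field boundary can be represented and used, the sketch below stores a boundary as a GeoJSON-style polygon and tests whether a point (for example, a pixel center) falls inside it using the ray-casting method. The field name and coordinates are invented, and a production platform would more likely rely on a geospatial library than on hand-written geometry.

```python
# Hypothetical field boundary in GeoJSON style: a closed ring of [lon, lat] pairs.
FIELD = {
    "type": "Feature",
    "properties": {"name": "example-field"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-93.60, 41.50],
            [-93.50, 41.50],
            [-93.50, 41.60],
            [-93.60, 41.60],
            [-93.60, 41.50],
        ]],
    },
}


def point_in_ring(lon, lat, ring):
    """Ray casting: count how many ring edges a ray heading east from the point crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):  # edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def point_in_field(lon, lat, feature):
    """Check the outer ring only; holes are ignored in this sketch."""
    outer_ring = feature["geometry"]["coordinates"][0]
    return point_in_ring(lon, lat, outer_ring)


print(point_in_field(-93.55, 41.55, FIELD))  # True
print(point_in_field(-93.70, 41.55, FIELD))  # False
```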

The following image depicts the results using a SageMaker pre-trained model that identifies cloud cover.

In this image, the model identifies clouds within the satellite image and applies a yellow mask over each cloud within the image. By removing masked pixels (clouds) from further image processing, downstream analytics and products have improved accuracy and provide value to farmers and their trusted advisors.
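To show why removing masked pixels matters, the following minimal sketch compares a field-level average computed over all pixels with one computed over clear pixels only. The 2 x 2 grids are made-up values, and the boolean mask simply stands in for the output of a cloud detection model; the pixel under the cloud carries an artificially low value.

```python
def mean_of(values):
    """Arithmetic mean, or None for an empty list."""
    return sum(values) / len(values) if values else None


def clear_pixels(values, cloud_mask):
    """Flatten a 2D grid, keeping only pixels whose mask entry is False (not cloud)."""
    return [
        v
        for value_row, mask_row in zip(values, cloud_mask)
        for v, is_cloud in zip(value_row, mask_row)
        if not is_cloud
    ]


def cloud_fraction(cloud_mask):
    """Share of pixels flagged as cloud, between 0.0 and 1.0."""
    flat = [m for row in cloud_mask for m in row]
    return sum(flat) / len(flat)


# Made-up index values; the bottom-left pixel is covered by cloud.
index_values = [[0.80, 0.70], [0.10, 0.75]]
cloud_mask = [[False, False], [True, False]]

naive_mean = mean_of([v for row in index_values for v in row])
masked_mean = mean_of(clear_pixels(index_values, cloud_mask))

print(round(naive_mean, 4), round(masked_mean, 4), cloud_fraction(cloud_mask))
```

In this toy case the cloudy pixel drags the unmasked average well below the average of the clear pixels, which is the kind of inaccuracy that masking is meant to prevent.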

In areas of poor cellular coverage, reducing latency improves the user experience

To address the need for low latency when evaluating geospatial data and remote sensing imagery, you can use Amazon ElastiCache to cache processed images retrieved from tile requests made via Lambda. By storing the requested imagery in an in-memory cache, latency is further reduced and there is no need to re-process repeated imagery requests. This can improve application performance and reduce pressure on databases. Because Amazon ElastiCache supports many configuration options for caching strategies, cross-region replication, and auto scaling, agronomic data platform providers can scale up quickly based upon application needs, and continue to achieve cost efficiency by paying for only what is needed.
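The caching approach described above is commonly implemented as a cache-aside pattern: check the cache first, and render and store the tile only on a miss. The sketch below illustrates the pattern with an in-memory dictionary standing in for ElastiCache so that it runs without any infrastructure. The `TileCache` class, the key format, and the `render_tile` placeholder are hypothetical.

```python
import time


class TileCache:
    """In-memory stand-in for an external cache, with a time-to-live per entry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


def get_tile(cache, tile_key, render):
    """Cache-aside lookup: return the cached tile, or render and store it on a miss."""
    cached = cache.get(tile_key)
    if cached is not None:
        return cached
    tile = render(tile_key)
    cache.set(tile_key, tile)
    return tile


render_calls = []


def render_tile(tile_key):
    """Hypothetical placeholder for the expensive image processing step."""
    render_calls.append(tile_key)
    return f"tile-bytes-for-{tile_key}".encode("utf-8")


cache = TileCache(ttl_seconds=300)
first = get_tile(cache, "ndvi/12/1005/1543", render_tile)
second = get_tile(cache, "ndvi/12/1005/1543", render_tile)
print(first == second, len(render_calls))  # True 1
```

The second request is served from the cache, so the rendering step runs only once. The time-to-live is a design choice: a shorter value picks up newly processed imagery sooner, while a longer value avoids more re-processing.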


Conclusion

This post focused on geospatial data processing, implementing ML-enabled remote sensing insights, and ways to streamline and simplify the development and enhancement of agronomic data platforms on AWS. It illustrated several methods and services that builders of agronomic data platforms on AWS can use to achieve their goals, including SageMaker, Lambda, Amazon S3, Open Data Exchange, and ElastiCache.

To follow an end-to-end example notebook that demonstrates SageMaker geospatial capabilities, access the example notebook available in the following GitHub repository. You can review how to identify agricultural fields through ML segmentation models, or explore the preexisting SageMaker geospatial models and the bring your own model (BYOM) functionality on geospatial tasks such as land use and land cover classification. The end-to-end example notebook is discussed in detail in the companion post How Xarvio accelerated pipelines of spatial data for digital farming with Amazon SageMaker Geospatial.

Please contact us to learn more about how the agricultural industry is solving important problems related to global food supply, traceability, and sustainability initiatives by using the AWS Cloud.

About the authors

Will Conrad is the Head of Solutions for the Agriculture Industry at AWS. He is passionate about helping customers use technology to improve the livelihoods of farmers, the environmental impact of agriculture, and the consumer experience for people who eat food. In his spare time, he fixes things, plays golf, and takes orders from his four children.

Bishesh Adhikari is a Machine Learning Prototyping Architect on the AWS Prototyping team. He works with AWS customers to build solutions for various AI and machine learning use cases to accelerate their journey to production. In his free time, he enjoys hiking, travelling, and spending time with family and friends.

Priyanka Mahankali has been a Guidance Solutions Architect at AWS for more than 5 years, building cross-industry solutions including technology for global agriculture customers. She is passionate about bringing cutting-edge use cases to the forefront and helping customers build strategic solutions on AWS.

Ron Osborne is AWS Global Technology Lead for Agriculture – WWSO and a Senior Solution Architect. Ron is focused on helping AWS agribusiness customers and partners develop and deploy secure, scalable, resilient, elastic, and cost-effective solutions. Ron is a cosmology enthusiast, an established innovator within ag-tech, and is passionate about positioning customers and partners for business transformation and sustainable success.