AWS Public Sector Blog

Building hybrid satellite imagery processing pipelines in AWS

From their unique vantage point, satellites acquire data that help us better understand our universe. Earth observation (EO) satellites orbiting our planet are constantly capturing imagery to monitor and understand Earth’s environment. This can support decision-making on a wide range of topics including climate change, disaster management, agriculture and food security, and infrastructure development.

But to extract actionable insights, raw satellite data needs to undergo several processing steps to be transformed into higher-level products. These processing pipelines often use machine learning (ML) algorithms for some of those steps.

Aerospace and geospatial companies use the Amazon Web Services (AWS) Cloud to develop and deploy these processing workloads in a secure, scalable, and cost-optimized way. They can use Amazon Simple Storage Service (Amazon S3) to store petabytes of data durably and cost-efficiently, and choose their preferred compute solution to define an architecture capable of scaling dynamically based on demand. Some customers rely on the AWS serverless services portfolio, delegating infrastructure management and reducing operational overhead. Orbital Sidekick uses AWS to process satellite data with this type of architecture.
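As a concrete sketch of the storage side of such an architecture, the snippet below shows one common way to lay out satellite scenes in Amazon S3 with date-partitioned keys before uploading with boto3. The bucket name, mission identifier, and key scheme are illustrative choices, not an AWS requirement:

```python
from datetime import datetime, timezone

def scene_key(mission: str, sensing_time: datetime, level: str, granule: str) -> str:
    """Build a date-partitioned S3 object key for a satellite scene.

    Partitioning by mission/processing level/date is a common convention
    that keeps prefix listings fast and pairs well with S3 lifecycle
    rules; the exact scheme here is an illustrative choice.
    """
    return f"{mission}/level={level}/{sensing_time:%Y/%m/%d}/{granule}.tif"

# Uploading would then use boto3's S3 client, for example:
#   import boto3
#   s3 = boto3.client("s3")
#   s3.upload_file("granule_0042.tif", "my-imagery-bucket",
#                  scene_key("eo-sat-1", sensing_time, "L1C", "granule_0042"))

key = scene_key("eo-sat-1", datetime(2023, 5, 17, tzinfo=timezone.utc),
                "L1C", "granule_0042")
print(key)  # eo-sat-1/level=L1C/2023/05/17/granule_0042.tif
```

Keeping the key scheme in one helper means the same layout can be reused unchanged whether the objects land in an S3 bucket in the Region or in S3 on Outposts.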

However, some companies encounter particular use cases or end customers that require their processing pipeline to be deployed on-premises. The rationale behind these requirements is diverse; some organizations have data residency needs, while others intend to maximize the return on investment for existing infrastructure. Even if these scenarios represent a small fraction of the total use cases, companies developing these types of workloads usually aim to design architectures that can support these needs and avoid maintaining two parallel solutions: one for cloud deployments and one for on-premises use cases.

In this blog post, learn how companies operating in AWS can design highly flexible architectures that support both cloud and on-premises deployments of their satellite imagery processing workloads with minimal modifications, using AWS services like Amazon Elastic Kubernetes Service (Amazon EKS) and AWS Outposts.

What does on-premises really mean?

When discussing deployments that must be performed on-premises, the specific requirements behind this need typically determine the most beneficial solution. Some may think that running on-premises necessarily implies running on customer infrastructure. However, this may not be the case. For instance, if this requirement is due to latency or data residency constraints, then organizations and companies can consider running on AWS Outposts as an alternative solution.

Outposts is a family of fully managed solutions delivering AWS infrastructure and services to virtually any on-premises or edge location. With Outposts, you can run some AWS services locally and connect to a broad range of services available in the AWS Region, achieving operational consistency and using familiar AWS services, tools, and APIs, which help maintain the same pace of innovation as in the cloud. It significantly reduces operational overhead compared to running on customer infrastructure: under the AWS Shared Responsibility Model, AWS is responsible for the hardware and software that run AWS services. AWS manages security patches, updates firmware, and maintains the Outposts equipment. AWS also monitors the performance, health, and metrics for your Outposts and determines whether any maintenance is required.

Consistent compute environments across disparate deployments

When organizations need to prioritize the ability to deploy data processing workflows across multiple environments (including cloud and on-premises), they can use Amazon EKS. As a container-based solution, Amazon EKS offers portability and operational consistency across different deployment environments. Plus, Amazon EKS is certified Kubernetes-conformant, so existing applications that run on upstream Kubernetes are compatible with Amazon EKS. This offers further advantages, such as increased interoperability, flexibility, and a growing pool of Kubernetes-literate IT professionals.

Companies can use Kubernetes to run data processing jobs and orchestrate ML pipelines in cloud-native or on-premises environments as it supports job schedulers such as Airflow, Prefect, or Argo to manage complex workflows; frameworks like Spark for batch processing; and multiple ML platforms like Kubeflow or MLflow. Last year, AWS introduced the AWS Data on EKS (DoEKS) initiative to create and distribute resources, such as best practices, infrastructure as code (IaC) templates, and sample code, to simplify and speed up the process of building, deploying, and scaling data workloads on Amazon EKS.
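Because Kubernetes Jobs are the common denominator across these environments, a processing step can be described once and submitted anywhere. The sketch below builds a `batch/v1` Job manifest as a plain Python dict; the container image URI, argument convention, and parallelism settings are illustrative assumptions, and in practice you would submit the manifest with the official `kubernetes` Python client or render it as YAML for a GitOps tool:

```python
def processing_job_manifest(name: str, image: str, scene_uri: str,
                            parallelism: int = 4) -> dict:
    """Return a Kubernetes batch/v1 Job manifest as a plain dict.

    The same manifest works on Amazon EKS in the Region, on Outposts,
    or on an EKS Anywhere cluster, which is the portability argument
    made above.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "parallelism": parallelism,
            "completions": parallelism,
            "backoffLimit": 2,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "processor",
                        "image": image,
                        "args": ["--scene", scene_uri],
                    }],
                }
            },
        },
    }

manifest = processing_job_manifest(
    "l1c-to-l2a",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/processor:latest",
    "s3://my-imagery-bucket/eo-sat-1/level=L1C/2023/05/17/granule_0042.tif",
)
```

A workflow engine such as Argo would typically generate many such Jobs, one per scene or tile, rather than submitting them by hand.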

Amazon EKS provides a spectrum of Kubernetes deployment options ranging from AWS-managed to customer-managed infrastructure, so that you can adapt to many use cases.

Figure 1. Amazon EKS spectrum of deployment options from AWS-managed to customer-managed. Beginning at the AWS-managed end and moving toward more customer-managed options, customers can use Amazon EKS, Amazon EKS in Local Zones, Amazon EKS in Wavelength Zones, Amazon EKS on Outposts, Amazon EKS Anywhere, and Amazon EKS Distro.

Deploying in an AWS Region

When running workloads in an AWS Region, Amazon EKS helps run Kubernetes clusters at scale. Amazon EKS minimizes the operational effort required by providing a fully managed, highly available, and scalable Kubernetes control plane running across multiple AWS Availability Zones. You can then choose to use either Amazon Elastic Compute Cloud (Amazon EC2) instances or AWS Fargate for the data plane.

Deploying on AWS Outposts rack

When deploying a pipeline on-premises using an Outposts rack, you can use Amazon EKS on Outposts and keep using the same application programming interfaces (APIs), console, and tools you use to run Amazon EKS clusters in the cloud. With the extended clusters deployment option, you can continue running the Kubernetes control plane in an AWS Region and the worker nodes in the Outposts rack. However, if there is poor or intermittent connectivity to the AWS Region running Amazon EKS, AWS recommends using the local clusters deployment option, in which both the control plane and nodes run in the Outposts rack.
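For the local clusters option, the cluster is created through the same EKS API, with an additional Outpost configuration that places the control plane on the rack. The helper below assembles the keyword arguments for boto3's `eks.create_cluster` call; the ARNs, subnet IDs, and instance type are placeholders, and you should verify the current parameter shapes against the EKS API reference before use:

```python
def local_cluster_request(name: str, role_arn: str, subnet_ids: list,
                          outpost_arn: str,
                          instance_type: str = "m5.large") -> dict:
    """Build kwargs for boto3's eks.create_cluster that provision an
    EKS local cluster on an Outpost (control plane runs on the rack).

    All ARNs and the instance type are illustrative placeholders.
    """
    return {
        "name": name,
        "roleArn": role_arn,
        "resourcesVpcConfig": {"subnetIds": subnet_ids},
        "outpostConfig": {
            "outpostArns": [outpost_arn],
            "controlPlaneInstanceType": instance_type,
        },
    }

# The actual call would then be:
#   import boto3
#   boto3.client("eks").create_cluster(**local_cluster_request(
#       "imagery-local",
#       "arn:aws:iam::123456789012:role/eks-cluster-role",
#       ["subnet-0123456789abcdef0"],
#       "arn:aws:outposts:us-east-1:123456789012:outpost/op-0123456789abcdef0"))
```

Omitting `outpostConfig` from the same call yields a standard Regional cluster, which is what makes the extended and local options interchangeable at the API level.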

Deploying on customer infrastructure

For on-premises use cases where Outposts rack is not a viable option, you can still use Amazon EKS Anywhere to create and operate Kubernetes clusters on-premises on customer infrastructure. Amazon EKS Anywhere uses Amazon EKS Distro, the same Kubernetes distribution deployed by Amazon EKS, allowing you to create clusters consistent with Amazon EKS best practices like the latest software updates and extended security patches.
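An EKS Anywhere cluster is normally declared as a YAML spec consumed by `eksctl anywhere create cluster`. The sketch below mirrors a minimal version of that spec as a Python dict so its shape is easy to see; the API version, field names, and counts reflect the EKS Anywhere cluster API as of writing and should be checked against the current EKS Anywhere documentation before use:

```python
def eks_anywhere_cluster_spec(name: str, k8s_version: str = "1.27",
                              worker_count: int = 3) -> dict:
    """Minimal EKS Anywhere Cluster spec, expressed as a dict.

    In practice this is written as YAML; a provider-specific section
    (vSphere, bare metal, etc.) would also be required and is omitted
    here for brevity.
    """
    return {
        "apiVersion": "anywhere.eks.amazonaws.com/v1alpha1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "kubernetesVersion": k8s_version,
            "controlPlaneConfiguration": {"count": 3},
            "workerNodeGroupConfigurations": [
                {"name": "processing", "count": worker_count}
            ],
        },
    }
```

Because the resulting cluster runs Amazon EKS Distro, the Job manifests and GitOps tooling used in the Region apply unchanged here.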

How to implement machine learning operations (MLOps) in hybrid environments

Typically, satellite imagery processing pipelines include steps that perform ML inference, such as cloud detection and land cover classification. In these cases, it is important to build robust MLOps and maintain traceability for the models deployed across multiple environments.

In hybrid scenarios that require ML inference to be performed on-premises, customers can choose two main options:

  1. Deploy the complete MLOps pipeline as part of the on-premises workload, including building, training, deploying, and managing the ML models. Customers can deploy their preferred ML platform, such as Kubeflow, Metaflow, or MLflow, on the provisioned Amazon EKS cluster, either on Outposts or on customer infrastructure. These frameworks are open-source and offer flexibility and portability.
  2. Build, train, and manage ML models in the AWS Region and deploy the models to run inference on-premises. In this case, you can still run your preferred open-source ML platform on an Amazon EKS cluster in the AWS Region; however, as an alternative, you can use Amazon SageMaker. SageMaker is an AWS service to prepare data and build, train, and deploy ML models with fully managed infrastructure, tools, and workflows. Models built and trained with SageMaker in the AWS Region can then be deployed on-premises.
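For the second option, the hand-off point is the model artifact that SageMaker writes to S3 after training (reported in the `ModelArtifacts.S3ModelArtifacts` field of `DescribeTrainingJob`). The helper below splits that URI into a bucket and key so the on-premises side can download and unpack it for serving; the bucket and path are illustrative:

```python
from urllib.parse import urlparse

def model_artifact_location(model_data_url: str) -> tuple:
    """Split the S3 URI of a trained SageMaker model artifact into
    (bucket, key), ready for a boto3 download on the on-premises side.
    """
    parsed = urlparse(model_data_url)
    return parsed.netloc, parsed.path.lstrip("/")

# On-premises, the artifact can then be pulled and unpacked for serving:
#   import boto3, tarfile
#   bucket, key = model_artifact_location(url)
#   boto3.client("s3").download_file(bucket, key, "model.tar.gz")
#   tarfile.open("model.tar.gz").extractall("model/")

bucket, key = model_artifact_location(
    "s3://my-ml-bucket/cloud-detector/output/model.tar.gz")
print(bucket, key)  # my-ml-bucket cloud-detector/output/model.tar.gz
```

Recording the training job name and artifact URI alongside each deployed model is a simple way to keep the traceability across environments that the section calls for.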

How does everything fit together?

You can integrate the previously discussed on-premises deployments with the rest of your infrastructure in AWS. Figure 2 and Figure 3 show two reference architectures: an on-premises deployment on Outposts rack and an on-premises deployment on customer infrastructure, respectively. Details of the implementation will vary depending on the particular use case and associated requirements.

Satellite imagery processing pipeline deployment on Outposts rack

Figure 2. Architectural diagram for a satellite imagery processing pipeline deployed on Outposts rack.

Figure 2 features a high-level architecture for creating a satellite imagery processing pipeline deployed on Outposts rack. Use the following steps to build the architecture:

  1. Create a continuous integration (CI) pipeline for your imagery processing workloads using AWS CodeCommit, AWS CodePipeline, and AWS CodeBuild. Store the container images in Amazon Elastic Container Registry (Amazon ECR).
  2. Develop and train your ML models either using SageMaker in the AWS Region, or an alternative ML solution, either in the AWS Region or as part of the on-premises deployment.
  3. Use Amazon CloudWatch to centrally monitor AWS and on-premises resources.
  4. Achieve a consistent hybrid experience and fully managed infrastructure using Outposts rack for the on-premises deployment.
  5. Host your processing pipeline in Amazon EKS on Outposts. Choose your preferred orchestration tool.
  6. Use a continuous delivery (CD) tool like FluxCD, an open source CD system developed by Weaveworks, to retrieve and deploy the latest container images.
  7. Run batch operations to optimize processing time using solutions such as Amazon EMR on EKS.
  8. Use the ML framework chosen during model development for the processing pipeline steps that require ML inference.
  9. Store your raw and processed satellite imagery data in Amazon S3 on Outposts. Maintain metadata in Amazon Relational Database Service (Amazon RDS).
  10. A service link connects the Outpost rack with your chosen AWS Region. Optionally, you can use AWS Direct Connect.
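As an illustration of step 3, the on-premises pipeline can publish custom metrics back to CloudWatch in the Region so both environments are monitored in one place. The snippet builds a single `MetricData` entry for `put_metric_data`; the namespace, metric name, and dimension are illustrative choices:

```python
from datetime import datetime, timezone

def pipeline_metric(stage: str, scenes_processed: int) -> dict:
    """Build one MetricData entry for CloudWatch's put_metric_data,
    letting the on-premises pipeline report progress to the Region.
    """
    return {
        "MetricName": "ScenesProcessed",
        "Dimensions": [{"Name": "PipelineStage", "Value": stage}],
        "Timestamp": datetime.now(timezone.utc),
        "Value": scenes_processed,
        "Unit": "Count",
    }

# Publishing (requires credentials and connectivity to the Region):
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="ImageryPipeline",
#       MetricData=[pipeline_metric("l2a", 128)])
```

The same code runs unchanged on Outposts or on customer infrastructure, which is what makes CloudWatch a workable single pane of glass in both architectures.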

Satellite imagery processing pipeline deployment on customer infrastructure

Figure 3. Architectural diagram for a satellite imagery processing pipeline deployed on customer infrastructure.

Figure 3 features a high-level architecture for creating a satellite imagery processing pipeline deployed on customer infrastructure. Use the following steps to build the architecture:

  1. Create a continuous integration (CI) pipeline for your imagery processing workloads using CodeCommit, CodePipeline, and CodeBuild. Store the container images in Amazon ECR.
  2. Develop and train your ML models either using SageMaker in the AWS Region, or an alternative ML solution either in the AWS Region or as part of the on-premises deployment.
  3. Use CloudWatch to centrally monitor AWS and on-premises resources.
  4. For cases where requirements do not allow for an Outposts rack deployment, customers can deploy this hybrid architecture directly on customer infrastructure.
  5. Host the processing pipeline in Amazon EKS Anywhere. Choose your preferred orchestration tool.
  6. Use a continuous delivery (CD) tool like FluxCD to retrieve and deploy the latest container images.
  7. Run batch operations to optimize processing time using your preferred solution.
  8. Use the ML framework chosen during model development for the processing pipeline steps that require ML inference.
  9. Store your raw and processed satellite imagery data in your chosen object storage solution. Maintain metadata in a PostgreSQL database.
  10. Connect your AWS Region deployment with your corporate data center using AWS Site-to-Site VPN or Direct Connect.
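For step 9, a minimal scene-metadata table might look like the following. The schema and column names are illustrative; SQLite is used here only so the sketch is self-contained, while the deployment in Figure 3 would target PostgreSQL with equivalent DDL:

```python
import sqlite3

# In-memory stand-in for the PostgreSQL metadata store of Figure 3.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE scene_metadata (
        granule_id   TEXT PRIMARY KEY,
        mission      TEXT NOT NULL,
        level        TEXT NOT NULL,   -- processing level, e.g. L1C, L2A
        sensed_at    TEXT NOT NULL,   -- ISO 8601 sensing timestamp
        storage_uri  TEXT NOT NULL,   -- location in the object store
        cloud_cover  REAL             -- percent, from the ML inference step
    )
""")
conn.execute(
    "INSERT INTO scene_metadata VALUES (?, ?, ?, ?, ?, ?)",
    ("granule_0042", "eo-sat-1", "L2A", "2023-05-17T10:32:00Z",
     "s3://my-imagery-bucket/eo-sat-1/level=L2A/2023/05/17/granule_0042.tif",
     12.5),
)
row = conn.execute(
    "SELECT mission, cloud_cover FROM scene_metadata WHERE granule_id = ?",
    ("granule_0042",)).fetchone()
print(row)  # ('eo-sat-1', 12.5)
```

Pointing `storage_uri` at whichever object store the deployment uses keeps the catalog identical across the Outposts and customer-infrastructure variants.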

Learn more about AWS for aerospace and satellite

Aerospace organizations can use AWS to design architectures that maximize flexibility for their satellite imagery processing workloads and allow for both cloud and on-premises deployment use cases with minimal modifications. Find more curated solutions for other common use cases for the aerospace and satellite industry in the AWS Solutions Library.

Organizations of all sizes across all industries are transforming and delivering on their aerospace and satellite missions every day using AWS. Learn more about the cloud for aerospace and satellite solutions so you can start your own AWS Cloud journey today.

Get inspired. Watch our AWS in Space story.