Stretching your on-premises environment to AWS using Amazon ECS Anywhere
Amazon Elastic Container Service (Amazon ECS) lets customers run container workloads on AWS-managed infrastructure and, through Amazon ECS Anywhere, on customer-managed infrastructure. Whether on premises or in the cloud, customers get consistent cluster management, workload scheduling, and monitoring, so the Amazon ECS developer experience is the same across your on-premises data center and AWS.
Customers have several reasons to stretch their on-premises environment to AWS or the other way around:
- Maintaining data sovereignty
- Running services nearer the customer, for example low-latency workloads at edge locations
- Running analytics or business intelligence processes closer to the data source or filtering data before sending it to the cloud for further analysis
- Using it as a path of migration to AWS
- Bursting into AWS for compute resources during large events
- Making use of existing capital investments
For customers looking to understand how to design workloads that are stretched across AWS and on premises, this blog post presents a sample architecture. The post also shares complementary AWS services that you can consider to improve operational and developer efficiency.
To help position the sample architecture in this post, we refer to an example workload that has the frontend components and APIs deployed on AWS and an order processing component running on premises due to data processing requirements. A hybrid workload like this allows the customer to benefit from the availability, cost optimization, and resilience that AWS offers while meeting the requirement to process orders on premises.
The following diagram shows the sample architecture.
By deploying the sample workload to Amazon ECS, we create an Amazon ECS cluster and three Amazon ECS services. On the cloud, two Amazon ECS services support the web UI and an orders API, which publishes orders to an Amazon Simple Queue Service (Amazon SQS) queue. In the customer data center, an Amazon ECS service runs the order processing service, which polls the Amazon SQS queue for work.
The Amazon SQS queue acts as a boundary between AWS and the on-premises environment. It decouples the services on each side, so the workload can tolerate network partitions between the two. Amazon Elastic Container Registry (Amazon ECR) stores the container images and scans them to help identify software vulnerabilities. Because the public components run on AWS, the frontend can be exposed through an Application Load Balancer with Amazon CloudFront to provide a low-latency experience for customers.
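As a rough illustration of the polling side of this boundary, the following Python sketch shows one way the on-premises order processing service might drain the queue. It is not the implementation used in the sample architecture. The function name `drain_once` and the `handle` callback are hypothetical, and `sqs` is assumed to be any object exposing `receive_message` and `delete_message` with the same keyword arguments as the boto3 SQS client.

```python
from typing import Any, Callable


def drain_once(sqs: Any, queue_url: str, handle: Callable[[str], None]) -> int:
    """Receive one batch of orders, process each, and delete the successes.

    Assumption: `sqs` behaves like a boto3 SQS client. Returns the number of
    messages that were processed and deleted.
    """
    # Long polling (WaitTimeSeconds) avoids busy-looping on an empty queue.
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    )
    processed = 0
    # The "Messages" key is absent when the queue returned nothing.
    for message in response.get("Messages", []):
        try:
            handle(message["Body"])
        except Exception:
            # Do not delete: the message becomes visible again after the
            # queue's visibility timeout and can be retried.
            continue
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        processed += 1
    return processed
```

In a real service, this function would typically be called in a loop with `sqs = boto3.client("sqs")`. Deleting a message only after it has been handled means an order that fails mid-processing is retried rather than lost.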
To complement the sample architecture, customers look for tools to support their continuous integration and continuous delivery (CI/CD) approach to software release management. Because a single Amazon ECS control plane is shared between the on-premises environment and AWS, deployments can be standardized from either AWS CodePipeline or another CI/CD tool (for example, Jenkins, GitLab, or GitHub Actions). Standardizing deployments across your AWS and on-premises environments reduces overhead, such as testing and maintaining separate deployment scripts for each environment. The sample architecture in this post uses CodePipeline, but the concepts transfer readily to other CI/CD tools. To support DevOps practices, we recommend that each service has its own application pipeline, so that application components can be built, tested, and deployed independently.
The following diagram shows the example components for a pipeline with CodePipeline:
For detailed steps on how to create a pipeline with CodePipeline, read this user guide: Tutorial: Amazon ECS Standard Deployment with CodePipeline.
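For the Amazon ECS standard deployment action, the build stage usually hands the deploy stage an `imagedefinitions.json` artifact that maps a container name to the image URI that was just built. The following is a minimal Python sketch of producing that file's content. The helper name `image_definitions` and the example values are hypothetical; check the CodePipeline documentation for the exact requirements of your pipeline.

```python
import json


def image_definitions(container_name: str, image_uri: str) -> str:
    """Return the JSON body of an imagedefinitions.json file.

    container_name must match the container name in the task definition;
    image_uri is the image (typically in Amazon ECR) to deploy.
    """
    return json.dumps([{"name": container_name, "imageUri": image_uri}])


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(image_definitions("order-processor", "example.invalid/orders:latest"))
```

Because each service has its own pipeline, each build would emit a file like this for its own container only.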
As part of the solution, there are some additional design considerations.
Integrate with other AWS services: The Amazon ECS external launch type, like the other Amazon ECS launch types, supports AWS Identity and Access Management (IAM) roles for tasks. Your application can therefore call AWS services using temporary credentials that are acquired automatically from AWS Security Token Service (AWS STS), which removes IAM access key management from your responsibility. You are responsible only for defining least-privilege IAM roles for your service.
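To make "least privilege" concrete for the example workload, the sketch below builds an IAM policy document that a task role for the order processing service might carry. It assumes the service only needs to receive and delete messages on one queue. The function name and the example ARN are hypothetical, and a real service may need further actions.

```python
import json


def order_processor_policy(queue_arn: str) -> str:
    """Return an IAM policy document scoped to a single SQS queue."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the actions the polling loop needs, nothing broader.
                "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
                # Scoped to one queue rather than "*".
                "Resource": queue_arn,
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Hypothetical ARN, for illustration only.
    print(order_processor_policy("arn:aws:sqs:eu-west-1:111122223333:orders"))
```

Attaching a policy like this to the task role, rather than to the host, keeps the permissions tied to the service instead of to the machine it happens to run on.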
Infrastructure as code/pipelines as code: You can use the AWS Cloud Development Kit (AWS CDK), CloudFormation, or other third-party tools to define your application pipeline and infrastructure as code. You are responsible for bootstrapping your external machines to connect to the Amazon ECS control plane.
Scaling: Amazon ECS, Amazon Elastic Compute Cloud (Amazon EC2), and AWS Fargate can scale in the cloud, helping you meet customer demand and benefit from the elasticity AWS offers. Amazon ECS with the external launch type can also scale on premises, but it is limited by the compute available there. In this post's example, you may choose to scale the Amazon ECS order processing service based on the ApproximateNumberOfMessages in the Amazon SQS queue. If scaling exceeds the compute available on premises, there could be a delay in processing all the data. If the compute constraint is longer term, you will need to look at adding more compute or moving the workload elsewhere. You should set a sensible maximum task count so that your workload is not impacted by compute constraints. By stretching into AWS, you can also give your application a global reach.
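One simple way to reason about backlog-based scaling with a hard ceiling is sketched below in Python. This is an illustrative calculation only, not how Amazon ECS service auto scaling is implemented. The function name and the `messages_per_task` parameter (how many queued messages one task is expected to absorb) are assumptions introduced for the example.

```python
import math


def desired_task_count(
    backlog: int,
    messages_per_task: int,
    min_tasks: int,
    max_tasks: int,
) -> int:
    """Map a queue backlog to a task count, clamped to [min_tasks, max_tasks].

    max_tasks should reflect what the on-premises hosts can actually run, so
    a large backlog slows processing instead of exhausting local compute.
    """
    if messages_per_task <= 0:
        raise ValueError("messages_per_task must be positive")
    if min_tasks > max_tasks:
        raise ValueError("min_tasks must not exceed max_tasks")
    wanted = math.ceil(backlog / messages_per_task)
    return max(min_tasks, min(wanted, max_tasks))
```

For example, with `messages_per_task=100`, `min_tasks=1`, and `max_tasks=8`, a backlog of 250 messages yields 3 tasks, while a backlog of 5,000 is capped at 8. In the capped case, the remaining messages wait in the queue, which is the processing delay described above.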
Application secrets and configuration: Amazon ECS service types (such as Fargate, Amazon EC2, and external) can use AWS Systems Manager Parameter Store and AWS Secrets Manager for injecting environment specific secrets or configuring your application.
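In an Amazon ECS task definition, plain configuration is usually passed through the container's `environment` list, and sensitive values through its `secrets` list, where each entry points at a Parameter Store parameter or Secrets Manager secret by ARN. The following Python sketch assembles such a container definition as a dictionary. The helper name and all example values are hypothetical, and the role used to start the task also needs permission to read the referenced parameters or secrets.

```python
from typing import Any, Dict


def container_with_config(
    name: str,
    image: str,
    plain_env: Dict[str, str],
    secret_arns: Dict[str, str],
) -> Dict[str, Any]:
    """Build a container definition with environment values and secrets."""
    return {
        "name": name,
        "image": image,
        "essential": True,
        # Non-sensitive, environment-specific settings.
        "environment": [
            {"name": key, "value": value} for key, value in plain_env.items()
        ],
        # Each secret is resolved at task start and injected as an
        # environment variable; only the ARN appears in the definition.
        "secrets": [
            {"name": key, "valueFrom": arn} for key, arn in secret_arns.items()
        ],
    }
```

Keeping only ARNs in the task definition means the same definition can be used on AWS and on premises without embedding secret values in either place.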
Application monitoring and logging: Amazon ECS services with the external launch type can also publish application logs to Amazon CloudWatch Logs using the awslogs log driver, providing a consistent experience and location for logs across your AWS and on-premises environments. In addition, your application can integrate with AWS X-Ray to trace requests through your application microservices from AWS to on premises, which also provides a single viewpoint for analyzing and debugging application issues. See Getting started with AWS X-Ray for more details. We recommend a highly available network connection when using these AWS services, because, for example, the AWS X-Ray daemon uses the UDP protocol, and traces will be dropped on network interruptions. If your architecture relies on AWS Cloud services, you need to design and test it to withstand network interruptions.
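As a sketch of what the logging configuration might look like, the Python function below builds a task definition dictionary that targets the external launch type and sends container logs through the `awslogs` driver. The keys mirror the parameters of the Amazon ECS RegisterTaskDefinition API. The function name, the memory value, and the example arguments are assumptions for illustration.

```python
from typing import Any, Dict


def external_task_definition(
    family: str,
    image: str,
    log_group: str,
    region: str,
) -> Dict[str, Any]:
    """Build a task definition for ECS Anywhere with CloudWatch logging."""
    return {
        "family": family,
        # EXTERNAL marks the task as runnable on customer-managed hosts.
        "requiresCompatibilities": ["EXTERNAL"],
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "memory": 256,  # MiB; an arbitrary illustrative value
                "essential": True,
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": log_group,
                        "awslogs-region": region,
                        "awslogs-stream-prefix": family,
                    },
                },
            }
        ],
    }
```

With boto3, a dictionary like this could be passed as keyword arguments, for example `ecs.register_task_definition(**external_task_definition(...))`. Using the same log group for tasks on AWS and on premises keeps the logs in one location.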
Fleet management: It can be difficult to keep an inventory of resources across AWS and on-premises environments. By registering your bare metal/virtual machines (on premises or on Amazon EC2) with Systems Manager, you will have a single inventory of machines across AWS and the on-premises environment. Using this inventory, you can leverage AWS Systems Manager Patch Manager to automate host tasks, such as patching. In addition, you can use AWS Systems Manager Session Manager for centralized remote access, without the need to manage bastion hosts or SSH keys. You are responsible for testing patches and any impacts on your workload. See Setting up AWS Systems Manager for hybrid environments for more details.
A stretched architecture that uses Amazon ECS and Amazon ECS Anywhere alongside CodePipeline provides a consistent developer experience between on-premises and AWS deployments. This allows you to meet a range of requirements: data residency, compliance, low-latency computing, amortizing existing assets, and bursting into AWS. In addition, the architecture can handle network partitions by using Amazon SQS to decouple the services deployed on AWS and on premises. Using other services, such as Systems Manager, reduces the heavy lifting required to create and manage a stretched architecture.
For a hands-on walkthrough of deploying a similar architecture, check out this Amazon ECS Anywhere tutorial on GitHub.