Announcing zone-aware routing in Amazon ECS Service Connect
In a microservices architecture, traffic patterns and service placement decisions directly affect both cost and performance. With zone-aware routing for Amazon ECS Service Connect, Amazon Elastic Container Service (Amazon ECS) now prioritizes requests to healthy endpoints in the same Availability Zone (AZ) as the client. This helps reduce cross-zone data transfer costs and minimize latency without compromising availability.
In 2022, ECS introduced Amazon ECS Service Connect, a managed service mesh that simplifies building resilient distributed applications. Service Connect lets you refer to services by logical names using AWS Cloud Map namespaces and automatically distribute traffic between ECS tasks without deploying load balancers. Each task runs an Envoy sidecar proxy that handles service discovery, load balancing, and now zone-aware routing. Services communicate using client aliases, which are logical names that abstract the underlying network details.
Service Connect supports two service types:
Client-server services can both initiate and receive connections, acting as a client in some interactions and a server in others.
Client-only services can only initiate outbound connections to other services.
Amazon ECS Service Connect turns on zone-aware routing by default for new and existing services, and you don’t need to make any infrastructure or application code changes. Existing services (both client and server services) require you to perform a one-time redeployment to activate the new routing behavior. In this post, we explain how zone-aware routing works and walk you through setting up a multi-AZ ECS cluster to see it in action.
Zone-aware routing in Amazon ECS Service Connect
How it works
ECS Service Connect routes most traffic to endpoints in the same AZ, reducing cross-AZ network calls. The algorithm uses Envoy’s zone-aware routing feature to:
Discover endpoints – The proxy maintains an up-to-date view of all endpoints in the destination service, including their AZ placement.
Prioritize local AZ – When routing a request, the proxy prioritizes sending traffic to endpoints in the same AZ as the client task initiating the request.
Route based on residual capacity – Rather than computing a local weight, the algorithm compares endpoint distribution percentages between the source and destination clusters in each AZ. When a destination AZ has proportionally more endpoints than the source, the surplus capacity (“residual”) absorbs cross-zone traffic from overloaded AZs, so no single zone is overwhelmed.
Fall back gracefully – If there are insufficient healthy endpoints in the local AZ, traffic automatically spills over to healthy endpoints in other AZs based on residual capacity, so availability is not compromised.
Rebalance dynamically – As endpoints scale up or scale down, the routing decisions update in real time without requiring redeployment.
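The residual-capacity comparison above can be illustrated with a little integer arithmetic. This is a simplified model of the behavior described in this section, not Envoy's actual implementation, and the function name local_share_pct and its arguments are hypothetical. It compares the share of source (client) endpoints in the local AZ with the share of destination endpoints in the same AZ and estimates what percentage of requests can stay local.

```shell
# Simplified model (an assumption, not Envoy's code): estimate the percentage
# of requests from one AZ that can stay in that AZ.
#   $1 = source endpoints in the local AZ       $2 = total source endpoints
#   $3 = destination endpoints in the local AZ  $4 = total destination endpoints
local_share_pct() {
  src_local=$1; src_total=$2; dst_local=$3; dst_total=$4
  # Compare dst_local/dst_total with src_local/src_total by cross-multiplying,
  # which keeps the arithmetic in integers.
  if [ $(( $dst_local * $src_total )) -ge $(( $src_local * $dst_total )) ]; then
    # The destination has at least as large a share in this AZ: all traffic stays local.
    echo 100
  else
    # The destination is under-provisioned here: only a proportional part stays
    # local, and the remainder spills to AZs with residual capacity.
    echo $(( $dst_local * $src_total * 100 / ($src_local * $dst_total) ))
  fi
}

local_share_pct 2 6 2 6   # balanced: prints 100
local_share_pct 2 6 1 6   # local AZ has half the needed share: prints 50
```

In this idealized model a balanced deployment keeps all traffic local. Measured numbers in a real cluster can be lower, because endpoint health and scaling events change the distribution over time.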
By keeping more than 80% of traffic within the same AZ when endpoints are balanced across AZs, zone-aware routing can significantly reduce the volume of cross-AZ network calls. This leads to direct cost savings, especially for data-intensive workloads.
Lower latency
Each AZ traversal adds network latency.
By keeping traffic local to an AZ, zone-aware routing can deliver approximately a 24% reduction in median network latency in multi-AZ deployments when endpoints are balanced. Intra-AZ communication latency can be as low as 300 to 400 μs, compared to 1.5 ms or more for cross-AZ calls.
No application code changes
The Envoy sidecar proxy handles zone-aware routing. You don’t need to change your application code.
In ECS Service Connect, zone-aware routing is on by default and needs no additional configuration.
Preserved high availability
If local endpoints are unhealthy or insufficient, traffic automatically spills over to healthy endpoints in other AZs.
Getting started with zone-aware routing
The following walkthrough sets up a multi-AZ Amazon ECS cluster on EC2 instances, deploys a frontend and a backend service with Service Connect, and verifies that zone-aware routing keeps traffic within each Availability Zone.
Create security groups for the load balancer, frontend (client), and backend (server). Configure ingress rules to allow traffic from the ALB to frontend port 8080, and from frontend to backend port 8090.
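As a rough sketch of the two ingress rules described above, the helper below wraps the AWS CLI `authorize-security-group-ingress` command. The function name allow_from_sg and the security group variables in the usage comments are placeholders, not values from this walkthrough. It assumes the security groups already exist and that `$REGION` is set.

```shell
# Hypothetical helper: allow TCP traffic on one port from a source security group.
#   $1 = target security group ID, $2 = port, $3 = source security group ID
allow_from_sg() {
  aws ec2 authorize-security-group-ingress \
    --region "$REGION" \
    --group-id "$1" \
    --protocol tcp \
    --port "$2" \
    --source-group "$3"
}

# Example usage (the *_SG_ID variables are placeholders):
#   allow_from_sg "$FRONTEND_SG_ID" 8080 "$ALB_SG_ID"       # ALB -> frontend
#   allow_from_sg "$BACKEND_SG_ID"  8090 "$FRONTEND_SG_ID"  # frontend -> backend
```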
Step 4: Create a launch template and Auto Scaling group
Create a launch template using the ECS-optimized AMI. The instances auto-register to the cluster through UserData. Then create an Auto Scaling group spanning all 3 AZs with enough capacity for your tasks (at least 2× the number of AZs for zone-aware routing to activate):
Note: desired-capacity = TASK_COUNT × 2 = 12. Each t3.medium instance used in this walkthrough fits one task (1024 CPU units / 2048 MB of memory), and you run 6 backend and 6 frontend tasks.
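The capacity arithmetic in the note can be written out as shell variables. TASK_COUNT matches the note; SERVICE_COUNT and DESIRED_CAPACITY are names introduced here for illustration only.

```shell
# Capacity arithmetic from the note above (SERVICE_COUNT and DESIRED_CAPACITY
# are illustrative names).
TASK_COUNT=6       # tasks per service
SERVICE_COUNT=2    # frontend + backend
# One task fits on each t3.medium instance, so instances = total tasks.
DESIRED_CAPACITY=$(( $TASK_COUNT * $SERVICE_COUNT ))
echo "desired-capacity=$DESIRED_CAPACITY"   # prints desired-capacity=12
```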
Step 5: Create a load balancer
Create an Application Load Balancer (ALB) to route external traffic to the frontend service:
Register task definitions for backend and frontend services. The portMappings include a named port and appProtocol (required for Service Connect), and the command field configures the container to run as an HTTP server.
The frontend is configured with an egress route to proxy requests to the backend service through Service Connect’s DNS alias (sc.test.az.aware.backend:8090). This is triggered when the azAwareRouting header is present:
Deploy both services with Service Connect enabled. The key configuration is the serviceConnectConfiguration block, which registers each service in the namespace and makes it discoverable through a client alias:
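For orientation, here is a minimal sketch of what a serviceConnectConfiguration block for the backend might look like, printed by a small shell function. Only the client alias (sc.test.az.aware.backend on port 8090) comes from this walkthrough. The function name, the namespace argument, the portName value, and the discoveryName value are assumptions. In a real deployment, portName must match the named port in the task definition's portMappings.

```shell
# Hypothetical helper: print a Service Connect configuration for the backend.
#   $1 = AWS Cloud Map namespace name (placeholder)
backend_service_connect_config() {
  cat <<EOF
{
  "enabled": true,
  "namespace": "$1",
  "services": [
    {
      "portName": "backend-port",
      "discoveryName": "backend",
      "clientAliases": [
        { "port": 8090, "dnsName": "sc.test.az.aware.backend" }
      ]
    }
  ]
}
EOF
}

# Print the configuration for a placeholder namespace.
backend_service_connect_config my-namespace
```

The printed JSON could be passed to `aws ecs create-service` through its `--service-connect-configuration` option. A client-only service would set only `enabled` and `namespace` and omit the `services` list.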
To test the endpoint, send traffic to the ALB with the azAwareRouting header. This triggers the frontend to proxy to the backend through Service Connect:
# Send traffic (must include the azAwareRouting header to trigger the backend proxy)
curl -H "azAwareRouting: true" -H "Connection: keep-alive" "http://$ALB_DNS/"
The response is JSON showing which AZ handled the frontend and backend requests.
Step 9: Verify zone-aware routing
Zone-aware routing is on by default for all Service Connect services. Send multiple requests and observe that frontend and backend AZs match (traffic stays local):
for i in $(seq 1 20); do
  curl -s -H "azAwareRouting: true" -H "Connection: keep-alive" "http://$ALB_DNS/" | jq .
done
Response fields:
availabilityZoneId – The AZ where the frontend task handled the request.
upstreamAvailabilityZoneId – The AZ where the backend task processed it.
With endpoints evenly distributed (2 tasks per AZ), both fields should match, which indicates that zone-aware routing is active.
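Rather than comparing the two fields by eye, you could tally local and cross-AZ responses. The sketch below is one way to do that. The function name tally_az is hypothetical, and the commented loop assumes the response fields described above.

```shell
# Hypothetical helper: read lines of "<frontend-az> <backend-az>" on stdin and
# print how many requests stayed in the same AZ and how many crossed AZs.
tally_az() {
  awk '{ if ($1 == $2) l++; else c++ }
       END { printf "local=%d cross=%d\n", l, c }'
}

# Against the walkthrough's ALB (requires curl and jq):
#   for i in $(seq 1 20); do
#     curl -s -H "azAwareRouting: true" "http://$ALB_DNS/" |
#       jq -r '"\(.availabilityZoneId) \(.upstreamAvailabilityZoneId)"'
#   done | tally_az
```

With balanced endpoints, the cross count should be at or near zero.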
Step 10: Test failover behavior
To see how zone-aware routing handles AZ imbalance, scale down the backend so one AZ has no endpoints. This demonstrates the automatic cross-AZ spillover:
# Scale backend to 2 tasks; one AZ will have zero backend endpoints
aws ecs update-service \
  --region "$REGION" \
  --cluster "$CLUSTER_NAME" \
  --service az-aware-backend-service-ec2 \
  --desired-count 2

# Wait for tasks to drain
echo "Waiting for backend to scale down..."
sleep 60

# Now send traffic; you should see some cross-AZ routing
for i in $(seq 1 20); do
  curl -s -H "azAwareRouting: true" -H "Connection: keep-alive" "http://$ALB_DNS/" \
    | jq '{frontend: .availabilityZoneId, backend: .upstreamAvailabilityZoneId}'
done
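Frontend tasks in the AZ that has no backend endpoints route their requests cross-zone, so the frontend and backend AZs differ for those requests. Note that 2 backend tasks is also below the minimum of 2× the number of AZs, so requests from the other AZs can cross zones as well.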
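After the test, you will probably want to return the backend to its original size. The sketch below scales the service back up and then uses the AWS CLI `ecs wait services-stable` waiter instead of a fixed sleep. The function name restore_backend is hypothetical. The service name and the count of 6 come from this walkthrough.

```shell
# Hypothetical helper: scale the backend service to $1 tasks and wait until
# the service reaches a steady state.
restore_backend() {
  aws ecs update-service \
    --region "$REGION" \
    --cluster "$CLUSTER_NAME" \
    --service az-aware-backend-service-ec2 \
    --desired-count "$1" &&
  aws ecs wait services-stable \
    --region "$REGION" \
    --cluster "$CLUSTER_NAME" \
    --services az-aware-backend-service-ec2
}

# Example usage: restore the 6 backend tasks used earlier in the walkthrough.
#   restore_backend 6
```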
The container image used in this walkthrough exposes AZ information directly in the HTTP response (availabilityZoneId and upstreamAvailabilityZoneId). For your own application containers that don’t expose AZ data, you can inspect Envoy stats directly to confirm that zone-aware routing is working.
Connect to an EC2 instance through SSM and query the Envoy admin endpoint through its Unix socket:
Minimum cluster size – To activate zone-aware routing, the total number of endpoints in the destination service must be at least 2× the number of Availability Zones (for example, at least 6 tasks for a 3-AZ deployment). This threshold is enforced internally by the Envoy proxy and is not customer configurable.
Automatic fallback – If the endpoint count falls below this threshold, zone-aware routing is automatically disabled, and traffic is distributed evenly across all AZs to preserve availability.
Works with existing Service Connect features – Zone-aware routing works with existing Service Connect capabilities, including service discovery, cross-account connectivity, and traffic metrics.
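The minimum cluster size rule is easy to check before you scale a service. The sketch below encodes only the 2× threshold stated above. The function name zone_aware_eligible is hypothetical, and the check says nothing about endpoint health or placement.

```shell
# Hypothetical helper: succeed if the destination service has enough endpoints
# for zone-aware routing to activate (at least 2x the number of AZs).
#   $1 = endpoint (task) count of the destination service, $2 = number of AZs
zone_aware_eligible() {
  [ "$1" -ge $(( 2 * $2 )) ]
}

if zone_aware_eligible 6 3; then
  echo "6 endpoints across 3 AZs: zone-aware routing can activate"
fi
if ! zone_aware_eligible 2 3; then
  echo "2 endpoints across 3 AZs: below threshold, traffic is spread across all AZs"
fi
```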
Conclusion
Zone-aware routing is now available in all AWS Regions that support Amazon ECS.
Zone-aware routing reduces cross-AZ data transfer costs and latency. Because it’s on by default, you don’t need to modify application code or deploy additional infrastructure. For existing services, redeploy once to activate the feature.
To get started, open the Amazon ECS console and redeploy your existing services to activate zone-aware routing. To learn more, see Service Connect in the Amazon ECS Developer Guide.