Using Amazon CloudFront and Amazon S3 to build multi-Region active-active geo proximity applications

In today’s digital-first business environment, with a globally distributed customer base, it is important to adopt an architecture that delivers digital assets to end users with the lowest possible latency, based on the geo-proximity of the assets to the end user.

Companies with digital assets stored in Amazon Simple Storage Service (Amazon S3) commonly configure the traffic to be delivered over Amazon CloudFront’s globally distributed edge network. In some use cases, you would set up your Amazon S3 buckets in a multi-Region active-active architecture. Some of the reasons for doing this include:

For performance/low latency requirements, so that digital assets get delivered from Amazon S3 buckets that are closest to the end-users.
To meet regulatory compliance requirements for business continuity and disaster recovery.
To build a highly resilient and distributed architecture.

In this post, you’ll learn how to use Lambda@Edge to implement geo-proximity routing in an Amazon CloudFront distribution that has active-active Amazon S3 origins in different AWS Regions, so that assets are delivered from the Amazon S3 origin closest to the end user.

Solution overview

CloudFront is a web service that speeds up the distribution of your static and dynamic web content to your users. CloudFront delivers your content through a worldwide network of data centers called edge locations. When a user requests content, CloudFront dramatically reduces the number of networks that your user’s requests must pass through, which improves performance. AWS Lambda@Edge is a compute service and a CloudFront feature that lets you run code and modify the client request before CloudFront forwards the request to the origin.

When the request results in a cache miss, CloudFront makes a request to the origin to deliver the requested object. If you have an Amazon S3 bucket in a US Region and the client request originated in Asia, then the request will be made from the edge location in Asia and flow across the continents. The solution discussed in this post lets you have an Amazon S3 bucket in each major geographic location and enable geo-proximity routing by mapping each Region to an Amazon S3 bucket. Continuing the previous example, you can have a second Amazon S3 bucket in an AWS Region in Asia and route the request so that the CloudFront edge location makes the request to the origin in Asia, instead of making the request to the Amazon S3 bucket in the US.

Figure 1: Deliver assets with CloudFront and route request to origin based on geo-proximity

The solution request flow works as follows:

1. The client makes a request using the domain name associated with the CloudFront distribution.
2. When the request results in a cache miss at the edge location, CloudFront invokes the AWS Lambda function before making the origin request.
3. The Lambda function identifies the edge location's AWS Region using the context object, looks up which Amazon S3 bucket is mapped to that Region, and modifies the request object.
4. The Lambda function returns the request object, which CloudFront passed in when it invoked the function, back to CloudFront.
5. The CloudFront edge location uses an origin access identity so that the request can be authenticated and authorized.
6. CloudFront makes an origin request to get the object from the Amazon S3 bucket.

The following screenshot shows how the Region to Amazon S3 bucket mapping looks.

Figure 2: AWS Region to Amazon S3 bucket mapping

As described in step 3, Figure 1, once you’ve identified which edge Region will be used to make the origin request, you have control over which Amazon S3 bucket should be used by CloudFront for the origin request.

Implementation details

In this section, you’ll learn about the implementation details of the Lambda function code, how you can manage the Regions mapping, and how this solution handles origin failover.

Origin request Lambda function

The following code shows how the Lambda function identifies the edge location Region, looks up which Amazon S3 bucket is mapped to that Region, and modifies the request object.

us_bucket = "mybucket-us.amazonaws.com"
eu_bucket = "mybucket-eu.amazonaws.com"
ap_bucket = "mybucket-ap.amazonaws.com"
default_bucket = "mydefaultbucket-us.amazonaws.com"

# Regions Mapping
regions_mapping = {
    "us-east-1": us_bucket,
    "us-east-2": us_bucket,
    "eu-central-1": eu_bucket,
    "eu-west-2": eu_bucket,
    "ap-northeast-1": ap_bucket,
    "ap-northeast-2": ap_bucket
    # ...
}

# This header is expected to match the CloudFront custom header for the failover origin.
failover_header = 'originTypeFailover'


def lambda_handler(event, context):
    request = event['Records'][0]['cf']['request']

    origin_key = list(request['origin'].keys())[0]
    custom_headers = request['origin'][origin_key].get('customHeaders', {})

    # Check failover case.
    # If CloudFront origin is not "s3", it's a failover case.
    # If CloudFront origin is "s3" but the custom header matches failover_header, it's a failover case.
    if origin_key != 's3' or failover_header in custom_headers:
        # Since it's a failover case, don't modify the request
        # and let the intended failover origin handle the request.
        return request

    # Identify edge region
    lambda_region = context.invoked_function_arn.split(':')[3]

    # Get S3 bucket based on regions mapping
    domain_name = regions_mapping.get(lambda_region, default_bucket)

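    # Update origin request object.
    # Note: 'region' is set to the edge Region here, which assumes the mapped
    # bucket lives in that same Region. If several edge Regions share one bucket,
    # set 'region' to the bucket's own Region instead.
    request['origin']['s3']['domainName'] = domain_name
    request['origin']['s3']['region'] = lambda_region
    request['headers']['host'] = [{'key': 'host', 'value': domain_name}]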

    return request

Let’s go over the key areas in this Python code:

The edge Region is identified by reading invoked_function_arn from the context object. The fourth colon-separated field of this ARN holds the Region in which the function is running, so the code splits the string on colons and assigns that field to the lambda_region variable.
Using lambda_region, a lookup is made to get the Amazon S3 bucket domain name, which is assigned to the domain_name variable. This Amazon S3 bucket will be used for the origin request. Observe the use of default_bucket: whenever there is no mapping between the edge Region and an Amazon S3 bucket, default_bucket is used as the origin.
Using the lambda_region and domain_name variables, the request object is modified for CloudFront to use for the origin request.
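
To try these three steps without deploying anything, the lookup can be exercised locally with a hand-built event and a stand-in context object. The following is a minimal sketch, not the function deployed in this post: the ARN, the account ID, the bucket endpoints, and the helper names edge_region, route_request, and sample_request are all hypothetical. It also makes one deliberate variation: each bucket's own Region is stored next to its domain name, on the assumption that the region field of an S3 origin should describe the bucket rather than the edge location.

```python
from types import SimpleNamespace

# Hypothetical buckets: (regional endpoint, Region the bucket lives in).
US_BUCKET = ("mybucket-us.s3.us-east-1.amazonaws.com", "us-east-1")
EU_BUCKET = ("mybucket-eu.s3.eu-central-1.amazonaws.com", "eu-central-1")
DEFAULT_BUCKET = US_BUCKET

REGIONS_MAPPING = {
    "us-east-1": US_BUCKET,
    "us-east-2": US_BUCKET,
    "eu-central-1": EU_BUCKET,
    "eu-west-2": EU_BUCKET,
}


def edge_region(function_arn):
    """Return the Region field of a Lambda ARN (arn:aws:lambda:<region>:...)."""
    return function_arn.split(":")[3]


def route_request(request, function_arn):
    """Point an S3 origin request at the bucket mapped to the edge Region."""
    domain_name, bucket_region = REGIONS_MAPPING.get(
        edge_region(function_arn), DEFAULT_BUCKET
    )
    request["origin"]["s3"]["domainName"] = domain_name
    request["origin"]["s3"]["region"] = bucket_region
    request["headers"]["host"] = [{"key": "host", "value": domain_name}]
    return request


def sample_request():
    """A trimmed origin-request payload containing only the fields used here."""
    return {
        "uri": "/images/logo.png",
        "headers": {"host": [{"key": "host", "value": US_BUCKET[0]}]},
        "origin": {
            "s3": {
                "domainName": US_BUCKET[0],
                "region": US_BUCKET[1],
                "customHeaders": {},
            }
        },
    }


# Stand-in for the Lambda context object; only the ARN attribute is needed.
context = SimpleNamespace(
    invoked_function_arn=(
        "arn:aws:lambda:eu-west-2:123456789012:function:us-east-1.geo-router:1"
    )
)
routed = route_request(sample_request(), context.invoked_function_arn)
print(routed["origin"]["s3"]["domainName"], routed["origin"]["s3"]["region"])
```

In this run, a function executing in eu-west-2 is routed to the bucket in eu-central-1, and an edge Region that is missing from the mapping falls back to DEFAULT_BUCKET.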

Options to manage Regions mapping

As illustrated in the code above, a simple way to manage the Regions mapping is to include it in the code itself, as is done with the regions_mapping variable. This can be a good method if you don’t expect frequent changes to the Regions mapping. For example, if you’re looking to have one Amazon S3 bucket per continent, you most likely won’t need to change the mapping often.

If you do need to retrieve the Regions mapping dynamically, consider storing it in AWS Secrets Manager or AWS Systems Manager Parameter Store and retrieving it in the Lambda function. Both can be good options for externalizing this part of the logic. You could also use any other source and make an HTTP request to get the data. In either case, account for the additional latency and cache the result so that the request is only made on the initial Lambda cold start.

Origin failover

You can set up CloudFront with origin failover for scenarios that require high availability. For failover to work correctly, the Lambda function must distinguish between a normal origin request, which you want to modify, and a failover request, which you don’t. A failover request means that the initial request didn’t succeed, so routing it through the same Regions mapping again would most likely produce the same result. For the failover case, you return the unmodified request object to CloudFront and let the failover origin handle the request.

This is achieved by adding a custom header to your chosen failover origin. The name of this custom header is expected to match the value assigned to the failover_header variable in your Lambda function. The if statement checks whether the request is a failover request. If it is, the request is returned to CloudFront before the code that modifies the request object runs, so the request remains unmodified.
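
The failover check can be exercised on its own with hand-built request payloads. The sketch below is illustrative: is_failover_request is a hypothetical helper that mirrors the if statement in the Lambda function, and the sample payloads contain only the fields that the check reads. As an added assumption, header names are compared case-insensitively, in case the event carries custom header names in lowercase.

```python
# Must match the name of the custom header configured on the failover origin.
FAILOVER_HEADER = "originTypeFailover"


def is_failover_request(request):
    """Return True when the request targets the failover origin."""
    origin_key = next(iter(request["origin"]))
    if origin_key != "s3":
        # A non-S3 (custom) origin is treated as the failover origin.
        return True
    custom_headers = request["origin"]["s3"].get("customHeaders", {})
    header_names = {name.lower() for name in custom_headers}
    return FAILOVER_HEADER.lower() in header_names


# Primary S3 origin: no failover header, so the request should be rerouted.
primary = {
    "origin": {"s3": {"domainName": "mybucket-us.amazonaws.com", "customHeaders": {}}}
}

# Failover S3 origin: carries the custom header, so the request is left alone.
failover_s3 = {
    "origin": {
        "s3": {
            "domainName": "mydefaultbucket-us.amazonaws.com",
            "customHeaders": {
                "origintypefailover": [
                    {"key": "originTypeFailover", "value": "true"}
                ]
            },
        }
    }
}

# Failover to a custom origin (hypothetical domain): also left alone.
failover_custom = {"origin": {"custom": {"domainName": "backup.example.com"}}}

for name, req in [
    ("primary", primary),
    ("failover_s3", failover_s3),
    ("failover_custom", failover_custom),
]:
    print(name, is_failover_request(req))
```

Only the primary payload is reported as a normal request. Both failover payloads would be returned to CloudFront unmodified.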

Conclusion

In this post, you learned how to deliver digital assets to end users with the lowest latency in a multi-Region active-active Amazon S3 setup. The solution presented in this post helps you achieve this with Lambda@Edge running at CloudFront edge locations. The sample Lambda code in this post runs on a cache miss and uses the Regions mapping data to route the CloudFront origin request to the closest Amazon S3 bucket, while also handling the origin failover scenario.

About the authors

Artem Lovan

Artem Lovan is a Technologist and Solutions Architect at AWS. Artem helps AWS customers build scalable and sustainable products in the cloud. He has been involved in IT at many levels, including infrastructure, networking, security, DevOps, and software development.

Yoginder Sethi

Yoginder Sethi is a Senior Solutions Architect working in the Strategic Accounts Solutions Architecture team at AWS. He has extensive experience in building and managing large-scale cloud architectures, DevOps tooling, and observability. He is based in the San Francisco Bay Area, California, and outside of work he enjoys exploring new places, listening to music, and hiking.
