AWS Storage Blog

Building an active-active, latency-based application across multiple Regions

Many enterprises are looking to optimize the performance of their applications in order to deliver the best possible experience to their end users. Teams that run applications with global user bases are keenly aware of how high latency and poor performance can negatively affect their users’ experience. With data in one location, such as on premises, it can be difficult to overcome high latency for users in faraway locations.

In this post, I show how to use Amazon S3 Multi-Region Access Points with Amazon CloudFront to serve your web applications, static assets, or any objects stored in Amazon Simple Storage Service (Amazon S3) in a multi-Region active-active setup. The setup provides latency-based routing so that content is delivered with the lowest network latency. With a multi-Region storage setup in the cloud on S3 and a content delivery network like CloudFront, you can provide your application’s users around the world with better latency and performance.

Solution overview

Amazon S3 Multi-Region Access Points provide a global endpoint that applications can use to fulfill requests from S3 buckets located in multiple AWS Regions. You can use S3 Multi-Region Access Points to build multi-Region applications with the same simple architecture used in a single Region and then run those applications anywhere in the world. Instead of sending requests over the congested public internet, S3 Multi-Region Access Points accelerate internet-based requests to S3 and provide built-in network resilience.

CloudFront is a content delivery network (CDN) that speeds up the distribution of your static and dynamic web content to your users. CloudFront delivers your content through a worldwide network of data centers called edge locations, connected to the AWS Regions through the AWS network backbone. When a user requests content that you’re serving, CloudFront improves the performance by retrieving the content from the origin and caching the content closer to the viewer.

In CloudFront, when you have an S3 bucket configured as the origin, you improve the performance from the end user to CloudFront. However, if the S3 bucket is geographically far away from the edge location, and therefore from the end user, the request to the origin could have suboptimal latency. This is where S3 Multi-Region Access Points can help. By combining the two services, you can further optimize the performance of your application to deliver assets from your Amazon S3 buckets with the best possible performance.

Figure 1: S3 Multi-Region Access Points as a custom origin for Amazon CloudFront

As displayed in Figure 1, the request flow and architecture of the solution work as follows:

  1. The client makes a request whose path is expected to match the path pattern for the S3 Multi-Region Access Point origin.
  2. CloudFront matches the path pattern to the S3 Multi-Region Access Point origin and invokes the associated origin request Lambda@Edge function.
  3. The Lambda function modifies the request object, which is passed in the event object, and signs the request using Signature Version 4A (SigV4A).
  4. The modified request is returned to CloudFront.
  5. CloudFront, using the SigV4A authorization headers from the modified request object, makes the request to the S3 Multi-Region Access Point origin.
  6. The S3 Multi-Region Access Point routes the request to the S3 bucket with the lowest network latency.

The following code shows the key parts of the Lambda function code and how to modify and sign the request with SigV4A.

Note: The following code snippet focuses only on the key parts of the Lambda function. The code isn’t complete and depends on additional parts defined outside the Python function. The following section on deployment and implementation details links to a GitHub repository with the complete code of the Lambda function.

def lambda_handler(event, context):
    request = event['Records'][0]['cf']['request']

    origin_key = list(request['origin'].keys())[0]
    custom_headers = request['origin'][origin_key].get('customHeaders', {})

    # Check the failover case. If the CloudFront origin custom header is included, that signals a failover request.
    # In that case, it is assumed that SigV4A signing should not be performed and
    # the unmodified request should be used for the failover origin.
    if failover_header in custom_headers:
        return request

    method = request["method"]
    endpoint = f"https://{request['origin']['custom']['domainName']}{request['uri']}"
    data = None  # Empty for GET. If needed, it could be mapped from the request, e.g. request['body']['data'].
    region = '*'  # For an S3 Multi-Region Access Point the Region set is '*' (all Regions), which is why SigV4A is required.
    service = 's3'

    headers = request["headers"]
    request_headers_list = list(headers.keys())

    cf_read_only_headers = {}
    # Some CloudFront headers are read-only and can't be removed from the request.
    # Therefore those have to be part of signing headers. See more details in docs
    # https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/edge-functions-restrictions.html
    for h in cf_read_only_headers_list:
        if h in request_headers_list:
            cf_read_only_headers[headers[h][0]['key']] = headers[h][0]['value']

    # CloudFront adds "X-Amz-Cf-Id" header after Origin request Lambda but before the request to the origin.
    # Therefore it has to be part of the signing request.
    cf_read_only_headers['X-Amz-Cf-Id'] = event['Records'][0]['cf']['config']['requestId']

    # Sign the request with Signature Version 4A (SigV4A).
    auth_headers = SigV4AWrapper().get_auth_headers(method, endpoint, data, region, service, cf_read_only_headers)

    # "X-Amz-Cf-Id" header can't be directly set in request object.
    # Therefore it has to be part of the signing request, however has to be removed from the
    # request object as CloudFront will set it before making request to the origin.
    auth_headers.pop('X-Amz-Cf-Id')

    cf_headers = {}
    # Add the SigV4A auth headers in the data structure expected by CloudFront.
    for k, v in auth_headers.items():
        cf_headers[k.lower()] = [{'key': k, 'value': v}]

    # Override the headers to include only the ones expected by the S3 Multi-Region Access Point (i.e. the ones that are signed).
    request['headers'] = cf_headers

    # Remove the query string; otherwise the signature won't match.
    # Note: The query string can be part of the request, but then it must also be included in the signing request.
    request.pop('querystring', None)

    return request
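
To experiment with a handler like the preceding one outside of CloudFront, it helps to see the shape of the data it reads and writes. The following is a minimal sketch, not the code from the repository. The sample event is trimmed to the fields that the handler touches, and the domain name, the request ID, the header name in FAILOVER_HEADER, and the helper names to_cf_headers and is_failover_request are all hypothetical placeholders.

```python
from typing import Dict, List

# Hypothetical name of the origin custom header that marks the failover origin.
FAILOVER_HEADER = "x-example-failover"


def to_cf_headers(plain: Dict[str, str]) -> Dict[str, List[Dict[str, str]]]:
    """Convert {'Name': 'value'} into the structure CloudFront expects:
    {'name': [{'key': 'Name', 'value': 'value'}]}."""
    return {k.lower(): [{"key": k, "value": v}] for k, v in plain.items()}


def is_failover_request(request: dict) -> bool:
    """Return True when the request's origin carries the failover marker header."""
    origin = next(iter(request["origin"].values()))
    return FAILOVER_HEADER in origin.get("customHeaders", {})


# Trimmed origin-request event. All values are placeholders.
MRAP_DOMAIN = "example.mrap.accesspoint.s3-global.amazonaws.com"
sample_event = {
    "Records": [{
        "cf": {
            "config": {"eventType": "origin-request", "requestId": "EXAMPLE-REQUEST-ID"},
            "request": {
                "method": "GET",
                "uri": "/index.html",
                "querystring": "",
                "headers": {"host": [{"key": "Host", "value": MRAP_DOMAIN}]},
                "origin": {"custom": {"domainName": MRAP_DOMAIN, "customHeaders": {}}},
            },
        }
    }]
}

request = sample_event["Records"][0]["cf"]["request"]
endpoint = f"https://{request['origin']['custom']['domainName']}{request['uri']}"
# Placeholder values stand in for the real output of the signing step.
signed = to_cf_headers({"Authorization": "<signature>", "X-Amz-Region-Set": "*"})
```

Keeping the header conversion in a small helper makes it easy to check in isolation that every signed header ends up under a lowercase key with its original casing preserved in the key field.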

In order to successfully sign a request, all attributes that will be part of the request to the origin have to be included in the signing request. Most commonly, those are the headers, query string parameters, method, request body, and the URI. The signing request also expects additional data such as AWS service name and AWS Region, which is global in a multi-Region case.
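
To illustrate why every attribute matters, the following sketch builds a canonical request string in the general Signature Version 4 style, which is the text that gets hashed and signed. It is a simplified illustration under my own assumptions, not the signing code used by the Lambda function: it does not compute a SigV4A signature, and the function name canonical_request and the example host are hypothetical.

```python
import hashlib
from typing import Dict
from urllib.parse import quote


def canonical_request(method: str, uri: str, query: Dict[str, str],
                      headers: Dict[str, str], body: bytes = b"") -> str:
    """Build a SigV4-style canonical request string (illustration only)."""
    # Query parameters: URI-encode names and values, then sort them.
    encoded = sorted((quote(k, safe="-_.~"), quote(v, safe="-_.~"))
                     for k, v in query.items())
    canonical_query = "&".join(f"{k}={v}" for k, v in encoded)
    # Headers: lowercase names, collapse whitespace in values, sort by name.
    normalized = sorted((k.lower(), " ".join(v.split())) for k, v in headers.items())
    canonical_headers = "".join(f"{k}:{v}\n" for k, v in normalized)
    signed_headers = ";".join(k for k, _ in normalized)
    # The payload is represented by its SHA-256 digest.
    payload_hash = hashlib.sha256(body).hexdigest()
    return "\n".join([method, quote(uri, safe="/-_.~"), canonical_query,
                      canonical_headers, signed_headers, payload_hash])


example_headers = {
    "Host": "example.mrap.accesspoint.s3-global.amazonaws.com",  # placeholder
    "X-Amz-Region-Set": "*",
}
base = canonical_request("GET", "/index.html", {}, example_headers)
with_query = canonical_request("GET", "/index.html", {"v": "1"}, example_headers)
```

Because the query string is part of the canonical request, base and with_query differ, and so would their signatures. This is why the handler either removes the query string or has to include it when signing.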

Once the request is signed using SigV4A, authorization headers like the following examples are added to the request object. The CloudFront edge location uses them to make the request to the Amazon S3 Multi-Region Access Point origin.

Figure 2: Sample SigV4A authorization headers

The preceding example code in this post is written in Python. To learn how you can sign a request using SigV4A in other languages such as Node.js and Java, see the open-source sigv4a-signing-examples repository.

Deployment and implementation details

To deploy the solution, use an AWS CloudFormation template. Start by creating a deployment package for the Lambda function and use it in the CloudFormation template to deploy the package alongside the other AWS services that are used in this solution.

In the GitHub repository, follow the steps and complete the sections Prerequisites, Packaging Lambda function, and Deploying CloudFormation stack. Once you have deployed the CloudFormation stack, come back to this post.

Congratulations! You successfully deployed the solution. Let’s look at the key configurations of the AWS services in the AWS Management Console.

  1. Navigate to the CloudFront distribution page, and observe that the Amazon S3 Multi-Region Access Points origin type is Custom Origin.

AWS console view of the S3 Multi-Region Access Points configuration as CloudFront custom origin

  2. Navigate to the CloudFront distribution behavior. The Lambda function is associated with the origin request. This optimizes Lambda@Edge use because the signing process is performed only if the request can’t be served from the CloudFront cache and an origin request is necessary.

AWS console configuration of Lambda@Edge as origin request

  3. Navigate to the Amazon S3 Multi-Region Access Point in S3 and observe the two associated S3 buckets.

AWS console configuration of S3 Multi-Region Access Point association with two Amazon S3 buckets

Testing the deployment

Before you start testing the deployed solution, you first need to upload a file to each of the Amazon S3 buckets that are associated with the S3 Multi-Region Access Points.

Note: For testing purposes, upload the file to each S3 bucket separately. For a production configuration, I recommend using replication rules inside the S3 Multi-Region Access Points to synchronize data among buckets. To learn more, refer to the documentation on configuring bucket replication for use with Multi-Region Access Points. Alternatively, you can use Amazon S3 replication inside the S3 bucket configuration directly. To learn more, refer to Amazon S3 Replication.
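
As a rough illustration of the second option, the following sketch builds a pair of replication configurations for two-way replication between two buckets. It only constructs the dictionaries and makes no AWS calls. The role ARN, the bucket names, and the helper name replication_configuration are hypothetical, and the rule settings (replicate all objects, do not replicate delete markers) are my assumptions rather than the post's recommendation. S3 replication also requires versioning to be enabled on both buckets.

```python
from typing import Dict


def replication_configuration(role_arn: str, destination_bucket: str,
                              rule_id: str = "replicate-all") -> Dict:
    """Build a replication configuration that copies all objects to one destination."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [{
            "ID": rule_id,
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # An empty prefix matches every object.
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
        }],
    }


# Placeholder identifiers. Replace them with your own role and bucket names.
ROLE_ARN = "arn:aws:iam::111122223333:role/example-replication-role"
one_to_two = replication_configuration(ROLE_ARN, "example-bucket-two")
two_to_one = replication_configuration(ROLE_ARN, "example-bucket-one")
```

Each dictionary is shaped so that it could be passed as the ReplicationConfiguration argument of the S3 put_bucket_replication API call for the corresponding source bucket, for example through an AWS SDK.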

Go back to the GitHub repository, and in Testing the deployment, follow the steps in the sections Upload file to Amazon S3 buckets and Lookup CloudFront distribution DNS. Come back to this post once you have completed those steps.

Now that the CLOUD_FRONT_DNS variable is set in your terminal, you can run the first test. Make an HTTP request using the cURL command line tool.

curl "${CLOUD_FRONT_DNS}"

Depending on which S3 bucket has the lowest network latency for your request, you should receive the following response text with the name of one of the two buckets.

hello from s3 bucket "<BUCKET-ONE-NAME-HERE>"/"<BUCKET-TWO-NAME-HERE>"

At this point, you validated the deployed solution. CloudFront served index.html from the origin using the S3 Multi-Region Access Points.

The next test is to validate that the response text, in particular the S3 bucket name in the response, is the one you would expect. To achieve that, the client request should originate as close as possible to the Regions of the S3 buckets that you added to the S3 Multi-Region Access Point.

One way to achieve this is to use AWS CloudShell. CloudShell is a browser-based shell that makes it easy to securely manage, explore, and interact with your AWS resources. For the purposes of this post, CloudShell comes preinstalled with cURL, and you can open it in each AWS Region where you have S3 buckets. Using CloudShell will help to simulate a client being geographically closest to each of the AWS Regions.

Back in the terminal where you exported the CLOUD_FRONT_DNS environment variable, run the following command. It prints the full cURL command, which you can then copy and run inside CloudShell.

echo "curl ${CLOUD_FRONT_DNS}"

Note: To learn how to launch CloudShell and select Regions, refer to the documentation on launching AWS CloudShell.

In the AWS Management Console:

  1. Launch CloudShell.
  2. Select the Region where your first S3 bucket is located.
  3. Run the full cURL command that you received as the output from the most recent command. Observe the response text.
  4. Select the Region where your second S3 bucket is located.
  5. Run the full cURL command again. Observe the response text.

When you ran the cURL command from the Region where your first S3 bucket is located, you should have received the following response text.

hello from s3 bucket "<BUCKET-ONE-NAME-HERE>"

When you repeated this by running the cURL command from the Region where your second S3 bucket is located, you should have received the following response text.

hello from s3 bucket "<BUCKET-TWO-NAME-HERE>"

Note: Even though this test tries to simulate the latency-based behavior by using CloudShell in the same Region as the S3 bucket, keep in mind that there is no client stickiness or affinity to a particular Region behind an Amazon S3 Multi-Region Access Point.

With this test, you validated that the S3 Multi-Region Access Points indeed route the request to the S3 bucket with the lowest network latency.
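
If you prefer to script this check instead of reading the cURL output by eye, a small Python helper can extract the bucket name from the response. This is a minimal sketch that assumes the response text has the same format as the samples in this post. The names bucket_from_response and fetch_bucket_name are hypothetical.

```python
import re
import urllib.request
from typing import Optional

# Matches the sample response text: hello from s3 bucket "<name>"
RESPONSE_PATTERN = re.compile(r'hello from s3 bucket "([^"]+)"')


def bucket_from_response(text: str) -> Optional[str]:
    """Return the bucket name found in the response text, or None if absent."""
    match = RESPONSE_PATTERN.search(text)
    return match.group(1) if match else None


def fetch_bucket_name(url: str, timeout: float = 10.0) -> Optional[str]:
    """Request the URL and return the bucket name that served the response."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return bucket_from_response(response.read().decode("utf-8"))
```

You could call fetch_bucket_name with the URL of your CloudFront distribution from CloudShell in each Region and compare the result with the bucket you expect. The URL must include a scheme, so add one if your CLOUD_FRONT_DNS value holds only the domain name.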

Cleaning up

After you’ve tested the solution, you can clean up all the created AWS resources by deleting the CloudFormation stack. To delete the CloudFormation stack, follow the steps in the Cleanup section of the GitHub repository.

Conclusion

In this post, I showed how to use Amazon CloudFront with Amazon S3 Multi-Region Access Points to build a multi-Region active-active setup that delivers assets from your S3 buckets with low latency. You used CloudFront to route the request from the client to the edge location, and S3 Multi-Region Access Points to route the request from the edge location to the bucket with the lowest network latency. Using CloudShell, you tested the latency-based routing by simulating a client close to each of the Regions associated with the S3 Multi-Region Access Point. With this solution, you can optimize the performance of your applications and deliver a better experience to your end users.