AWS for M&E Blog
Enhanced origin failover using Amazon CloudFront and AWS Lambda@Edge
Recently, an AWS customer serviced more than 10 billion API calls per day at peak. They needed a failover option for brown-outs and other origin failures. They served these API calls through Amazon CloudFront, a content delivery network (CDN) that securely delivers static and dynamic web content with low latency and high transfer speeds using a global network of edge locations. Initially, we recommended the customer use the CloudFront Origin Failover option, which was built for exactly this purpose. However, after testing the standard solution, the customer wanted the failover to provide a rich, and even personalized, experience. Working with their AWS Solutions Architect, the customer created a solution that used Lambda@Edge to evaluate each of the 10 billion API calls. If there was a failure, they were able to provide a rich and personalized failover experience to the end user.
Customer requirements:
- Ability to invoke failover logic on HTTP error response
- Minimal management and operational overhead
- Minimal impact to the end-user experience
- No URL or DNS changes so end users do not know they have been redirected to an error page
- Ability to handle requests to hundreds of different domains
The CloudFront distribution was configured with two origins in a CloudFront Origin Group. The primary origin was the AWS load balancer fronting the customer’s application, which serves dynamic and personalized content. The second was an Amazon Simple Storage Service (Amazon S3) bucket containing static HTML content that background processes update periodically. The S3 bucket was configured as the failover origin, called only when the primary origin responded with an HTTP error status code. The Lambda@Edge function would analyze every request sent to the origin, first checking which origin was being called. Only when the request was failing over to the S3 bucket would the function parse the request host, URI, and query string arguments, then update the request to direct the end user to the corresponding S3 partition path and static HTML content.
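To make this first design concrete, here is a minimal sketch of an origin-request function that passes primary-origin requests through and rewrites only the requests going to the S3 failover origin. It is not the customer's code. The object naming scheme (host name as prefix, query string appended to the file name) and the function name `rewriteForFailover` are assumptions for illustration, and the sketch assumes the Host header is forwarded to the origin.

```javascript
// Pure rewrite step, kept separate from the handler so it can be exercised locally.
function rewriteForFailover(request) {
  // Requests to the primary (custom) origin pass through untouched.
  if (!request.origin || !request.origin.s3) {
    return request;
  }

  // Assumes the viewer's Host header was forwarded to the origin.
  const viewerHost = request.headers.host[0].value;
  const suffix = request.querystring ? `_${request.querystring}` : '';

  // Hypothetical key layout: <viewer host>/<uri>[_<query string>].html
  request.origin.s3.path = `/${viewerHost}`;
  request.uri = `${request.uri}${suffix}.html`;

  // S3 expects its own domain name in the Host header.
  request.headers.host = [{ key: 'Host', value: request.origin.s3.domainName }];
  return request;
}

// Lambda@Edge origin-request entry point (assign to exports.handler when deploying).
const handler = async (event) => rewriteForFailover(event.Records[0].cf.request);
```

Note that the function is still invoked for every origin request in this design, even though it returns immediately for the primary origin.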
After a month of running in pre-production, with the customer working closely with the account Solutions Architects (SAs) and the CloudFront team, additional requirements were discovered.
Additional customer requirements:
- Ability to invoke failover logic only on HTTP error response, not on every request
- Make it more cost effective
CloudFront to the rescue. In the first design, Lambda@Edge was executed on every CloudFront origin request, whether or not the origin returned an error. The logic only needs to take action when the origin returns an error, which should rarely occur.
Cost optimization:
The expected Lambda@Edge behavior would be:
- Only invoke on error status codes returned from the origin (4XX or 5XX status codes)
- Only invoke logic for error handling on errors from the origin
- Seamless failover, no URL or DNS changes so end users do not know they have been redirected to an error page
The team thought through a possible solution using the Error Pages feature in CloudFront. This feature allows you to set a custom error page and assign a different status code to be returned to the user. The idea was to set up a cache behavior in CloudFront that matches the URI path of the custom error page, so that requests for the error page are handled by that behavior.
Here is a deeper look into the flow:
- User makes a request to ‘www.example.com/test/sample?page=home&id=1’, the request is routed to the nearest CloudFront Edge location.
- The CloudFront cache behavior configuration associates the Elastic Load Balancing (ELB) origin with the path pattern /test* and whitelists the Host header for forwarding.
- CloudFront forwards the request to the ELB origin, which forwards the request to the application server.
- The application server returns an HTTP 503 Service Unavailable response, and the ELB forwards the response back to CloudFront.
- CloudFront has a Custom Error Page configured to redirect HTTP 503 responses to a cache behavior with the pattern /error-failover*, which has a Lambda@Edge function associated with it that triggers on origin request.
- Lambda@Edge rewrites the host header to match the bucket host and sends the request to the bucket.
- The bucket response is sent back through CloudFront returning to the user.
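The flow above could be expressed roughly as the following fragment of a CloudFront distribution configuration, written here as a JavaScript object. It is a sketch and not a complete configuration: the field names follow the CloudFront DistributionConfig structure, but the origin IDs, the error page path, the function ARN, and the choice of returning a 200 status code to the viewer are placeholders and assumptions, and many required fields are omitted.

```javascript
// Partial CloudFront DistributionConfig showing only the pieces used in this flow.
const distributionConfigFragment = {
  CustomErrorResponses: {
    Quantity: 1,
    Items: [
      {
        ErrorCode: 503,                        // error returned by the ELB origin
        ResponsePagePath: '/error-failover',   // hypothetical path; must match the behavior below
        ResponseCode: '200',                   // assumption: hide the error from the viewer
        ErrorCachingMinTTL: 0,                 // assumption: do not cache the error state
      },
    ],
  },
  CacheBehaviors: {
    Quantity: 2,
    Items: [
      // Normal traffic goes to the load balancer.
      { PathPattern: '/test*', TargetOriginId: 'elb-origin' },
      // Error-state requests go to the bucket, through the Lambda@Edge function.
      {
        PathPattern: '/error-failover*',
        TargetOriginId: 's3-failover-origin',
        LambdaFunctionAssociations: {
          Quantity: 1,
          Items: [
            {
              EventType: 'origin-request',
              // Placeholder ARN; Lambda@Edge requires a numbered function version.
              LambdaFunctionARN: 'arn:aws:lambda:us-east-1:123456789012:function:error-failover:1',
            },
          ],
        },
      },
    ],
  },
};
```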
Where this approach falls short is the customer’s desire for a personalized experience, because the original headers and URI are dropped in the error state. Without the original request headers and URI, Lambda@Edge is unable to provide the customer’s clients with intelligent and personalized origin failover.
Welcome, new headers on error state behaviors
At AWS, we listen to our customers not just for features but also for operational pain points, like the one above. Within a short period of time, the Lambda@Edge and CloudFront engineering teams deployed what seems like a simple fix but solves many scenarios like this one: two new request headers that carry the original request data into the error state.
The two headers are:
- cloudfront-error-uri – This header’s value is the original request URI received from the user. For example:
    "cloudfront-error-uri": [
        {
            "key": "CloudFront-Error-Uri",
            "value": "/mypersonalpage/index.html"
        }
    ]
- cloudfront-error-args – This header’s value contains the original request’s query string parameters and values. For example:
    "cloudfront-error-args": [
        {
            "key": "CloudFront-Error-Args",
            "value": "arg1=var1&arg2=var2"
        }
    ]
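As an illustration of how a function might read these two headers, the following sketch extracts the original URI and query string from a Lambda@Edge request object. The helper name `getOriginalRequest` and the shape of its return value are hypothetical; only the header names and value formats come from the examples above.

```javascript
// Extract the original request data from the error-state headers.
// Returns null when the request is not an error-state request.
function getOriginalRequest(request) {
  const headers = request.headers || {};
  const uriHeader = headers['cloudfront-error-uri'];
  const argsHeader = headers['cloudfront-error-args'];

  if (!uriHeader) {
    return null;
  }

  // The args header is absent when the original request had no query string.
  const args = argsHeader ? argsHeader[0].value : '';
  return {
    uri: uriHeader[0].value,
    args,
    // Parsed view of the query string, convenient for personalization logic.
    params: Object.fromEntries(new URLSearchParams(args)),
  };
}

const original = getOriginalRequest({
  headers: {
    'cloudfront-error-uri': [{ key: 'CloudFront-Error-Uri', value: '/mypersonalpage/index.html' }],
    'cloudfront-error-args': [{ key: 'CloudFront-Error-Args', value: 'arg1=var1&arg2=var2' }],
  },
});
console.log(original.uri);         // /mypersonalpage/index.html
console.log(original.params.arg1); // var1
```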
Here is the flow again, this time using the original user request data when redirecting to the failover bucket on error response from the ELB origin.
Deliver results
Within one week of delivery, the customer deployed this new solution using the Lambda@Edge function they wrote and tested earlier, with a few minor additions to use these new headers. The new solution adds no additional latency and no additional Lambda@Edge execution cost until an error state occurs, while providing that personalized experience on failover.
Requirements met with Lambda@Edge on error state with CloudFront origin failover:
- Ability to execute failover logic only on HTTP error response, not on every request
- Minimal management and overhead
- Minimal impact to the end user experience
- No URL or DNS changes so end users do not know they have been redirected to an error page
- Be able to handle requests to hundreds of different domains
- Cost effective
Solve scenarios like these with the new headers:
- Personalized waiting rooms – Using the request parameters in a failover allows you to personalize the response if you want to create a waiting room solution or visitor prioritization access, as described in this blog post.
- Smooth backend scaling – If your application has scaling limits and you would like better control over throttling while still offering a personalized redirect, you can use the request parameters to redirect to a static site and preserve the user experience and engagement.
- Intelligent redirect on error – With CloudFront custom error page redirect on error, you can now make different redirect decisions based on the combination of error code and request parameters. For example, on a 404 error, you can redirect to a page with generated links suggesting other locations in your site, based on the query parameters. Or, if it’s a 500 Internal Server Error, you can route to an alternate page that records the request parameters and notifies the user that you will get back to them with more information specific to their request.
- Authorization offload to subscription service – In this case, you can offload the unauthorized access response (403 Forbidden) to a different origin that suggests a premium subscription or a new subscription, based on the user’s current subscription.
There are quite a few use cases that can benefit from the ability to trigger Lambda@Edge only in case of an error response. Many of the redirects can be done in your application, but if you design a serverless application, it is a great alternative to shift the logic to Lambda@Edge so that it is triggered only on an error response.
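The intelligent redirect scenario could be sketched as follows. This is one possible approach under stated assumptions, not a documented pattern: it assumes each error code has its own custom error page path (the hypothetical `/error-failover/404` and `/error-failover/500`), that the function sees that path in `request.uri`, and that the target object names (`/suggestions...` and `/we-will-get-back...`) exist in your failover bucket.

```javascript
// Pick a failover object based on which error page path was requested
// and on the original query string carried in the error-state headers.
function chooseFailoverUri(request) {
  const originalUri = request.headers['cloudfront-error-uri'][0].value;
  const argsHeader = request.headers['cloudfront-error-args'];
  const args = argsHeader ? argsHeader[0].value : '';
  const suffix = args ? `_${args}` : '';

  if (request.uri.startsWith('/error-failover/404')) {
    // Hypothetical page of suggested links, keyed by the original query string.
    return `/suggestions${suffix}.html`;
  }
  if (request.uri.startsWith('/error-failover/500')) {
    // Hypothetical page telling the user they will be contacted later.
    return `/we-will-get-back${suffix}.html`;
  }
  // Default: personalized static copy of the page that was requested.
  return `${originalUri}${suffix}.html`;
}
```

Keeping the decision in a small pure function like this makes it easy to exercise with sample request objects before associating it with a cache behavior.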
Configuration sample:
- Create CloudFront distribution
Follow this 4-minute video to set up your CloudFront distribution with an API Gateway as origin
- Add a second resource in your API Gateway and set the mock response to error 503
- Create a bucket as your origin failover then create a prefix to match your site host name and add personalized pages
- Add Behavior for origin error states
- Create a CloudFront custom error page redirect
- Create a Lambda@Edge function to rewrite the URI and query string, and associate it with the cache behavior /error-failover*, triggered on origin request
Use the following Lambda@Edge code sample:
    exports.handler = async (event) => {
        const request = event.Records[0].cf.request;
        console.log(JSON.stringify(request));

        // Change your-bucket-name to the name of the bucket you created
        const s3DomainName = 'your-bucket-name.s3.amazonaws.com';

        // Send the request to the failover bucket, using the request host name as the prefix
        request.origin = {
            s3: {
                domainName: s3DomainName,
                region: 'us-east-1',
                authMethod: 'origin-access-identity',
                path: "/" + request.headers['host'][0].value,
                customHeaders: {}
            }
        };

        // Build the object name from the original URI and, if present, the original query string
        if (request.headers['cloudfront-error-args']) {
            request.uri = request.headers['cloudfront-error-uri'][0].value + "_" + request.headers['cloudfront-error-args'][0].value + ".html";
        } else {
            request.uri = request.headers['cloudfront-error-uri'][0].value + ".html";
        }

        // S3 expects the bucket domain name in the Host header
        request.headers['host'] = [{ key: 'host', value: s3DomainName }];
        console.log(JSON.stringify(request));
        return request;
    };
Now you can deploy the Lambda@Edge function and associate it with the CloudFront cache behavior for /error-failover*, triggering on origin request.
- Test personalized failover
a. Open your browser and browse to your CloudFront distribution address: https://your-cloudfront-domain.cloudfront.net
b. Now browse to the following address: https://your-cloudfront-domain.cloudfront.net/mypage?userid=1
c. Now browse to the following address: https://your-cloudfront-domain.cloudfront.net/mypage?userid=2
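To see which objects the test URLs should resolve to, the following sketch reproduces the naming logic of the code sample as a standalone function. The helper name `failoverObjectKey` is hypothetical, and the sketch assumes the function receives the viewer's host name in the Host header, so that the bucket prefix matches your site host name.

```javascript
// Compute the S3 object key the sample function requests for a given
// host name, original URI, and original query string.
function failoverObjectKey(host, errorUri, errorArgs) {
  const uri = errorArgs ? `${errorUri}_${errorArgs}.html` : `${errorUri}.html`;
  // The origin path "/<host>" is prepended to the URI; S3 keys have no leading slash.
  return `${host}${uri}`;
}

const host = 'your-cloudfront-domain.cloudfront.net';
console.log(failoverObjectKey(host, '/mypage', 'userid=1'));
// your-cloudfront-domain.cloudfront.net/mypage_userid=1.html
console.log(failoverObjectKey(host, '/mypage', 'userid=2'));
// your-cloudfront-domain.cloudfront.net/mypage_userid=2.html
```

Under these assumptions, uploading one personalized page per key to the failover bucket is what makes the two test URLs return different content.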
Conclusion:
This blog post explores how you can provide rich, personalized failover content using new functionality provided by Amazon CloudFront and AWS Lambda@Edge. We’ve provided working sample code and steps to get you started, along with additional use cases for you to try.