Reducing Latency and Shifting Compute to the Edge with Lambda@Edge
Lambda@Edge lets you bring compute power closer to client applications. Its capabilities have grown with the recent increase in function limits, the ability to send binary responses, and the addition of network calls from functions.
Amazon CloudFront is a global content delivery network (CDN). When you serve content through CloudFront, it caches that content at locations that are nearer in the network to your customers than your centralized origin, so they can receive the content more quickly. This lower latency and higher transfer speed mean that your client applications can respond to network interactions more quickly and provide an improved user experience.
In this post, I show how you can take an application whose data changes relatively infrequently and use Lambda@Edge both to provide low-latency data to application clients and to remove unnecessary calls to a backend service. My example uses data for a rugby tournament season.
Developing with Lambda@Edge
There are some important differences between developing applications for Lambda@Edge and developing them for AWS Lambda. Here are a few things to keep in mind:
- Supported languages
- Edge locations and deployment
- Testing and logs
- Amazon CloudFront cache
Supported languages
Lambda@Edge currently supports the Node.js 6.10 runtime.
Edge locations and deployment
Lambda@Edge deploys your functions to run in multiple AWS locations around the world. It can take a while for a function to be replicated and for the CloudFront distribution changes to propagate to our AWS locations around the world.
- Allow a few minutes for the initial CloudFront distribution to deploy.
- For every function deployment after that, allow some time for the new function to deploy to the AWS location nearest to where you are testing.
- A good practice when testing is to put a test flag or unique header into the response so that you know when a new version is being returned. Remove any such identifier before you move to production so that it doesn’t expose internal details.
Testing and logs
When your Lambda@Edge function gets invoked, Lambda@Edge sends the associated log data to the nearest AWS Region relative to where you are testing or where your application users are.
When you test your function, you need to check CloudWatch Logs in the nearest AWS Region. For example, if you are in the United Kingdom, and you are deploying the Lambda@Edge function to us-east-1, it is likely that your logs appear in eu-west-2 and not us-east-1. Be sure to check geographically near Regions if you cannot immediately find the logs in CloudWatch.
The logs also contain the version of the function being tested, so that you always know which version of the Lambda@Edge function is running.
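If you aren’t sure which log group to look for, the following sketch builds the name. It assumes the naming that Lambda@Edge uses for its log groups in each Region where the function ran: `/aws/lambda/us-east-1.<function-name>`. The function name `RugbyEdgeFunction` and the list of Regions are hypothetical examples for someone testing from the UK.

```javascript
'use strict';

// Lambda@Edge writes logs in the Region nearest the edge location that ran the
// function, under a log group named after the function's home Region
// (us-east-1) plus the function name.
function edgeLogGroupName(functionName) {
    return '/aws/lambda/us-east-1.' + functionName;
}

// Hypothetical Regions to check, nearest first, when testing from the UK
const candidateRegions = ['eu-west-2', 'eu-west-1', 'eu-central-1', 'us-east-1'];

// Pair each candidate Region with the log group to look for there
function logLocations(functionName, regions) {
    return regions.map(region => ({ region, logGroupName: edgeLogGroupName(functionName) }));
}
```

You could then check each Region in turn, for example with `aws logs describe-log-streams --region eu-west-2 --log-group-name /aws/lambda/us-east-1.RugbyEdgeFunction`.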
CloudFront cache
CloudFront invokes your Lambda@Edge function as it processes events, and part of that event flow is checking its cache. If you use the origin request and origin response triggers and wonder why your function isn’t running, check whether CloudFront served the request from its cache by inspecting the response headers.
For example, if the HTTP response includes an X-Cache header with the value “Hit from cloudfront,” then the response came from the CloudFront cache. In that case, CloudFront never invoked the origin request or origin response function, so nothing appears in the logs.
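The following minimal sketch shows the cache-hit check. It assumes the response headers are available as a plain object, and the helper name is hypothetical. It treats only “Hit from cloudfront” as a hit. Other X-Cache values, such as “Miss from cloudfront” or “RefreshHit from cloudfront,” mean CloudFront contacted the origin, so your origin-facing functions should have run.

```javascript
'use strict';

// Returns true only when CloudFront served the response from its cache
// without contacting the origin. Header names are matched case-insensitively
// because HTTP header names are case-insensitive.
function isCloudFrontCacheHit(headers) {
    const name = Object.keys(headers).find(h => h.toLowerCase() === 'x-cache');
    if (name === undefined) {
        return false; // no X-Cache header at all
    }
    return /^Hit from cloudfront$/i.test(headers[name]);
}
```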
Solution architecture
The following diagram shows an overview of the architecture. It shows a single Lambda@Edge function (bounded by the orange dotted line) with two paths through it. When the query string doesn’t contain “tables=current”, the first path simply passes the request through as a normal request. The request is returned to CloudFront, which fetches the object and sends it back to the user. When the query string does contain “tables=current”, the function:
- Retrieves the object from S3 directly
- Reads the JSON in the S3 object
- Generates the tables data
- Generates a different JSON response packet
- Returns the response packet to CloudFront
CloudFront then returns the response packet to the user.
The Lambda@Edge function is triggered whenever there is no cache hit on CloudFront and it requests data from the origin. In this case, the origin is the S3 bucket storing JSON data. All the responses either from S3 or from the Lambda@Edge function go back through CloudFront, and can be cached if required.
Elements:
- S3 bucket
- JSON data containing all the rugby tournament data
- CloudFront distribution + behavior with the S3 bucket as origin
- Lambda@Edge origin request function
Fictional rugby tournament season data
Here’s a purely fictional rugby tournament with 32 teams in four separate leagues (1st, 2nd, 3rd, and 4th), with eight teams in each league. All games are played on Saturdays during the season, although not necessarily at the same time of day. This gives 14 game weeks, with only a relatively small number of updates, all of which occur on Saturdays. The current season is complete up to game week 10.
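The 14 game weeks follow if each team plays each of the other seven teams in its league twice, once at home and once away (7 × 2 = 14). That home-and-away format is an assumption, because the post doesn’t spell it out. The following sketch builds such a schedule with the standard circle method. It is illustrative only and is not how the post’s data was produced.

```javascript
'use strict';

// Sketch: build a double round-robin schedule (every pair meets twice,
// home and away) using the circle method. Returns an array of game weeks,
// each an array of { home, away } games.
function doubleRoundRobin(teams) {
    if (teams.length % 2 !== 0) {
        throw new Error('This sketch assumes an even number of teams');
    }
    const n = teams.length;
    const rotating = teams.slice(1); // teams[0] stays fixed
    const firstHalf = [];
    for (let week = 0; week < n - 1; week++) {
        const order = [teams[0]].concat(rotating);
        const games = [];
        for (let i = 0; i < n / 2; i++) {
            games.push({ home: order[i], away: order[n - 1 - i] });
        }
        firstHalf.push(games);
        rotating.unshift(rotating.pop()); // rotate every team except the fixed one
    }
    // Second half of the season: the same pairings with home and away swapped
    const secondHalf = firstHalf.map(games =>
        games.map(g => ({ home: g.away, away: g.home })));
    return firstHalf.concat(secondHalf);
}
```

For eight teams this yields 14 weeks of four games each, and every team plays exactly once per week.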
Scoring is relatively simple:
- 4 points for a win
- 2 points for a draw
- 0 points for a loss
However, a team that scores 4 or more tries in a game gets 1 bonus point. That way, even a team that is losing badly can still get something out of the game. It also means that the stored data must include the number of tries each team scored in each game.
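The following sketch expresses the scoring rules for one team in one match. The function and parameter names are hypothetical. The post’s per-match string format isn’t reproduced here, so the sketch takes the score and try count as plain numbers.

```javascript
'use strict';

// League points for one team in one match: 4 for a win, 2 for a draw,
// 0 for a loss, plus 1 bonus point for scoring 4 or more tries
// (whatever the result).
function matchPoints(ownScore, opponentScore, ownTries) {
    let points;
    if (ownScore > opponentScore) {
        points = 4;
    } else if (ownScore === opponentScore) {
        points = 2;
    } else {
        points = 0;
    }
    if (ownTries >= 4) {
        points += 1; // try bonus point
    }
    return points;
}
```

For example, `matchPoints(28, 35, 4)` gives 1: the team lost, but still earns the try bonus point.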
The data is relatively simple JSON. From it you can work out everything that anybody might need, such as each current league table and who won each game. However, only the basic information is stored for each match: how each team scored its points and whether the game has taken place.
The data for each match is stored in a string like this:
If the game hasn’t taken place, the first character is 0 and all the numbers are 0 as well. The score and league points can be calculated from this information.
Walkthrough
An S3 bucket holds the data as an object. In this case, it is a simple file that contains information about the rugby games played in a season. The stored data is the core data needed by the client application, so it must be in a format that is machine-readable and easy to create. For this example, use a simple JSON document so that a human can easily update it with an editor and a computer can easily read it.
Here are the steps in the solution:
- Create the S3 bucket and CloudFront distribution.
- Upload data to S3.
- Test the CloudFront distribution.
- Choose between a Lambda@Edge viewer request function and an origin request function.
- Set up the origin request Lambda@Edge function.
- Deploy the function.
Create the S3 bucket and CloudFront distribution
Amazon S3 integrates with CloudFront, so you can set up an S3 bucket as the origin for a CloudFront distribution. Use the following AWS CloudFormation template to set up S3 and CloudFront:
Upload the data to S3
You can upload the data file using the console or AWS CLI.
To upload the rugby data file to S3 using the console
- Download the rugby data file.
- Save with the filename rugby_data.json.
- Open the Amazon S3 console.
- Locate the S3 bucket that you already set up. Search for “rugby” to narrow the results.
- Select the bucket name and choose Upload.
- Drag and drop the downloaded file from step 1, or choose Add Files and choose the file from step 1.
- Choose Next.
- Don’t change any permissions and choose Next.
- Under Metadata, for Header, choose Content-Type and set its value to application/json.
- Choose Save, then Next.
- Review and choose Upload.
To upload the data with the AWS CLI, use the following command:
```shell
aws cloudformation describe-stacks --region us-east-1 --stack-name RugbyDataS3

# change directory to the location of rugby_data.json
cd <directory>

# read the Outputs section and paste the "RugbyBucket" value into this command
aws s3 cp ./rugby_data.json \
    s3://<RugbyBucket S3 Value> \
    --region us-east-1 --content-type application/json

# e.g. aws s3 cp ./rugby_data.json \
#        s3://rugbydatas3-rugbydatabucket-1vinm9udkw74kh \
#        --region us-east-1 \
#        --content-type application/json
```
You now have the rugby tournament data stored in S3 and can retrieve it with a request to CloudFront. This means you can send useful information back to the client application that requests the data.
Test the CloudFront distribution
Now that the data is uploaded to the S3 bucket, you can test the distribution through its domain name. The CloudFront distribution may take a short while to propagate across all points of presence. If the console doesn’t yet show the distribution’s Status as “Deployed” and its State as “Enabled,” wait a few minutes and refresh the console until it does.
- Open the CloudFront console.
- Select the distribution that you created earlier and note the domain name.
- Copy the domain name into your browser, adding https:// at the start and the path /rugby_data.json at the end. For example: https://example-distribution-name.cloudfront.net/rugby_data.json
When you visit that URL, your browser downloads the dataset. Depending on how your browser handles JSON, it either displays the JSON or prompts you to download the file to your computer.
However, you don’t want the Lambda@Edge function to run for every request to this distribution, so you set up a cache behavior. To view it, choose the ID of the distribution, and then choose the Behaviors tab. You’ll see a behavior with “rugby_data.json” specified as its path pattern.
Only two settings matter here. The first is the Path Pattern, which is set to rugby_data.json. Because it matches only that object, you can attach the Lambda@Edge function to just this behavior instead of to every request to the distribution. The second is that query strings are forwarded to the origin. Because of this, CloudFront treats URLs with different query strings as different objects, so a request for rugby_data.json?tables=current is not answered with the cached copy of the full dataset.
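The following simplified model shows how the two settings interact. It is not CloudFront’s implementation. It assumes that `*` in a path pattern matches any run of characters, that `?` matches exactly one character, and that a missing leading slash is added before matching. Both function names are hypothetical.

```javascript
'use strict';

// Simplified path-pattern matcher: '*' matches any run of characters and
// '?' matches exactly one; every other character must match literally.
function matchesPathPattern(pattern, path) {
    const normalized = pattern.startsWith('/') ? pattern : '/' + pattern;
    const source = normalized
        .replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
        .replace(/\*/g, '.*')
        .replace(/\?/g, '.');
    return new RegExp('^' + source + '$').test(path);
}

// When query strings are forwarded and included in the cache key, a request
// with a different query string maps to a different cached object.
function cacheKey(path, querystring, includeQueryString) {
    return includeQueryString && querystring ? path + '?' + querystring : path;
}
```

With the query string in the key, `/rugby_data.json` and `/rugby_data.json?tables=current` are cached separately. Without it, both requests would share one cache entry.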
Setting up a Lambda@Edge viewer function vs. an origin request function
At this point, you can request the full rugby tournament dataset from the CloudFront distribution. What you have done so far is relatively simple. What if you wanted to provide a subset of this full dataset, or a different, computed dataset?
Lambda@Edge allows you to generate a different HTTP response without retrieving and forwarding the response from the origin, which in this case would be the rugby_data.json S3 object. If you return an HTTP response in either the viewer request or the origin request, then the origin is never called, and the viewer sees the generated HTTP response. For more information, see Generating HTTP Responses in Request Triggers.
- The viewer request is invoked on every request at the edge. It is processed before it reaches the CloudFront cache. This means that the function only has access to the header and request information from the user.
- An origin request function gives you the opportunity to provide a response that can be cached, which limits how often your compute runs. It is processed only if the CloudFront cache doesn’t already hold the requested object. For example, suppose a request comes in for a JSON object at /details.json, your origin request Lambda@Edge function returns a JSON object, and you specify that the response is cached for an hour. After that first call, the data is cached in CloudFront for an hour and returned on every subsequent call to /details.json.
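The following toy model makes the difference between the two trigger types concrete. It is a simplification for illustration, not CloudFront’s actual behavior, and the `ToyEdge` class is hypothetical. The viewer function runs before the cache is checked. The origin function runs only on a cache miss, and its `maxAge` (in seconds) controls how long the result is reused.

```javascript
'use strict';

// Toy model: the viewer request function runs on every request, while the
// origin request function runs only when the cache has no fresh entry.
class ToyEdge {
    constructor(viewerRequestFn, originRequestFn) {
        this.viewerRequestFn = viewerRequestFn;
        this.originRequestFn = originRequestFn;
        this.cache = new Map(); // cache key -> { body, expiresAt }
    }

    handle(key, nowSeconds) {
        this.viewerRequestFn(key); // runs before the cache is checked
        const entry = this.cache.get(key);
        if (entry && entry.expiresAt > nowSeconds) {
            return entry.body; // cache hit: origin request function skipped
        }
        const { body, maxAge } = this.originRequestFn(key);
        this.cache.set(key, { body, expiresAt: nowSeconds + maxAge });
        return body;
    }
}
```

With `maxAge` set to 3600, three requests for /details.json within the hour invoke the viewer function three times but the origin function only once.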
Take a scenario where you want to return the current league standings for each league by putting a query-string variable onto the URL, like the following:
?tables=current
This data doesn’t need to be processed on every request and can be cached, so this fits perfectly into the origin request Lambda function scenario.
Set up the origin request Lambda@Edge function
Use CloudFormation to set up a simple Lambda@Edge function. The function should have an associated IAM role that gives it read-only access to S3, and by default it should pass all requests straight through to the origin S3 bucket.
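The post’s CloudFormation template for the role isn’t reproduced here. As a sketch, the role would need policy documents along these lines. The bucket placeholder is yours to fill in, and scoping read access to the single `rugby_data.json` object is a narrower assumption than general read-only S3 access. The trust policy lists both `lambda.amazonaws.com` and `edgelambda.amazonaws.com` because Lambda@Edge replicas must be able to assume the role.

```javascript
'use strict';

// Hypothetical policy documents for the function's execution role.
// Replace <your_bucket_name> with your bucket.
const trustPolicy = {
    Version: '2012-10-17',
    Statement: [{
        Effect: 'Allow',
        // Both service principals are needed for Lambda@Edge
        Principal: { Service: ['lambda.amazonaws.com', 'edgelambda.amazonaws.com'] },
        Action: 'sts:AssumeRole'
    }]
};

const permissionsPolicy = {
    Version: '2012-10-17',
    Statement: [
        {
            // Read-only access, scoped to the single data object
            Effect: 'Allow',
            Action: 's3:GetObject',
            Resource: 'arn:aws:s3:::<your_bucket_name>/rugby_data.json'
        },
        {
            // Let the function write CloudWatch Logs in whichever Region runs it
            Effect: 'Allow',
            Action: ['logs:CreateLogGroup', 'logs:CreateLogStream', 'logs:PutLogEvents'],
            Resource: 'arn:aws:logs:*:*:*'
        }
    ]
};
```

`JSON.stringify(trustPolicy)` produces a document you could pass to `aws iam create-role --assume-role-policy-document`.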
Now that you’ve created the function and given it the right permissions, write the origin request handler with the following basic structure:
```javascript
'use strict';
var AWS = require('aws-sdk');
const querystring = require('querystring');

exports.handler = (event, context, callback) => {
    // This is how the request data is passed into the Lambda@Edge function
    var request = event.Records[0].cf.request;
    const qs = querystring.parse(request.querystring);

    if (qs.tables !== 'current') {
        // There's no need to process, so pass the request through to S3 unchanged.
        // CloudFront caches the S3 response according to the object's
        // Cache-Control metadata and the behavior's TTL settings.
        callback(null, request);
    } else {
        // TODO: Processing to create the JSON with the tables
    }
};
```
The function simply passes the request through to the origin S3 bucket if the query string doesn’t contain tables=current. A Cache-Control header added to the request at this point would only be forwarded to S3 and wouldn’t change how CloudFront caches the response. While debugging, you can stop CloudFront from caching the full dataset by setting Cache-Control metadata (for example, max-age=0) on the S3 object itself, as long as the behavior’s minimum TTL is 0.
The else block is where you process the data to generate the rugby tournament tables, which show who is winning or losing each league. You could generate the league standings directly from the S3 data. However, there’s a problem: at the point of processing, the Lambda@Edge function doesn’t have access to the S3 data. All you know is that the CloudFront cache doesn’t have a cached version.
To get the data, the function must request it directly from S3. This is possible because Lambda@Edge functions can make network calls, and the recent release of network access from viewer events extends this to viewer triggers as well. To generate the current league standings, pull the data from S3, process it in the Lambda@Edge function, and return JSON.
Replace the TODO in the else block with the following code, substituting your S3 bucket name where needed.
```javascript
// Get the content from S3 (this runs only on a CloudFront cache miss)
var s3 = new AWS.S3();
s3.getObject(
    { Bucket: '<your_bucket_name>', Key: 'rugby_data.json' },
    function(err, s3data) {
        if (err) {
            // The IAM role for the Lambda@Edge function should
            // allow read-only access to S3
            console.log("Check IAM Role Permissions");
            callback(err);
        } else {
            // Get the body of the S3 object
            var body = s3data.Body.toString('utf-8');
            // Parse the JSON data and calculate the tables
            // TODO: calculateTables(JSON.parse(body));
            var tables_data = {}; // Initially return empty data
            const response = {
                status: '200',
                statusDescription: 'HTTP OK',
                httpVersion: request.httpVersion,
                body: JSON.stringify(tables_data),
                headers: {
                    "content-type": [
                        { "key": "Content-Type", "value": "application/json" }
                    ],
                    "cache-control": [
                        {
                            "key": "Cache-Control",
                            // expire it immediately (for testing)
                            "value": "max-age=0"
                        }
                    ]
                }
            };
            console.log('Response:', JSON.stringify(response, null, 2));
            callback(null, response); // return the generated response
        }
    }
);
```
The function is set up to generate the current league standings from the S3 data. It builds a JSON response and passes that response to the callback instead of the request. Returning a response in the callback means that CloudFront never calls the S3 origin and uses the generated response instead of the origin data. As a result, a normal request returns the full dataset, which CloudFront can cache. A request with the tables=current query string returns a generated response, which CloudFront caches according to that response’s Cache-Control header. You could use other services to generate the league standings, such as Amazon DynamoDB or even another Lambda function, but this example shows what is possible.
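The `calculateTables` placeholder in the code above could look something like the following sketch. The match record shape is hypothetical, because the post’s own per-match string format isn’t shown. You would first convert the stored data into records of the form `{ league, home, away, played, homeScore, awayScore, homeTries, awayTries }`. The sketch avoids `Object.values` so that it also runs on Node.js 6.10.

```javascript
'use strict';

// League points for one side: 4 win, 2 draw, 0 loss, +1 for 4 or more tries
function points(own, other, tries) {
    const result = own > other ? 4 : own === other ? 2 : 0;
    return result + (tries >= 4 ? 1 : 0);
}

// Build a sorted table per league from hypothetical match records
function calculateTables(matches) {
    const tables = {}; // league -> team -> row
    const row = (league, team) => {
        tables[league] = tables[league] || {};
        tables[league][team] = tables[league][team] ||
            { team, played: 0, won: 0, drawn: 0, lost: 0, points: 0 };
        return tables[league][team];
    };
    for (const m of matches) {
        if (!m.played) continue; // unplayed games don't count
        const sides = [
            [row(m.league, m.home), m.homeScore, m.awayScore, m.homeTries],
            [row(m.league, m.away), m.awayScore, m.homeScore, m.awayTries]
        ];
        for (const [r, own, other, tries] of sides) {
            r.played += 1;
            if (own > other) r.won += 1;
            else if (own === other) r.drawn += 1;
            else r.lost += 1;
            r.points += points(own, other, tries);
        }
    }
    // Sort each league by points (highest first), then by team name
    const result = {};
    for (const league of Object.keys(tables)) {
        result[league] = Object.keys(tables[league])
            .map(team => tables[league][team])
            .sort((a, b) => b.points - a.points || a.team.localeCompare(b.team));
    }
    return result;
}
```

In the handler, you would then assign the result of `calculateTables(...)` to `tables_data` instead of the empty object.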
In this example, the generated response is set to expire immediately with max-age=0. Because the response comes from an origin request Lambda@Edge function, you could let CloudFront cache it by changing this value. For example, caching it for an hour (max-age=3600) reduces how often the function runs. You could even cache it until a specific time.
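Caching until a specific time means computing the max-age from the current time. The following sketch assumes a hypothetical target, midnight UTC at the start of the next Saturday, because games are played only on Saturdays. You would choose the target to match when your data actually changes. On game days, for instance, you might prefer a short max-age while results are still coming in.

```javascript
'use strict';

// Start of the next Saturday in UTC (always strictly in the future)
function nextSaturdayUtc(now) {
    const SATURDAY = 6; // getUTCDay(): 0 = Sunday ... 6 = Saturday
    const daysAhead = (SATURDAY - now.getUTCDay() + 7) % 7 || 7;
    // Date.UTC rolls day-of-month overflow into the next month or year
    return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(),
        now.getUTCDate() + daysAhead));
}

// Lambda@Edge-style Cache-Control header value list that expires at `target`
function cacheControlUntil(target, now) {
    const seconds = Math.max(0, Math.floor((target.getTime() - now.getTime()) / 1000));
    return [{ key: 'Cache-Control', value: 'max-age=' + seconds }];
}
```

In the handler above, you could set `headers["cache-control"]` to `cacheControlUntil(nextSaturdayUtc(new Date()), new Date())` in place of the fixed max-age=0 entry.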
Deploy the function
Each time you update the function, to push it to the edge, you need to:
- Publish a new version of the Lambda function.
- Add a CloudFront trigger on the origin request (for this example).
- Choose the correct distribution.
- Save the Lambda function.
- Wait for the function to propagate to the edge.
For a complete example function, you can also upload the following code from this GitHub repository into your Lambda function and deploy it:
https://github.com/aws-samples/aws-lambda-edge-rugby-blog-example/blob/master/lambda.js
The JSON output using this function is shown in the following screenshot:
Conclusion
This post shows that it is possible to move compute to the edge, and provide lower latency solutions for some data processing and client applications. This gives you opportunities to increase the speed of your client applications and decrease the processing that happens on centralized servers.
Within Lambda@Edge, responses generated in a viewer request can be up to 40 KB in size, and responses generated in an origin request can be up to 1 MB. Timeouts and maximum function sizes also differ between event types. For more information, see Limits on Lambda@Edge.
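A generated response that grows with the data can quietly cross these limits, so a guard like the following sketch may help. It makes two assumptions: KB and MB are 1024-based, and measuring only the body (in UTF-8 bytes) approximates the response size, even though the headers also count. Check Limits on Lambda@Edge for the exact figures.

```javascript
'use strict';

// Generated-response size limits quoted above, keyed by event type
// (1024-based units are an assumption of this sketch)
const GENERATED_BODY_LIMITS = {
    'viewer-request': 40 * 1024,
    'origin-request': 1024 * 1024
};

// Approximate check: measures the body only, not the headers
function fitsGeneratedResponseLimit(eventType, body) {
    const limit = GENERATED_BODY_LIMITS[eventType];
    if (limit === undefined) {
        throw new Error('Unknown event type: ' + eventType);
    }
    return Buffer.byteLength(body, 'utf8') <= limit;
}
```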
These may feel like limiting constraints. As long as you understand them and take them into account when designing your applications, though, you can shift compute to the edge and improve the latency of your applications for your users. Make your application not just serverless, but serverless at the edge as well. To learn more and get started, visit the Lambda@Edge webpage.