Writing and testing CloudFront Functions with production traffic
While maintaining a web application, you sometimes need simple logic that must run with low latency. For example, you may want to redirect requests based on a condition, or quickly verify an incoming header. CloudFront Functions is ideal for these use cases because it lets you write lightweight JavaScript code that can add or modify response headers, redirect requests to a newer URL, or return a custom response.
This scripting-based approach provides high flexibility, but it also comes with the responsibility of writing effective code and testing it correctly. By following best practices while developing your code, you can make sure that you use this edge computing feature efficiently and minimize the risk of runtime errors.
In this post, we’ll share some considerations for writing and testing CloudFront Functions code, and show how to test a function with real HTTP traffic patterns.
CloudFront Functions characteristics
CloudFront Functions is a secure, fast, and cost-effective edge computing feature suited to simple manipulation of HTTP requests or responses. Amazon CloudFront triggers function execution on a viewer-request or viewer-response event.
What does CloudFront Functions code look like? The following is one example (you can find more examples in this GitHub repository):
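As a stand-in illustration, here is a minimal sketch of a viewer-request function. The paths /old-path and /new-path are hypothetical and chosen only for this example.

```javascript
// Viewer-request function: redirect one legacy path, pass everything else through.
// '/old-path' and '/new-path' are hypothetical values used for illustration.
function handler(event) {
    var request = event.request;

    if (request.uri === '/old-path') {
        // Returning a response object makes CloudFront answer the viewer directly.
        return {
            statusCode: 301,
            statusDescription: 'Moved Permanently',
            headers: {
                location: { value: '/new-path' }
            }
        };
    }

    // Returning the request object lets CloudFront continue processing it.
    return request;
}
```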
As you can see, it’s JavaScript code that includes a mandatory function, handler(event). Once the function is associated with CloudFront, CloudFront will run the handler function as the entry point. The event object contains the HTTP request metadata, or response metadata, depending on the event trigger. The handler function must return an HTTP request or response object (only HTTP responses are allowed in the case of a viewer-response event) so that CloudFront can continue processing with the returned object. Note that you can only render headers of the response object with CloudFront Functions, whereas Lambda@Edge can also render the response body.
CloudFront Functions is also designed to run within limited CPU time to minimize latency and operate at the largest scale. Each function execution has limited computing resources expressed in percentages as a compute utilization metric. A CloudFront function might fail if the function uses computing resources over the allowed amount. This will lead CloudFront to serve an error response to the viewer. The CloudFront console and TestFunction API provide the compute utilization as part of the test result, showing what percentage of the maximum allowed computing resource was utilized during the test execution. Your code should not utilize 100% of the compute. We recommend that you have a safety margin, such as a utilization under 80%, since compute utilization can vary.
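The safety margin described above can be checked mechanically. The following is a minimal sketch that assumes you already have the test result from the TestFunction API as a parsed object, with ComputeUtilization as a string and FunctionErrorMessage set on failure. The function name evaluateTestResult and the 80% threshold constant are illustrative choices, not part of any AWS API.

```javascript
// Safety margin suggested in the text; a chosen threshold, not an AWS limit.
const MAX_UTILIZATION_PERCENT = 80;

// testResult is assumed to have the shape of the TestFunction API's TestResult,
// where ComputeUtilization is a string such as "35".
function evaluateTestResult(testResult) {
    if (testResult.FunctionErrorMessage) {
        return { ok: false, reason: 'function error: ' + testResult.FunctionErrorMessage };
    }

    const raw = testResult.ComputeUtilization;
    const utilization = Number(raw);
    if (raw == null || raw === '' || Number.isNaN(utilization)) {
        return { ok: false, reason: 'compute utilization not reported' };
    }

    if (utilization >= MAX_UTILIZATION_PERCENT) {
        return { ok: false, reason: 'compute utilization ' + utilization + '% leaves too little margin' };
    }
    return { ok: true, reason: 'compute utilization ' + utilization + '%' };
}

// Example inputs; the numbers are made up for illustration.
console.log(evaluateTestResult({ ComputeUtilization: '35', FunctionErrorMessage: '' }));
console.log(evaluateTestResult({ ComputeUtilization: '92', FunctionErrorMessage: '' }));
```

Because utilization varies between runs, it may be worth evaluating several test runs per input rather than relying on a single result.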
CloudFront emits the compute utilization of the deployed function as a metric in Amazon CloudWatch. You can monitor this CloudWatch metric to find near real-time compute utilization of your function.
Testing CloudFront Functions code
Once you have finished writing the code, you should test it with a few test cases to make sure that the function works as intended. This kind of test is usually called a unit test. At least one of the test cases will validate that the function works correctly with the expected input, while other test cases will validate that the function doesn’t break for corner cases. For CloudFront Functions, you must also confirm that the function stays within the allowed compute utilization range.
To test your code with either the console or the API, first prepare the context data of the HTTP request (and HTTP response, if the function is for the viewer-response event). The CloudFront console provides a skeleton for the input data so that you can quickly test your function with simulated context data.
You must provide the context data as a JSON object when you’re calling the API, as the following shows:
As you can see, testing comes down to crafting the right input for your test target, with the appropriate set of headers, cookies, and query strings, and then monitoring the result. There are a couple of ways to make testing effective.
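As an illustration, the sketch below builds a viewer-request event object and serializes it to JSON. The helper name buildViewerRequestEvent, the IP address, the URI, and the referer value are all hypothetical, and only the fields commonly used by a viewer-request function are included.

```javascript
// Builds a viewer-request event object; every default value here is illustrative.
function buildViewerRequestEvent(options) {
    options = options || {};
    return {
        version: '1.0',
        context: { eventType: 'viewer-request' },
        viewer: { ip: options.ip || '198.51.100.11' },
        request: {
            method: options.method || 'GET',
            uri: options.uri || '/index.html',
            querystring: options.querystring || {},
            // Header names are lowercase; each header is an object with a value.
            headers: options.headers || {},
            cookies: options.cookies || {}
        }
    };
}

const sampleEvent = buildViewerRequestEvent({
    uri: '/products',
    headers: { referer: { value: 'https://example.com/' } }
});

// The serialized JSON is what you would supply as the event object for the test.
const sampleEventJson = JSON.stringify(sampleEvent, null, 2);
console.log(sampleEventJson);
```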
Test every code path if possible and automate the test
CloudFront Functions code is lightweight, so in many cases you should be able to write test cases for every code execution path. Since there will be more than one test input, you may want to automate the testing process so that it loads each test input and invokes the API. For example, you can run the following command to test a function with multiple test objects:
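As a complementary local check, the following sketch runs a handler against several test inputs in Node.js. It does not call the TestFunction API, so it validates outputs only and reports no compute utilization. The function under test, the helper names makeEvent and runTestCases, and the test cases are all hypothetical.

```javascript
// Hypothetical function under test: appends index.html to directory-style URIs.
function handler(event) {
    var request = event.request;
    if (request.uri.endsWith('/')) {
        request.uri += 'index.html';
    }
    return request;
}

// Wraps a URI in the minimal viewer-request event the handler needs.
function makeEvent(uri) {
    return {
        version: '1.0',
        context: { eventType: 'viewer-request' },
        request: { method: 'GET', uri: uri, querystring: {}, headers: {}, cookies: {} }
    };
}

// One entry per code path of the handler.
const testCases = [
    { name: 'root', event: makeEvent('/'), expectedUri: '/index.html' },
    { name: 'directory', event: makeEvent('/blog/'), expectedUri: '/blog/index.html' },
    { name: 'file', event: makeEvent('/logo.png'), expectedUri: '/logo.png' }
];

// Runs every case and records a pass/fail result instead of stopping at the first failure.
function runTestCases(fn, cases) {
    return cases.map(function (testCase) {
        try {
            // Deep-copy so a handler that mutates its input cannot affect other cases.
            const output = fn(JSON.parse(JSON.stringify(testCase.event)));
            return { name: testCase.name, passed: output.uri === testCase.expectedUri, error: null };
        } catch (err) {
            return { name: testCase.name, passed: false, error: err.message };
        }
    });
}

const results = runTestCases(handler, testCases);
results.forEach(function (result) {
    console.log((result.passed ? 'PASS ' : 'FAIL ') + result.name +
        (result.error ? ' (' + result.error + ')' : ''));
});
```

The same list of test cases can then be serialized and submitted through the API to obtain compute utilization for each input.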
Test with real traffic data and monitor compute utilization
When you’re writing a new CloudFront function for an existing CloudFront distribution, it’s beneficial to test with real traffic data before you deploy to production. Doing so can reveal corner cases that you didn’t anticipate when writing the unit tests. For example, you may find that some clients send very different User-Agent header values, or that the header is missing. It also tests the compute utilization of the function with realistic input. In fact, you should review the real traffic data before writing code to enhance the effectiveness of your code. Even if you’re writing a function for a new CloudFront distribution that has yet to be deployed, we recommend obtaining testable data from a similar existing distribution.
Making the test based on production traffic
So how do you create test data based on production traffic? You must collect access logs and process them into test inputs. There are numerous ways to process the access logs, and we’ll use Amazon Athena as an example in this post. Let’s assume that we’re writing a CloudFront function for the viewer-request event, which will read the incoming Referer header and the URI path. By using the following query, you can extract the most frequent combinations of the two.
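To illustrate the aggregation such a query performs, the following sketch does the equivalent counting in plain JavaScript rather than Athena SQL. It assumes the log lines were already parsed into objects with hypothetical referer and uri fields. The sample rows are made up.

```javascript
// Counts how often each (referer, uri) pair occurs and returns the most frequent ones.
function topCombinations(rows, limit) {
    const counts = new Map();
    rows.forEach(function (row) {
        // JSON-encode the pair so it can serve as a single, unambiguous map key.
        const key = JSON.stringify([row.referer, row.uri]);
        counts.set(key, (counts.get(key) || 0) + 1);
    });

    return Array.from(counts.entries())
        .map(function (entry) {
            const pair = JSON.parse(entry[0]);
            return { referer: pair[0], uri: pair[1], count: entry[1] };
        })
        .sort(function (a, b) { return b.count - a.count; })
        .slice(0, limit);
}

// Made-up rows; CloudFront standard logs record an absent value as "-".
const sampleRows = [
    { referer: 'https://example.com/', uri: '/a' },
    { referer: '-', uri: '/b' },
    { referer: 'https://example.com/', uri: '/a' },
    { referer: '-', uri: '/b' },
    { referer: 'https://example.com/', uri: '/a' },
    { referer: 'https://example.org/', uri: '/c' }
];

const topCombos = topCombinations(sampleRows, 2);
console.log(topCombos);
```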
You can also go one step further and make them into an automated test for your CloudFront functions code, like the following. Refer to the sample-code for details.
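As a rough sketch of that step, the code below converts extracted referer and URI combinations into viewer-request event objects that can be fed to a test. The function name combinationsToEvents, the input field names, the fixed GET method, and the viewer IP are assumptions made for illustration. This is not the referenced sample code.

```javascript
// Turns (referer, uri) combinations from the logs into viewer-request test events.
function combinationsToEvents(combinations) {
    return combinations.map(function (combo) {
        const headers = {};
        // Standard logs write "-" for an absent value; model that as a missing header.
        if (combo.referer && combo.referer !== '-') {
            headers.referer = { value: combo.referer };
        }
        return {
            version: '1.0',
            context: { eventType: 'viewer-request' },
            viewer: { ip: '198.51.100.11' }, // illustrative address
            request: {
                method: 'GET', // assumed; the extracted data carries no method
                uri: combo.uri,
                querystring: {},
                headers: headers,
                cookies: {}
            }
        };
    });
}

// Made-up combinations standing in for query output.
const extractedCombos = [
    { referer: 'https://example.com/', uri: '/a', count: 3 },
    { referer: '-', uri: '/b', count: 2 }
];

const generatedEvents = combinationsToEvents(extractedCombos);
console.log(JSON.stringify(generatedEvents, null, 2));
```

Keeping the missing-referer case as an absent header, rather than a header whose value is "-", lets the test exercise the same code path that a real request without the header would take.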
Considerations
When you use this approach, you may need to consider cost. Athena charges for queries based on the amount of data scanned. Therefore, running the query against all of the existing logs isn’t recommended. Instead, you can partition the logs and limit the query to scan only part of them. Furthermore, you can use CloudFront real-time logs with a low sampling rate when your CloudFront distribution delivers a huge number of HTTP requests and responses.
Moreover, CloudFront standard access logs include only a limited number of headers. Therefore, you may need real-time logs to use headers that aren’t in the standard access logs.
You may want to integrate the CloudFront Functions test into your code pipeline if you’re managing CloudFront distributions as code along with other resources.
Conclusion
In this post, you learned why testing the compute utilization of CloudFront Functions is required, and how testing with real traffic data helps you avoid exceeding the allowed compute.
To find out more, visit the documentation for CloudFront Functions, or review the example code on GitHub.
You can start writing code in the AWS Management Console.