Writing and testing CloudFront Functions with production traffic

While maintaining a web application, we sometimes need to build simple logic that must run with low latency. For example, you may want to redirect website visitors based on a condition, or quickly verify an incoming header. CloudFront Functions is ideal for these use cases, since it lets you write lightweight JavaScript code that can add or modify response headers, redirect requests to a newer URL, or respond with a custom response.

This scripting-based approach provides high flexibility, but it also comes with the responsibility of writing efficient code and testing it correctly. By following best practices while developing your code, you can make sure that you use this edge computing feature efficiently and minimize the risk of runtime errors.

In this post, we’ll share some considerations for writing and testing CloudFront Functions code, and show how to test a function with real HTTP traffic patterns.

CloudFront Functions characteristics

CloudFront Functions is a secure, fast, and cost-effective edge computing feature suitable for simple manipulation of HTTP requests and responses. Amazon CloudFront triggers a function on either the viewer-request or the viewer-response event.

What does CloudFront Functions code look like? The following example redirects viewers from a specific country (you can find more examples in this GitHub repository):

function handler(event) {
    var request = event.request;
    var headers = request.headers;
    var host = headers.host.value;
    var country = 'DE'; // Choose a country code
    var newurl = `https://${host}/de/index.html`; // Change the redirect URL as needed

    if (headers['cloudfront-viewer-country']) {
        var countryCode = headers['cloudfront-viewer-country'].value;
        if (countryCode === country) {
            var response = {
                statusCode: 302,
                statusDescription: 'Found',
                headers: {
                    location: { value: newurl }
                }
            };
            return response;
        }
    }
    return request;
}

As you can see, it’s JavaScript code that includes a mandatory handler(event) function. Once the function is associated with a CloudFront distribution, CloudFront runs the handler function as the entry point. The event object contains the HTTP request metadata, or the response metadata, depending on the event that triggered the function. The handler function must return an HTTP request or response object (for a viewer-response event, only a response object is allowed) so that CloudFront can continue processing with the returned object. Note that with CloudFront Functions you can only manipulate the headers of the response object, whereas Lambda@Edge can also manipulate the response body.
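For the viewer-response event, the handler receives the response in event.response and must return a response object. The following is a minimal sketch that adds one header to every response; the header name and value (Strict-Transport-Security with a one-year max-age) are illustrative choices, not part of the original example.

```javascript
function handler(event) {
    // Viewer-response functions must return a response object
    var response = event.response;
    var headers = response.headers;

    // Header names in the event object are lowercase, and each value
    // is wrapped in an object of the form { value: ... }.
    // Illustrative header: ask browsers to use HTTPS for one year.
    headers['strict-transport-security'] = { value: 'max-age=31536000' };

    return response;
}
```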

A CloudFront function returns the request object to continue the flow, or returns a response object to change the flow and respond to the viewer directly.

Figure 1 CloudFront Functions must return an HTTP request or response object.

CloudFront Functions is also designed to run within a limited CPU time, to minimize latency and operate at the largest scale. Each function execution has a limited amount of computing resources, and the share that an execution uses is expressed as a percentage in the compute utilization metric. A CloudFront function might fail if it uses more computing resources than the allowed amount, in which case CloudFront serves an error response to the viewer. The CloudFront console and the TestFunction API report compute utilization as part of the test result, showing what percentage of the maximum allowed computing resources the test execution used. Your code must stay below 100% compute utilization. Because compute utilization can vary between executions, we recommend keeping a safety margin, such as utilization under 80%.
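To apply the 80% safety margin consistently, you can check each test result programmatically. The following Node.js sketch assumes the JSON shape printed by `aws cloudfront test-function`, where TestResult.ComputeUtilization is returned as a string and TestResult.FunctionErrorMessage is empty on success; please verify that shape against the TestFunction API reference. The helper name checkComputeUtilization is hypothetical.

```javascript
// Hypothetical helper: classify one TestFunction result against a utilization budget.
// Assumed input shape:
// { "TestResult": { "ComputeUtilization": "11", "FunctionErrorMessage": "", ... } }
const SAFETY_MARGIN_PERCENT = 80; // recommended ceiling; 100 is the hard limit

function checkComputeUtilization(testFunctionOutput, margin = SAFETY_MARGIN_PERCENT) {
    const result = testFunctionOutput.TestResult;
    // ComputeUtilization is a string, so convert it before comparing
    const utilization = Number(result.ComputeUtilization);

    if (result.FunctionErrorMessage) {
        return { status: 'error', utilization, message: result.FunctionErrorMessage };
    }
    if (!Number.isFinite(utilization)) {
        return { status: 'error', utilization, message: 'ComputeUtilization missing' };
    }
    if (utilization >= 100) {
        return { status: 'fail', utilization, message: 'exceeds the maximum allowed compute' };
    }
    if (utilization >= margin) {
        return { status: 'warn', utilization, message: `above the ${margin}% safety margin` };
    }
    return { status: 'ok', utilization, message: '' };
}
```

For example, you could save the CLI output to a file and call checkComputeUtilization(JSON.parse(fs.readFileSync('result.json', 'utf8'))).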

Figure 2 The CloudFront console shows the compute utilization of a function.

CloudFront emits the compute utilization of the deployed function as a metric in Amazon CloudWatch. You can monitor this CloudWatch metric to find near real-time compute utilization of your function.
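Rather than watching the metric by hand, you could alarm on it. The sketch below only builds the input object for the CloudWatch PutMetricAlarm API, which you could pass to PutMetricAlarmCommand from the AWS SDK for JavaScript v3 (@aws-sdk/client-cloudwatch) in us-east-1, where CloudFront publishes its metrics. It assumes the metric is named FunctionComputeUtilization in the AWS/CloudFront namespace with FunctionName and Region=Global dimensions; please confirm these in your CloudWatch console. The helper name and alarm name format are hypothetical.

```javascript
// Hypothetical helper that builds the input for CloudWatch PutMetricAlarm.
// Assumed metric identity: AWS/CloudFront / FunctionComputeUtilization,
// dimensions FunctionName=<name> and Region=Global. Verify before use.
function buildUtilizationAlarm(functionName, thresholdPercent = 80) {
    return {
        AlarmName: `${functionName}-compute-utilization`,
        Namespace: 'AWS/CloudFront',
        MetricName: 'FunctionComputeUtilization',
        Dimensions: [
            { Name: 'FunctionName', Value: functionName },
            { Name: 'Region', Value: 'Global' },
        ],
        Statistic: 'Maximum',      // alarm on the highest value in each period
        Period: 60,                // one-minute periods, in seconds
        EvaluationPeriods: 5,      // alarm after five consecutive breaching periods
        Threshold: thresholdPercent,
        ComparisonOperator: 'GreaterThanOrEqualToThreshold',
        TreatMissingData: 'notBreaching', // periods without traffic do not trigger the alarm
    };
}
```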

Figure 3 Example of a CloudWatch metric showing the compute utilization of CloudFront Functions.

Testing CloudFront Functions code

Once you’ve finished writing the code, test it with a few test cases to make sure that the function works as intended. This kind of test is usually called a unit test. At least one test case should validate that the function works correctly with the expected input, while other test cases should validate that the function doesn’t break on corner cases. For CloudFront Functions, you must also confirm that the function stays within the allowed compute utilization range.
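Before calling any AWS API, you can check the logic of the redirect example locally with one case per code path. The following is a minimal Node.js sketch, assuming you paste it below the handler function in the same file (CloudFront Functions code has no module exports). It checks logic only: Node.js is not the CloudFront Functions runtime, so it reports nothing about compute utilization. The helper names makeEvent and runCases are hypothetical.

```javascript
// Hypothetical helper: build a viewer-request event, optionally with a country header
function makeEvent(countryCode) {
    var headers = { host: { value: 'example.org' } };
    if (countryCode !== undefined) {
        headers['cloudfront-viewer-country'] = { value: countryCode };
    }
    return {
        version: '1.0',
        context: { eventType: 'viewer-request' },
        viewer: { ip: '198.51.100.1' },
        request: { method: 'GET', uri: '/index.html', headers: headers, querystring: {}, cookies: {} },
    };
}

// One case per code path: expected input, another country, missing header
const cases = [
    { name: 'DE viewer is redirected', event: makeEvent('DE'),
      check: (out) => out.statusCode === 302 &&
                      out.headers.location.value === 'https://example.org/de/index.html' },
    { name: 'other country passes through', event: makeEvent('US'),
      check: (out) => out.uri === '/index.html' && out.statusCode === undefined },
    { name: 'missing country header passes through', event: makeEvent(undefined),
      check: (out) => out.uri === '/index.html' && out.statusCode === undefined },
];

// Runs every case against the given handler and returns the names of failed cases
function runCases(handler) {
    const failures = [];
    for (const c of cases) {
        let ok;
        try {
            ok = c.check(handler(c.event));
        } catch (err) {
            ok = false; // an exception thrown by the handler counts as a failure
        }
        console.log(`${ok ? 'PASS' : 'FAIL'} ${c.name}`);
        if (!ok) failures.push(c.name);
    }
    return failures;
}
```

A script could then end with process.exitCode = runCases(handler).length ? 1 : 0; so that a failing case fails the run.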

You can test your code with either the console or the API. In both cases, you prepare the context data of the HTTP request (and of the HTTP response, if the function is for the viewer-response event). The CloudFront console provides a skeleton for the input data so that you can quickly test your function with simulated context data.

Figure 4 Testing a function with CloudFront console.

When you use the API, you must provide the context data as a JSON object, like the following:

{
    "version": "1.0",
    "context": {
        "eventType": "viewer-request"
    },
    "viewer": {
        "ip": "198.51.1.1"
    },
    "request": {
        "method": "GET",
        "uri": "/example.png",
        "headers": {
            "host": {"value": "example.org"}
        }
    }
}

As you can see, testing comes down to crafting the right input for each test target, with the appropriate set of headers, cookies, and query strings, and then monitoring the result. There are a couple of ways to make testing effective.
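Writing these JSON inputs by hand gets tedious. The following Node.js sketch generates them from plain maps of headers, cookies, and query strings, following the event structure of the JSON above, with every value wrapped in { value: ... } and header names lowercased. Cookie and query string keys are left as-is, and multi-value parameters are not handled. The helper names, the viewer IP, and the case-NN.json file naming are assumptions for illustration.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: build a TestFunction event for a viewer-request function
function buildViewerRequestEvent({ uri, method = 'GET', headers = {}, cookies = {}, querystring = {} }) {
    // Wrap each value as { value: '...' }; optionally lowercase the keys
    const wrap = (obj, lowerKeys) => {
        const out = {};
        for (const [k, v] of Object.entries(obj)) {
            out[lowerKeys ? k.toLowerCase() : k] = { value: String(v) };
        }
        return out;
    };
    return {
        version: '1.0',
        context: { eventType: 'viewer-request' },
        viewer: { ip: '198.51.100.1' }, // documentation-range address
        request: {
            method,
            uri,
            headers: wrap(headers, true),
            cookies: wrap(cookies, false),
            querystring: wrap(querystring, false),
        },
    };
}

// Write one JSON file per test target, usable with --event-object fileb://<file>
function writeEvents(targets, dir = '.') {
    targets.forEach((t, i) => {
        const file = path.join(dir, `case-${String(i + 1).padStart(2, '0')}.json`);
        fs.writeFileSync(file, JSON.stringify(buildViewerRequestEvent(t), null, 2));
    });
}
```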

Test every code path if possible and automate the test

Because CloudFront Functions code is lightweight, in many cases you should be able to write a test case for every code execution path. Since there will be more than one test input, you may want to automate the testing process so that it loads each test input and invokes the API. For example, you can run the following command to test a function with multiple test objects:

for f in *.json; do
  aws cloudfront test-function \
    --name ExampleFunction \
    --if-match ETVABCEXAMPLE \
    --event-object "fileb://$f" \
    --stage DEVELOPMENT
done
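If you want the results collected in one place instead of reading raw CLI output, the same loop can be driven from Node.js. In this sketch, ExampleFunction and ETVABCEXAMPLE are the same placeholders as in the shell loop above, and runCommand defaults to calling the AWS CLI through child_process.execFileSync; it can be swapped for a stub in tests. The function name testAllEvents and the result record shape are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');
const { execFileSync } = require('child_process');

// Placeholders, as in the shell loop: use your function name and its current ETag
const FUNCTION_NAME = 'ExampleFunction';
const ETAG = 'ETVABCEXAMPLE';

function awsCli(args) {
    return execFileSync('aws', args, { encoding: 'utf8' });
}

// Runs test-function for every *.json file in `dir` and returns one record per file
function testAllEvents(dir, runCommand = awsCli) {
    const files = fs.readdirSync(dir).filter((f) => f.endsWith('.json')).sort();
    return files.map((file) => {
        const stdout = runCommand([
            'cloudfront', 'test-function',
            '--name', FUNCTION_NAME,
            '--if-match', ETAG,
            '--event-object', 'fileb://' + path.join(dir, file),
            '--stage', 'DEVELOPMENT',
            '--output', 'json',
        ]);
        const result = JSON.parse(stdout).TestResult;
        return {
            file,
            utilization: Number(result.ComputeUtilization), // returned as a string
            error: result.FunctionErrorMessage || '',
        };
    });
}
```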

Test with real traffic data and monitor compute utilization

When you’re writing a new CloudFront function for an existing CloudFront distribution, testing it with real traffic data before you deploy it to production is beneficial. It can reveal corner cases that you didn’t anticipate when writing the unit tests. For example, you may find that some clients send very different User-Agent header values, or omit the header entirely. Testing with real traffic also shows the compute utilization of the function under realistic input. In fact, you should review real traffic data before writing code, so that your code handles the inputs it will actually receive. Even if you’re writing a function for a new CloudFront distribution that has yet to be deployed, we recommend obtaining testable data from a similar existing distribution.
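The missing-header case is a common source of runtime errors, because reading .value from an absent header throws a TypeError. The following viewer-request sketch guards against that. The rule itself (tagging requests with a hypothetical x-device-type header based on a "Mobile" substring in User-Agent) is only an illustration, not a recommended device detection method.

```javascript
function handler(event) {
    var request = event.request;
    var headers = request.headers;

    // Real traffic may omit User-Agent entirely, so check before reading .value
    var userAgent = headers['user-agent'] ? headers['user-agent'].value : '';

    // Hypothetical rule: tag the request so the origin can tell device types apart
    var isMobile = userAgent.indexOf('Mobile') !== -1;
    headers['x-device-type'] = { value: isMobile ? 'mobile' : 'desktop' };

    return request;
}
```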

Making the test based on production traffic

So how do you create test data based on production traffic? You must collect access logs and process them into test inputs. There are numerous ways to process access logs; in this post, we use Amazon Athena as an example. Let’s assume that we’re writing a CloudFront function for the viewer-request event that reads the incoming Referer header and URI path. The following query extracts the most frequent combinations of the two.

SELECT
  count(*) AS cnt, uri, referrer
-- please change the table name to yours
FROM combined
-- filter the last 24 hours (1d) of data
WHERE concat(year, month, day, hour) >= DATE_FORMAT(current_timestamp - INTERVAL '1' DAY, '%Y%m%d%H')
GROUP BY uri, referrer
ORDER BY cnt DESC
-- top 10 requests
LIMIT 10
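Each row of the query result can then become one test event. The following Node.js sketch assumes you have already loaded the rows as objects with cnt, uri, and referrer fields. CloudFront standard logs record an empty field as "-", so a "-" referrer becomes a request without a Referer header. The helper name rowsToEvents and the default host example.com are hypothetical.

```javascript
// Hypothetical helper: turn query rows { cnt, uri, referrer } into viewer-request test events
function rowsToEvents(rows, host = 'example.com') {
    return rows.map((row) => {
        const headers = { host: { value: host } };
        // "-" in the standard logs means the viewer sent no Referer header
        if (row.referrer && row.referrer !== '-') {
            // The HTTP header is spelled "referer"; names are lowercase in the event
            headers.referer = { value: row.referrer };
        }
        return {
            name: `${row.uri} | ${row.referrer}`,
            event: {
                version: '1.0',
                context: { eventType: 'viewer-request' },
                viewer: { ip: '198.51.100.1' },
                request: { method: 'GET', uri: row.uri, headers, querystring: {}, cookies: {} },
            },
        };
    });
}
```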

You can go one step further and turn the query results into automated tests for your CloudFront Functions code, as in the following example. Refer to the sample code for details.

python3 testingCFF.py --function CORS-Preflight
input(url, referer) --> output(status(Err or OK), ComputeUtilization%)
input(/images/sample.jpg, https://example.com/) --> output(OK, 11%)
input(/images/sample2.jpg, https://example.com/) --> output(OK, 15%)
input(/images/sample2.jpg, -) --> output(OK, 14%)
……………

Considerations

When you use this approach, you may need to consider cost. Athena charges for queries based on the amount of data scanned, so running the query against all of your existing logs isn’t recommended. Instead, you can load the logs into partitions and limit the query to scan only part of the logs. Furthermore, when your CloudFront distribution delivers a huge number of HTTP requests and responses, you can use CloudFront real-time logs with a low sampling rate.
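One way to keep the scanned range explicit is to compute the partition bound on the client and pass it to the query as a literal. The following sketch produces the same kind of filter as the query above, concat(year, month, day, hour) >= 'YYYYMMDDHH', for the last N hours. It assumes the partition values are zero-padded UTC strings; the helper names are hypothetical, and the table name is interpolated directly, so only pass trusted values.

```javascript
// Hypothetical helper: lower bound 'YYYYMMDDHH' (UTC) for the last `hoursBack` hours
function partitionLowerBound(now = new Date(), hoursBack = 24) {
    const d = new Date(now.getTime() - hoursBack * 3600 * 1000);
    const pad = (n) => String(n).padStart(2, '0');
    return `${d.getUTCFullYear()}${pad(d.getUTCMonth() + 1)}${pad(d.getUTCDate())}${pad(d.getUTCHours())}`;
}

// Builds the top-requests query with a literal partition filter.
// `table` is inserted verbatim: use trusted table names only.
function buildTopRequestsQuery(table, now = new Date(), hoursBack = 24, limit = 10) {
    const bound = partitionLowerBound(now, hoursBack);
    return [
        'SELECT count(*) AS cnt, uri, referrer',
        `FROM ${table}`,
        `WHERE concat(year, month, day, hour) >= '${bound}'`,
        'GROUP BY uri, referrer',
        'ORDER BY cnt DESC',
        `LIMIT ${limit}`,
    ].join('\n');
}
```

For example, partitionLowerBound(new Date(Date.UTC(2024, 0, 2, 5))) returns '2024010105'. Because the values are fixed-width, plain string comparison orders them chronologically.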

Moreover, CloudFront standard access logs include only a limited set of headers. Therefore, you may need real-time logs to use headers that aren’t in the standard access logs.

If you manage your CloudFront distributions as code along with other resources, you may want to integrate CloudFront Functions tests into your deployment pipeline.
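In a pipeline, the useful outcome of a test run is a pass or fail signal. The following sketch is a hypothetical gate that fails when any test case reports an error or does not stay below the 80% safety margin recommended earlier in this post. It assumes results have already been collected as records like { file, utilization, error }; how you collect them depends on your pipeline.

```javascript
// Hypothetical CI gate over collected test results, e.g.
// [{ file: 'case-01.json', utilization: 11, error: '' }, ...]
const MAX_UTILIZATION = 80; // safety margin, not the hard 100% limit

function gate(results, maxUtilization = MAX_UTILIZATION) {
    // `!(x < max)` also flags NaN, i.e. a missing utilization value
    const problems = results.filter(
        (r) => r.error || !(r.utilization < maxUtilization)
    );
    for (const r of problems) {
        console.error(`FAIL ${r.file}: ${r.error || `compute utilization ${r.utilization}%`}`);
    }
    return problems.length === 0 ? 0 : 1; // intended as the process exit code
}
```

A pipeline step could end with process.exitCode = gate(results); so that any failing case stops the deployment.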

Conclusion

In this post, you learned why testing the compute utilization of CloudFront Functions is required and how to avoid over-utilization by testing with real traffic data.

To find out more, visit the documentation for CloudFront Functions, or review the example code on GitHub.

You can start writing code in the AWS Management Console.

Julian Ju

Julian is an Edge Services Specialist Solutions Architect at AWS in the ASEAN region. He has worked for over 25 years in the IT industry as a developer, tech project manager, consultant, and architect. At AWS, he helps customers adopt AWS Edge services for safer and faster internet experiences.

Yujin Jeong

Yujin Jeong is a Solutions Architect at AWS based in Korea. She has a passion for helping enterprise customers navigate their cloud journey and finding the right solutions to meet their unique business needs. With her technical expertise and experience, she has helped customers achieve their cloud goals and drive business outcomes through innovative and scalable cloud solutions.
