CloudFront Deployment Guide (Part 18): Setting Custom Conditional Cache Keys with Amazon CloudFront Edge Compute

Requirements Overview

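In a CloudFront deployment, defining the cache key is a common and important configuration task. Typically, the composition of the cache key is defined in a CloudFront Cache Policy, and the policy takes effect once it is attached to a specific CloudFront behavior.

CloudFront behaviors are matched by path pattern, such as a URI prefix or a file extension (for example, *.jpg or *.html), so different paths or resource types can use different cache policies. In other words, CloudFront natively supports setting the cache key under specific access conditions, as long as those conditions can be expressed as a path pattern.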

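What if we want to set the cache key dynamically based on more complex, custom conditions? For example:

When HTTP request header Header A contains a specific value Value B (or query string key Query String Key C contains Value D), we want Header A (or Query String Key C) to become part of the cache key. A typical scenario: when a request carries the header CustomerType=VIP, give those requests a dedicated cache key so that VIP users receive personalized content that is still served from cache.
When a request contains a special Header E (or query string key Query String Key F), we want to bypass the cache and go to the origin for the latest content (that is, no cache key is used). A common example: a request carrying the MustToOrigin header must reach the origin, bypassing the cache so that the content is fresh.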

These scenarios go beyond what a Cache Policy alone can express. This post describes how we used Amazon CloudFront edge compute (CloudFront Functions and Lambda@Edge) to set cache keys based on custom conditions.

How the Solution Works

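A cache key works by combining the values of designated fields in the HTTP request into a unique cache namespace. Subsequent requests are matched against cached content by the same combination of field values.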

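In CloudFront, a CloudFront Function can extract specific request fields (URI, headers, query strings, cookies, and so on), combine them, and write the result to a custom header. A Cache Policy then includes that custom header in the cache key. Because the function can apply conditional logic, fields are inserted only when needed, which gives flexible control over how the cache key is built. When several conditions are involved, the Cache Policy includes several custom headers to distinguish the resulting cache keys. Since a Cache Policy natively supports only one set of TTL values, we use Lambda@Edge to assign a cache lifetime to each cache key defined through a custom header.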
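To make the mechanism concrete, here is a minimal Python model of the idea. It is only a sketch: the dictionary-based request shape and the function names viewer_request and cache_key are illustrative assumptions, and the real CloudFront cache key contains more than the fields shown here.

```python
def viewer_request(request):
    """Model of the viewer-request step: conditionally write a custom header."""
    headers = dict(request.get("headers", {}))
    query = request.get("query", {})
    cookies = request.get("cookies", {})
    if headers.get("customertype", "").lower() == "vip":
        # Condition 1: fold query parameter A and cookie B into one header.
        headers["custom_header_1"] = query.get("A", "") + cookies.get("B", "")
    elif "musttoorigin" in headers:
        # Condition 2: mark the request so a later step can disable caching.
        headers["custom_header_2"] = "MustToOrigin"
    return {**request, "headers": headers}


def cache_key(request, key_headers=("custom_header_1", "custom_header_2")):
    """Model of the cache policy: the URI plus the listed headers form the key."""
    headers = request["headers"]
    return (request["uri"],) + tuple(headers.get(name) for name in key_headers)


vip = viewer_request({"uri": "/page", "headers": {"customertype": "VIP"},
                      "query": {"A": "a"}, "cookies": {"B": "b"}})
common = viewer_request({"uri": "/page", "headers": {"customertype": "common"},
                         "query": {"A": "a"}, "cookies": {"B": "b"}})

print(cache_key(vip))     # ('/page', 'ab', None)
print(cache_key(common))  # ('/page', None, None)
```

The two requests hit the same URI but land in different cache namespaces, because only the VIP request receives a value for custom_header_1.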

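Step 1: Create a CloudFront Function that implements the conditional logic and writes the custom headers, then associate it under CloudFront behavior > Function associations > Viewer request, so that it runs when CloudFront receives a request from the client: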

if (<condition 1>) {
    request.headers['custom_header_1'] = {
        value: <cache key field 1>.value + <cache key field 2>.value
    };
    return request;
}

if (<condition 2>) {
    request.headers['custom_header_2'] = {
        value: <custom value>
    };
    return request;
}

Note: this is pseudocode; real code can concatenate multiple field values according to the actual logic.

Step 2: Create a CloudFront Cache Policy with the following configuration:

Headers included in the cache key: custom_header_1, custom_header_2, ...
TTL settings:
1. Minimum TTL (Min TTL): 0
2. Maximum TTL (Max TTL): unlimited
3. Default TTL: 86400 seconds (1 day)
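The same policy can be expressed as a configuration document for the CloudFront API. The sketch below only builds the CachePolicyConfig dictionary. The policy name, the one-year Max TTL standing in for "unlimited", and the decision to enable gzip and Brotli are assumptions made for illustration.

```python
def build_cache_policy_config(name, key_headers, default_ttl=86400,
                              min_ttl=0, max_ttl=31536000):
    """Build a CachePolicyConfig dict in the shape used by the CloudFront API."""
    return {
        "Name": name,
        "MinTTL": min_ttl,
        "DefaultTTL": default_ttl,
        "MaxTTL": max_ttl,  # assumption: one year stands in for "unlimited"
        "ParametersInCacheKeyAndForwardedToOrigin": {
            "EnableAcceptEncodingGzip": True,    # assumption, not from the post
            "EnableAcceptEncodingBrotli": True,  # assumption, not from the post
            "HeadersConfig": {
                "HeaderBehavior": "whitelist",
                "Headers": {"Quantity": len(key_headers),
                            "Items": list(key_headers)},
            },
            "CookiesConfig": {"CookieBehavior": "none"},
            "QueryStringsConfig": {"QueryStringBehavior": "none"},
        },
    }


config = build_cache_policy_config(
    "custom-conditional-cache-key",  # hypothetical policy name
    ["custom_header_1", "custom_header_2"],
)
print(config["ParametersInCacheKeyAndForwardedToOrigin"]["HeadersConfig"])

# With boto3 installed and credentials configured, this could be submitted as:
#   boto3.client("cloudfront").create_cache_policy(CachePolicyConfig=config)
```

Cookies and query strings are left out of the key here because the function is assumed to fold the relevant values into the custom headers.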

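Step 3: Create a Lambda@Edge function that sets the Cache-Control response header according to the custom header (the custom cache key), then associate it under CloudFront behavior > Function associations > Origin response, so that it runs when CloudFront receives the response from the origin: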

if request.headers['custom_header_1'] then
    response.headers['cache-control'] = '<cache time>'
    return response

if request.headers['custom_header_2'] then
    response.headers['cache-control'] = '<cache time>'
    return response
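The reason a Cache-Control header is enough to give each cache key its own lifetime is that CloudFront combines the header with the policy's three TTL values. The following sketch approximates that interaction. The function name effective_ttl is hypothetical, the Expires header is ignored, and the rules are simplified from the documented behavior, so treat it as a reading aid rather than a specification.

```python
import re


def effective_ttl(cache_control, min_ttl=0, default_ttl=86400, max_ttl=31536000):
    """Approximate how long CloudFront keeps an object, in seconds."""
    if cache_control is None:
        # No directive from the origin: the policy's default TTL applies.
        return default_ttl
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if any(d in ("no-cache", "no-store", "private") for d in directives):
        # Caching is declined; the minimum TTL still acts as a floor.
        return min_ttl
    ages = {}
    for d in directives:
        match = re.fullmatch(r"(s-maxage|max-age)=(\d+)", d)
        if match:
            ages[match.group(1)] = int(match.group(2))
    if not ages:
        return default_ttl
    # s-maxage takes precedence over max-age for shared caches.
    requested = ages.get("s-maxage", ages.get("max-age"))
    # Clamp the requested lifetime into the [min_ttl, max_ttl] range.
    return max(min_ttl, min(requested, max_ttl))


print(effective_ttl("max-age=3600"))  # 3600
print(effective_ttl("no-cache"))      # 0
print(effective_ttl(None))            # 86400
```

With Min TTL set to 0, as in Step 2, a no-cache response is not reused from cache, which is what the forced-origin scenario relies on.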

Deploying the Solution

1. CloudFront Cache Policy settings

Custom header length limit: the combined length of the custom headers added by edge compute (the CloudFront Function) and the original request headers must not exceed 8 KB.

2. Deploy the CloudFront Function code

Under CloudFront > Distribution > Default Behavior (or any specific behavior) > Function associations > Viewer Request, associate the CloudFront Function. Sample code:

function handler(event) {
    var request = event.request;
    var headers = request.headers;
    var querystring = request.querystring;
    var cookies = request.cookies;

    // First condition: the CustomerType header equals "VIP" (case-insensitive).
    // CloudFront Functions expose header names in lowercase.
    if (
        headers['customertype'] &&
        headers['customertype'].value.toLowerCase() === 'vip'
    ) {
        var queryA = querystring['A'] ? querystring['A'].value : '';
        var cookieB = cookies['B'] ? cookies['B'].value : '';

        // Set custom_header_1 to queryA + cookieB
        request.headers['custom_header_1'] = {
            value: queryA + cookieB
        };

        return request;
    }

    // Second condition: the MustToOrigin header is present (any value)
    if (headers['musttoorigin']) {
        request.headers['custom_header_2'] = {
            value: 'MustToOrigin'
        };

        return request;
    }

    return request;
}

3. Deploy the Lambda@Edge code

Under CloudFront > Distribution > Default Behavior (or any specific behavior) > Function associations > Origin Response, associate the Lambda@Edge function. Sample code:

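def lambda_handler(event, context):
    # An origin-response event carries both the origin request and the origin response
    request = event['Records'][0]['cf']['request']
    response = event['Records'][0]['cf']['response']
    headers = request['headers']  # keys are lowercase header names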

    # Check if 'custom_header_1' is present in the request headers
    if 'custom_header_1' in headers:
        response['headers']['cache-control'] = [{
            'key': 'Cache-Control',
            'value': 'max-age=3600'
        }]
        return response

    # Check if 'custom_header_2' is present in the request headers
    if 'custom_header_2' in headers:
        response['headers']['cache-control'] = [{
            'key': 'Cache-Control',
            'value': 'no-cache'
        }]
        return response

    # No matching headers, return the original response
    return response
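Before deploying, the branching can be exercised locally with a hand-built event. The sketch below uses a condensed copy of the handler so that it runs on its own. The make_event helper is hypothetical and produces only the fields the handler reads, whereas real origin-response events carry many more.

```python
def lambda_handler(event, context):
    """Condensed copy of the origin-response handler, for local testing only."""
    cf = event["Records"][0]["cf"]
    request_headers, response = cf["request"]["headers"], cf["response"]
    if "custom_header_1" in request_headers:
        value = "max-age=3600"
    elif "custom_header_2" in request_headers:
        value = "no-cache"
    else:
        return response  # no custom header: leave the response untouched
    response["headers"]["cache-control"] = [{"key": "Cache-Control", "value": value}]
    return response


def make_event(request_headers):
    """Build a minimal origin-response event (hypothetical test helper)."""
    return {"Records": [{"cf": {
        "request": {"uri": "/custom_cache_key", "headers": {
            name.lower(): [{"key": name, "value": value}]
            for name, value in request_headers.items()}},
        "response": {"status": "200", "statusDescription": "OK", "headers": {}},
    }}]}


vip = lambda_handler(make_event({"custom_header_1": "ab"}), None)
bypass = lambda_handler(make_event({"custom_header_2": "MustToOrigin"}), None)
plain = lambda_handler(make_event({}), None)

print(vip["headers"]["cache-control"][0]["value"])     # max-age=3600
print(bypass["headers"]["cache-control"][0]["value"])  # no-cache
print("cache-control" in plain["headers"])             # False
```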

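Testing the Solution

Test scenario 1: personalized caching for VIP users

In this scenario, requests that carry the "CustomerType=VIP" header are assigned a cache namespace with a custom cache key, so that VIP users get personalized resources from the CloudFront cache.

Send an HTTP request with curl that includes the "CustomerType=VIP" header. The specific content is returned and cached.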

curl -H "CustomerType: vip" -H "Cookie: B=b" "https://cf.samye.world/custom_cache_key?A=a" -v

[Screenshot: first request]

[Screenshot: second request]

Send an HTTP request with curl that includes the "CustomerType=common" header. The generic content is returned.

curl -H "CustomerType: common" -H "Cookie: B=b" "https://cf.samye.world/custom_cache_key?A=a" -v

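Test scenario 2: forcing a request to the origin

In this scenario, requests that carry the "MustToOrigin" header go straight to the origin to fetch real-time content.

Send an HTTP request with curl that includes the "MustToOrigin" header. The response is not cached on any request.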

curl -H "musttoorigin: yes" "https://cf.samye.world/custom_cache_key" -v

[Screenshot: first request]

[Screenshot: second request]
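The curl checks can also be scripted by reading the X-Cache response header that CloudFront adds. This is a rough sketch: the helper names are hypothetical, and the live request is left commented out because it needs network access to the test distribution.

```python
import urllib.request


def classify_x_cache(value):
    """Map an X-Cache header value to 'hit', 'refresh', 'miss', or 'other'."""
    text = (value or "").lower()
    if text.startswith("refreshhit"):
        return "refresh"  # revalidated with the origin before being served
    if text.startswith("hit"):
        return "hit"
    if text.startswith("miss"):
        return "miss"
    return "other"


def fetch_x_cache(url, headers):
    """Send one GET request and classify the X-Cache response header."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return classify_x_cache(resp.headers.get("X-Cache"))


print(classify_x_cache("Miss from cloudfront"))  # miss
print(classify_x_cache("Hit from cloudfront"))   # hit

# Live check (requires network access):
# url = "https://cf.samye.world/custom_cache_key?A=a"
# vip = {"CustomerType": "vip", "Cookie": "B=b"}
# print(fetch_x_cache(url, vip), fetch_x_cache(url, vip))
```

For the VIP scenario one would expect the second request to be classified as a hit, while requests carrying the MustToOrigin header should never be.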

Summary

Amazon CloudFront edge compute offers a flexible way to configure the CDN: cache keys, HTTP headers, and origin information can all be modified dynamically.

Sample code for this solution: https://github.com/aws-samples/sample-CloudFront-usecase/tree/main/src/Customized_CacheKey

Other typical uses of CloudFront edge compute: https://github.com/aws-samples/amazon-cloudfront-functions


Amazon AWS Official Blog