Using response streaming with AWS Lambda Web Adapter to optimize performance

This post is written by Harold Sun, Senior Serverless SSA, AWS GCR, Xue Jiaqing, Solutions Architect, AWS GCR, and Su Jie, Associate Solution Architect, AWS GCR.

AWS Lambda now supports Lambda response streaming, which introduces a new invocation mode accessible through Lambda function URLs. This feature enables Lambda functions to send response content in sequential chunks to the client. It is available for Lambda’s Node.js managed runtimes and for custom runtimes, and it can also be accessed using the InvokeWithResponseStream API in Lambda.

The Lambda Web Adapter, written in Rust, serves as a universal adapter for Lambda Runtime API and HTTP API. It allows developers to package familiar HTTP 1.1/1.0 web applications, such as Express.js, Next.js, Flask, Spring Boot, or Laravel, and deploy them on AWS Lambda. This removes the need to modify the web application to accommodate Lambda’s input and output formats, reducing the complexity of adapting code to meet Lambda’s requirements.

When using other managed runtimes such as Java, Go, Python, or Ruby, developers can use the Lambda Web Adapter to build applications that support Lambda response streaming more easily.

Implementing response streaming with Lambda Web Adapter

In general, you can regard Lambda Web Adapter as an extension of Lambda, which is integrated into Lambda’s runtime environment using the Lambda Extension API. It operates within an independent process space when the Lambda function is invoked and serves as a custom runtime. When the function is run, the Web Adapter starts alongside the packaged web application.

After initialization, it performs a readiness check every 10ms on the web application’s configured port (8080 by default, but you can configure other ports using environment variables). Once it receives an HTTP response with a “200 OK” status from the web application, it encapsulates the received Lambda invocation parameters according to the HTTP protocol and sends a request to the running web application.
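The readiness check described above can be sketched as a simple polling loop. This is a minimal Python illustration, not the adapter's actual Rust implementation. The names `http_probe` and `wait_until_ready`, the `max_attempts` cap, and the probed URL are assumptions made for this example.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout_s: float = 1.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(probe: Callable[[], bool],
                     interval_s: float = 0.01,
                     max_attempts: int = 1000) -> int:
    """Call `probe` every `interval_s` seconds until it succeeds.

    Returns the number of attempts made. `max_attempts` is an assumption
    of this sketch, not an adapter setting.
    """
    for attempt in range(1, max_attempts + 1):
        if probe():
            return attempt
        time.sleep(interval_s)
    raise TimeoutError("web application did not become ready")
```

A caller might use it as `wait_until_ready(lambda: http_probe("http://127.0.0.1:8080/"))`, where the path `/` is only a placeholder for whatever readiness endpoint the application exposes.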

Once the web application responds to this request, the Web Adapter formats the response content according to the function’s response format and sends it to the client, completing one invocation of the function.
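To make the translation step more concrete, here is a hedged Python sketch of how a Lambda function URL event could be mapped onto a plain HTTP request. It relies on the documented function URL event fields (`requestContext.http.method`, `rawPath`, `rawQueryString`, `headers`, `body`, `isBase64Encoded`). The `HttpRequest` type and `event_to_request` function are hypothetical names for illustration and do not reflect the adapter's internal code.

```python
import base64
from typing import Dict, NamedTuple


class HttpRequest(NamedTuple):
    method: str
    target: str              # path plus optional query string
    headers: Dict[str, str]
    body: bytes


def event_to_request(event: dict) -> HttpRequest:
    """Convert a function URL event into the pieces of an HTTP request."""
    method = event["requestContext"]["http"]["method"]
    query = event.get("rawQueryString", "")
    target = event.get("rawPath", "/") + (f"?{query}" if query else "")
    raw_body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        # Binary payloads arrive base64-encoded in the event.
        body = base64.b64decode(raw_body)
    else:
        body = raw_body.encode("utf-8")
    return HttpRequest(method, target, dict(event.get("headers", {})), body)
```

The reverse direction, turning the web application's HTTP response back into a Lambda response, follows the same idea with the fields swapped.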

Similarly, the Lambda Web Adapter uses the Custom Runtime API to implement response streaming. When implementing a function using response streaming:

The Web Adapter sends a POST request to the Lambda Runtime’s Response API, including the Lambda-Runtime-Function-Response-Mode HTTP header with the value streaming and the Transfer-Encoding HTTP header with the value chunked:
```
POST http://${AWS_LAMBDA_RUNTIME_API}/runtime/invocation/${AwsRequestId}/response
Lambda-Runtime-Function-Response-Mode: streaming
Transfer-Encoding: chunked
```
It encodes the response data according to the HTTP/1.1 Chunked Transfer Encoding protocol specification and sends it as the “Body” to the Lambda Runtime’s Response API.
After assembling the response and completing the data transmission, the Web Adapter closes the underlying network connection.
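The chunk framing used in the steps above can be illustrated with a short Python sketch of HTTP/1.1 chunked transfer encoding. The function name `encode_chunked` is an assumption for this example. In practice an HTTP client library usually applies this framing automatically when the Transfer-Encoding header is set to chunked.

```python
from typing import Iterable, Iterator


def encode_chunked(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each piece of data as an HTTP/1.1 chunk.

    Each chunk is: <size in hex>CRLF<data>CRLF.
    The body ends with a zero-length chunk: 0 CRLF CRLF.
    """
    for chunk in chunks:
        if not chunk:
            # A zero-length chunk would terminate the body early, so skip it.
            continue
        yield f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n"
    yield b"0\r\n\r\n"
```

For example, the two pieces `b"Hello, "` and `b"world"` are framed as `7\r\nHello, \r\n`, then `5\r\nworld\r\n`, followed by the terminating `0\r\n\r\n`.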

Under normal circumstances, completing these steps enables Lambda response streaming in a function. However, this is not sufficient for web application scenarios. Web applications must often send custom HTTP response status codes, custom HTTP headers, and some cookie data to the client. The previous steps only achieve streaming of the response body, and cannot add content to the response’s HTTP headers.

To add these, when sending the response content to the Response API, you must:

Add a Content-Type HTTP header to specify the MIME type of the response as application/vnd.awslambda.http-integration-response.
Send the response metadata, such as the HTTP status code, custom headers, and cookies, in JSON format.
Send 8 NULL characters as separators.
Send the response content encoded using the HTTP 1.1 Chunked Transfer Encoding protocol.

Here is an example of the response format:

```
POST http://${AWS_LAMBDA_RUNTIME_API}/runtime/invocation/${AwsRequestId}/response
Lambda-Runtime-Function-Response-Mode: streaming
Transfer-Encoding: chunked
Content-Type: application/vnd.awslambda.http-integration-response

{
    "statusCode":200,
    "headers":{
        "Content-Type":"text/html",
        "custom-header":"outer space"
    },
    "cookies":[
        "language=xxx",
        "theme=abc"
    ]
}
8 NULL characters
Chunked response body
```
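As a rough illustration of this layout, the following Python sketch assembles the JSON metadata, the 8 NULL bytes, and the body in order. The function name `build_streaming_payload` and the constant names are assumptions for this example. The sketch also assumes that the chunked transfer framing is applied by whatever HTTP client sends the payload to the Response API.

```python
import json
from typing import Dict, Iterable, Iterator, List

CONTENT_TYPE = "application/vnd.awslambda.http-integration-response"
SEPARATOR = b"\x00" * 8  # the 8 NULL characters between metadata and body


def build_streaming_payload(status_code: int,
                            headers: Dict[str, str],
                            cookies: List[str],
                            body_chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the metadata prelude, the separator, then the body pieces."""
    prelude = {"statusCode": status_code, "headers": headers, "cookies": cookies}
    yield json.dumps(prelude).encode("utf-8")
    yield SEPARATOR
    # Body pieces are passed through as they become available.
    yield from body_chunks
```

Because the metadata is sent before any body data, the status code, headers, and cookies must be known before the first body chunk is written.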

In Lambda Function URLs, multi-value HTTP headers are not supported. As a result, you cannot implement responses with multi-value HTTP headers in the Lambda Web Adapter.

Using response streaming with Lambda Web Adapter

When packaging Lambda functions using the zip format, you must attach the Lambda Web Adapter as a layer and configure the environment variable AWS_LAMBDA_EXEC_WRAPPER with the value /opt/bootstrap.

After that, you can configure the startup script of the web application as the Lambda function’s handler. By doing this, the function is able to use the Lambda Web Adapter, and the web application can be launched and run within the Lambda runtime environment.

The AWS_LAMBDA_EXEC_WRAPPER environment variable points to the bootstrap script provided by the Web Adapter to ensure the proper execution of the web application.

When using a Docker Image or OCI Image to package the Lambda function, you only need to include the Lambda Web Adapter binary package in the Dockerfile by copying it to the /opt/extensions directory within the image. Additionally, you should specify the port on which the web application listens by setting the PORT environment variable:

```
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.7.0 /lambda-adapter /opt/extensions/lambda-adapter

ENV PORT=3000
```

By default, the Web Adapter is invoked using the buffered mode. To use response streaming as the invocation mode in the function, you must configure an environment variable. Specify the function’s Web Adapter invocation mode as response_stream:

```
ENV AWS_LWA_INVOKE_MODE=response_stream
```

Because the buffered and response stream invocation modes use different data formats, you must set AWS_LWA_INVOKE_MODE to match the InvokeMode configured for the Lambda function URL. Otherwise, the client may not process the response content correctly.
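A small consistency check can catch a mismatch between the two settings before deployment. The Python sketch below is hypothetical: the function `modes_match` is not part of the adapter or of any AWS SDK. It only encodes the pairing between the AWS_LWA_INVOKE_MODE values (`buffered`, `response_stream`) and the function URL InvokeMode values (`BUFFERED`, `RESPONSE_STREAM`).

```python
import os
from typing import Optional

# Adapter setting -> function URL InvokeMode it must be paired with.
_URL_MODE_FOR_LWA_MODE = {
    "buffered": "BUFFERED",
    "response_stream": "RESPONSE_STREAM",
}


def modes_match(function_url_invoke_mode: str,
                lwa_invoke_mode: Optional[str] = None) -> bool:
    """Return True if the adapter mode and the function URL mode agree.

    If `lwa_invoke_mode` is not given, it is read from the environment,
    falling back to the adapter's default of buffered.
    """
    if lwa_invoke_mode is None:
        lwa_invoke_mode = os.environ.get("AWS_LWA_INVOKE_MODE", "buffered")
    expected = _URL_MODE_FOR_LWA_MODE.get(lwa_invoke_mode.lower())
    return expected is not None and expected == function_url_invoke_mode.upper()
```

For instance, `modes_match("RESPONSE_STREAM", "buffered")` returns False, which is the misconfiguration described above.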

Lambda response streaming example

Server-side rendering (SSR) can accelerate the loading time of a React application. With SSR, the server generates the HTML pages and sends them to the client, which renders the content. The browser executes the hydration process, which “wakes up” the static components from the received HTML and mounts them into the React application. This allows for a faster response to user interactions and improves the overall user experience.

By using Lambda response streaming, your application can achieve a faster time to first byte (TTFB) by sending response content in sequential chunks as it becomes available. This helps to reduce the time it takes for the initial data to be sent from the server to the client, enhancing overall performance.
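The effect on TTFB can be illustrated with a small web application that yields its response in pieces. The sketch below uses a plain Python WSGI app rather than the Next.js application discussed in this post. The `make_app` factory, the section names, and the `delay_s` parameter are assumptions for illustration only.

```python
import time
from typing import Callable, Iterator, List, Tuple

StartResponse = Callable[[str, List[Tuple[str, str]]], object]


def make_app(delay_s: float = 0.5):
    """Build a WSGI app that streams a page in three parts."""

    def app(environ: dict, start_response: StartResponse) -> Iterator[bytes]:
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        # Part 1: the page shell goes out immediately, so the first byte
        # does not wait for the slow work below.
        yield b"<html><body><h1>Dashboard</h1>"
        # Part 2: stand-in for slow data fetching or rendering.
        for name in ("orders", "inventory"):
            time.sleep(delay_s)
            yield f"<section>{name} loaded</section>".encode("utf-8")
        # Part 3: close the document.
        yield b"</body></html>"

    return app
```

In buffered mode the client would receive nothing until the last section is ready. With response streaming, the shell can reach the client while the slower sections are still being produced, provided the web server flushes each yielded chunk.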

The hydration process can introduce delays as the client-side JavaScript must re-render and rehydrate the page after the initial load. Lambda response streaming minimizes the need for full page hydration, leading to an improved user experience.

Next.js 13’s support for streaming with suspense complements the Lambda response streaming feature, allowing you to use both SSR and selective hydration. This combination can lead to greater improvements in performance and user experience for your Next.js applications.

This GitHub repo demonstrates a Next.js application that supports Lambda response streaming using the Web Adapter and the streaming with suspense feature. Use AWS Serverless Application Model (AWS SAM) to deploy the application to test these optimizations:

```
git clone git@github.com:aws-samples/lwa-nextjs-response-streaming-example.git
cd lwa-nextjs-response-streaming-example

sam build
sam deploy -g --stack-name lambda-web-adapter-nextjs-response-streaming-example
```

After the sam deploy process is completed, you can access the Lambda Function URLs endpoint provided in the output. Here is the output of the Lambda response streaming Next.js application demo:

Example output

Quotas and pricing

Web Adapter is an enhancement to Lambda and does not incur additional costs. You are only charged for the Lambda function usage based on the resources consumed.

However, response streaming may result in additional network costs. You are billed for any part of the response that exceeds 6MB. For more information, refer to the pricing page.

There is a maximum response size limit of 20MB for Lambda response streaming. This is a soft limit, and you can request to increase this limit by creating a support ticket.

The response speed of Lambda response streaming depends on the size of the response body. The transfer rate for the first 6MB is not limited, but any part of the response beyond 6MB has a maximum throughput of 2MB/s. For more detailed information, refer to Bandwidth limits for response streaming.
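As a worked example of these figures, the Python sketch below estimates the minimum time needed to transfer the bandwidth-capped part of a response. It uses only the numbers quoted above (6MB uncapped, 2MB/s afterwards) and ignores the time taken by the first 6MB and by the function itself. The function and constant names are assumptions for this example.

```python
UNCAPPED_MB = 6.0           # first part of the response, not rate limited
CAPPED_RATE_MB_PER_S = 2.0  # maximum throughput beyond the first 6MB


def min_seconds_for_capped_portion(response_mb: float) -> float:
    """Lower bound on transfer time for the part of a response beyond 6MB."""
    if response_mb < 0:
        raise ValueError("response size must be non-negative")
    capped_mb = max(0.0, response_mb - UNCAPPED_MB)
    return capped_mb / CAPPED_RATE_MB_PER_S
```

For a response at the default 20MB limit, the 14MB beyond the uncapped portion takes at least 14 / 2 = 7 seconds, so the function timeout should leave room for that transfer.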

Conclusion

Lambda response streaming can improve the TTFB for web pages. With the support of AWS Lambda Web Adapter, developers can more easily package web applications that support Lambda response streaming, enhancing the user experience and performance metrics of their web applications.

For more serverless learning resources, visit Serverless Land.

AWS Compute Blog