Containers

AWS Lambda for the containers developer

Introduction

When building an application on AWS, one of the common decision points customers encounter is building on AWS Lambda versus building on a containers product like Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). To make this decision, there are many factors to consider such as cost, scaling properties, and the amount of control the developer has over hardware options. Neither the function model nor the service-based model is objectively better or worse. Rather, it is a matter of fit between the application and the underlying product. But one of the more befuddling dimensions of this choice is the difference in the programming model between the function-centric paradigm of AWS Lambda and the traditional, service-based paradigm of Amazon ECS or Amazon EKS.

The difference in programming model between AWS Lambda and either Amazon ECS or Amazon EKS is often discussed. But what do we mean by programming model? The programming model of a product has two aspects. The first is the manner in which the caller issues its request against an application. The second is the manner in which the code inside the application receives a request from the service and provides the corresponding response. In this post, we’ll discuss the former but focus on the latter. We peek under the hood of an AWS Lambda application and seek to understand the inner mechanics with which an application running in AWS Lambda interacts with the AWS Lambda service to receive and respond to requests.

Our goal in this post is twofold. First, we hope to demystify the AWS Lambda programming model and show how much of the “Lambda magic” is really a straightforward contract between the application and the service. Second, we hope to show that, for folks coming from a traditional container background, AWS Lambda really isn’t that different. All compute products establish some contract between application code and the service. Moving applications between compute products is really about — hopefully small — changes to the application such that it adheres to the programming model of the product.

Walkthrough

Let’s get started!

As we all know, AWS Lambda runs on servers (!) and accepts application code either via ZIP packaging or Open Container Initiative (OCI) packaging. While we could use ZIP packaging to do the same things (more on this towards the end), in this post we configure our AWS Lambda with a container image. As far as the workload is concerned, we’re going to build with one of the simplest programming languages available: a bash script. We really want to get as close to those servers as we possibly can to demonstrate the interaction between the code running in the container and the programming model of the AWS Lambda service.

To start, we will use this simple Dockerfile:

FROM public.ecr.aws/amazonlinux/amazonlinux:2023
# Install the tools that our scripts rely on
RUN yum install -y jq tar gzip git unzip
# Install the AWS CLI (used later to copy the built site to Amazon S3)
RUN curl -Ls "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" \
    && unzip awscliv2.zip \
    && ./aws/install
# Add the Lambda-aware wrapper script and the Lambda-agnostic business logic
ADD startup.sh /startup.sh
ADD businesscode.sh /businesscode.sh
ENTRYPOINT /startup.sh

If you ever thought about AWS Lambda as something esoteric, think again. This is a standard Dockerfile that starts FROM a stock Amazon Linux 2023 image and installs a set of tools into it (the AWS Command Line Interface [AWS CLI], tar, git, and so on). Yes, AWS Lambda runs this container image and the startup.sh script just like you could run them on your laptop (or on AWS Fargate).
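To prove the point, you can build this image locally with a standard docker build (the image tag below is an arbitrary name we picked):

docker build -t containers-on-lambda .

If you then docker run it as-is outside of AWS Lambda, the startup.sh script will fail as soon as it tries to contact the runtime API, because the AWS_LAMBDA_RUNTIME_API environment variable only exists inside a Lambda execution environment (more on this endpoint later).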

There are three dimensions that make a container special in AWS Lambda:

  • The constraints of the container instance
  • What launches the container instance
  • What we run in the startup.sh script (and in the businesscode.sh script)

Let’s examine them individually.

The constraints of the container

The machine or virtual machine surrounding a container will dictate its capabilities. If you launch a container on your laptop you likely won’t have a Graphics Processing Unit (GPU) at your disposal. If you launch a container using AWS Fargate you won’t be able to run privileged containers. Every execution environment has its constraints. The AWS Lambda execution environment has its own:

  • Its execution lifespan is (artificially) limited
  • Its size is configured via a memory parameter, and CPU capacity is allocated proportionally
  • The container runs with a read-only root filesystem (/tmp is the only writable path)
  • You can’t run privileged containers
  • You can’t expose a GPU for your container’s use

A lot of these constraints are also common in traditional managed container services and in local executions. The lifespan constraint and the read-only filesystem constraint are the most relevant for this post, and we’ll come back to them later.
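If you want to see the read-only filesystem constraint for yourself, a probe like the following (which you could temporarily drop into the startup script we show later) demonstrates it; the exact error message may vary:

# This fails: the root filesystem is read-only inside AWS Lambda
touch /probe.txt || echo "cannot write to /"
# This succeeds: /tmp is the only writable path
touch /tmp/probe.txt && echo "can write to /tmp"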

What launches the container

In this section we’ll discuss the first aspect of a service’s programming model: how the caller calls the application. Every environment has its own way of orchestrating the launch of containers. If you want to launch a container on your laptop, then you’ll probably use either docker run or finch run. If you want to launch a container on AWS Fargate, then you’ll probably use an Amazon ECS API, such as RunTask or CreateService. AWS Lambda, at its very core, is an event-driven system and everything (including the launch of the container above) happens because of events. AWS Lambda supports hundreds of different events coming from many different AWS services. A classic event could be a message in an Amazon Simple Queue Service (Amazon SQS) queue as part of an asynchronous application. But an event could also be an Amazon API Gateway (or Elastic Load Balancing) HTTP call as part of an interactive web application. One way or another, the event is made available to AWS Lambda for processing (more on this mechanism later). A single AWS Lambda container processes at most one event at a time. However, it may process many events sequentially over the duration of its lifetime.
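If you just want to poke at these mechanics, the simplest way to generate such an event is to invoke the function directly with the AWS CLI (the function name below is a placeholder for whatever you called yours):

aws lambda invoke \
    --function-name containers-on-lambda \
    --cli-binary-format raw-in-base64-out \
    --payload '{"hello": "world"}' \
    response.json

Behind the scenes, this produces an event just like an API Gateway or Amazon SQS trigger would, and AWS Lambda orchestrates a container to process it as described next.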

AWS Lambda container orchestration follows roughly this flow in response to an incoming event:

  • If there is a container already initialized and idle, then AWS Lambda routes the event to that container for execution
  • If there are no containers initialized and idle to execute an event, then AWS Lambda launches a new container
    • AWS Lambda may choose to keep this container around longer than the single execution so that future events need not spin up a new container
    • If multiple events come in concurrently, then AWS Lambda launches container instances in parallel for each, up to the configured function or account concurrency and burst limits

What we run in the startup.sh script

So far we have touched on the execution environment of an AWS Lambda container (its constraints) and the lifecycle of this execution environment (the orchestration). Now we consider what the code running inside the container actually does (the programming model).

You may have heard about the Lambda runtime APIs. The easiest way to think about these APIs is that they offer the application a way to get an event and a way to respond to it. Think of your container instance as a long-running process that repeatedly checks whether there is an event to process; if there is, it does something with it and then tells AWS Lambda the results of that work.

With this high-level mental model in mind, we write a startup.sh script that implements the flow above. In our example, our business need is for our AWS Lambda function to clone a GitHub repository containing a Hugo website, build it into a set of static site artifacts, and copy the results into an Amazon Simple Storage Service (Amazon S3) bucket. Due to our lack of imagination, we have captured this business logic in a script called businesscode.sh. The startup.sh script calls into businesscode.sh, acting as a bridge between the AWS Lambda programming model and our business logic. The business logic doesn’t need to know anything about AWS Lambda.

Important: the use case itself isn’t the point. Focus on the flow and the mechanics of how the code runs rather than on the actual commands and what they do.

This is the body of startup.sh:

#!/bin/bash
set -euo pipefail

###############################################################
# The container initializes before processing the invocations #
###############################################################

echo Installing the latest version of Hugo...
cd /tmp
# Pick the first matching asset in case multiple variants match
export LATESTHUGOBINARYURL=$(curl -s https://api.github.com/repos/gohugoio/hugo/releases/latest | jq -r '.assets[].browser_download_url' | grep Linux-64bit.tar.gz | grep extended | head -n 1)
export LATESTHUGOBINARYARTIFACT=${LATESTHUGOBINARYURL##*/}
curl -LO $LATESTHUGOBINARYURL
tar -zxvf $LATESTHUGOBINARYARTIFACT
./hugo version

###############################################
# Processing the invocations in the container #
###############################################

while true
do
  # Create a temporary file
  HEADERS="$(mktemp)"
  # Get an event. The HTTP request will block until one is received
  EVENT_DATA=$(curl -sS -LD "$HEADERS" -X GET "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next")
  # Extract request ID by scraping response headers received above
  REQUEST_ID=$(grep -Fi Lambda-Runtime-Aws-Request-Id "$HEADERS" | tr -d '[:space:]' | cut -d: -f2)

  ############################
  # Run my arbitrary program #
  ############################

  /businesscode.sh

  ############################

  # Send the response
  curl -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$REQUEST_ID/response"  -d '{"statusCode": 200}'

done

This is the body of businesscode.sh:

#!/bin/bash
set -euo pipefail

# Clean up the clone left over from a previous invocation in this container
rm -rf aws-lambda-for-the-containers-developer-blog

git clone https://github.com/${AWS_LAMBDA_FOR_THE_CONTAINERS_DEVELOPER_BLOG_GITHUB_USERNAME}/aws-lambda-for-the-containers-developer-blog
cd aws-lambda-for-the-containers-developer-blog/hugo_web_site
/tmp/hugo
aws s3 cp ./public/ s3://${AWS_LAMBDA_FOR_THE_CONTAINERS_DEVELOPER_BLOG_BUCKET}/ --recursive

The startup.sh script starts with a section that runs only when the container launches. This part of the script (the init phase) is what determines the cold start of the AWS Lambda container instance. In our example, it downloads the latest version of the hugo binary at runtime. We could have added the setup of this binary to the Dockerfile shown previously, but that would have forced us to rebuild the image every time we wanted the latest version of the binary. Here, we leverage the fact that AWS Lambda runs initialization code on every container launch, pulling in the latest version of Hugo dynamically. Your specific use case dictates whether a piece of code should live in the Dockerfile, in the init phase, or in your business logic.

Note that we have to operate within the /tmp folder because that is the only writable folder inside the Lambda container. For this reason, it was easier to install some of the tools in our Dockerfile.

The next section of the script (labeled "Processing the invocations in the container") is where the code enters an infinite loop that lasts for the duration of the container’s lifespan. The code continuously checks (via a curl against the local AWS Lambda runtime API endpoint) whether there is an event to process. And this is where the AWS Lambda magic lives: the service exposes and maintains the runtime API endpoint within each execution environment. It passes events back on that polled endpoint as they come in, and if there isn’t an event waiting, then AWS Lambda pauses the execution environment until one arrives. On receiving an event, our code grabs it and runs the next part of the script (labeled "Run my arbitrary program") against that event. This is the AWS Lambda-agnostic part of the script and where we execute our business logic (the businesscode.sh script). This part is subject to the AWS Lambda execution timeout (configurable up to 15 minutes), which means that the code in the "Run my arbitrary program" section can’t run for longer than the configured timeout.

Once the business logic has completed, the script returns a message via an HTTP POST to the same endpoint to inform the AWS Lambda service that event processing has completed. Note that AWS Lambda doesn’t care what you return, as long as you return something. In our script we return {"statusCode": 200} because we are using Amazon API Gateway to trigger this function and API Gateway expects that code in return. One could have instead returned text like "hey I am done" and AWS Lambda would have been fine with that (less so the API Gateway).
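Our simple loop also glosses over failures: if businesscode.sh exits non-zero, set -e terminates the whole script and the container with it. The runtime API exposes a dedicated error endpoint for reporting a failed invocation instead. A minimal sketch of how our loop could use it (the error payload fields follow the runtime API conventions):

if ! /businesscode.sh; then
  # Report the failure for this request and move on to the next event
  curl -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$REQUEST_ID/error" \
    -d '{"errorMessage": "business logic failed", "errorType": "ScriptError"}'
  continue
fi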

Don’t confuse the container execution lifespan with the AWS Lambda timeout. The former defines how long a container keeps running the loop after its launch. This lifespan is not part of the AWS Lambda "contract," and a developer shouldn’t make assumptions about how long the container will be up and running before it gets shut down. The AWS Lambda timeout is indeed part of the contract and, as of this writing, is configurable up to a maximum of 15 minutes. If, upon receiving an event from the runtime API, the container code takes longer than the configured timeout to post its response, the request is returned to the caller as a timeout and the container is restarted.
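The timeout side of the contract is just a function configuration parameter. For example, you can raise it to its 15-minute maximum with the AWS CLI (the function name is again a placeholder):

aws lambda update-function-configuration \
    --function-name containers-on-lambda \
    --timeout 900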

The other thing worth noting in this script is that we ignore the payload of the event, which carries the actual event information. In other words, we’re only interested in the event trigger and not what the trigger brings with it. We parse the HEADERS to extract the request ID that we use at the end of the loop to inform the AWS Lambda service we have processed the event. In a more classic event-driven architecture, we would have parsed the payload and used it to influence what our business logic does with the request.

This diagram is a visual representation of the sections of the code above:

Let’s put everything together

We’re now going to describe, at a high level, what happens under the hood when you deploy this AWS Lambda function and execute it.

You build a container image from the Dockerfile above and create an AWS Lambda function with the image. You then configure two triggers for this AWS Lambda: an API Gateway endpoint and an Amazon SQS queue. At this stage, nothing is running and no container has been launched.
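If you were scripting the deployment with the AWS CLI, the function creation step could look roughly like the following (the account ID, Region, image URI, and role are placeholders you’d substitute with your own):

aws lambda create-function \
    --function-name containers-on-lambda \
    --package-type Image \
    --code ImageUri=123456789012.dkr.ecr.us-east-1.amazonaws.com/containers-on-lambda:latest \
    --role arn:aws:iam::123456789012:role/containers-on-lambda-role \
    --timeout 120 \
    --memory-size 1024

Wiring up the API Gateway endpoint and the Amazon SQS queue as triggers takes a few more steps that we omit here.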

Now you hit the API Gateway endpoint with a request from a terminal (curl <api gateway endpoint>). API Gateway translates that HTTP request into an AWS Lambda event, and AWS Lambda launches a container in response to the event. The container goes through its initialization phase (grabbing the hugo binary in our case). It then enters its event loop and grabs the event that AWS Lambda has been holding while waiting for an available container. The container spends a few seconds cloning the repository, building the site, and copying its content to Amazon S3. Once done, the container informs the AWS Lambda runtime that the code has finished running via an HTTP POST. AWS Lambda notifies API Gateway synchronously, and the terminal sees the prompt back (there won’t be anything in the response because the message we POST back in the script doesn’t include a body).

Note that this process is going to take roughly 30 seconds because, in our use case, we are using Lambda as a kind of build system. This is not how you would typically use AWS Lambda in a synchronous request/response pattern. If you were, the "business logic" would probably be leaner: think of a web service responding in milliseconds. Again, this use case is purely illustrative, meant to show you the mechanics inside a Lambda execution environment.

At this point, the container has called back into the runtime API for its next event and is awaiting the runtime’s response. The container is now paused until another event comes in, and during this time you are not paying for it. If you now drop a message into the queue, AWS Lambda knows that there is an active and idle execution environment, unpauses the container, and routes the event to it. The container receives the event as a response to its runtime API call and goes through the same process of running the business logic and responding back with the results.
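Dropping that message can be as simple as the following command (the queue URL is a placeholder for the queue you configured as a trigger):

aws sqs send-message \
    --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/containers-on-lambda-queue \
    --message-body '{"hello": "again"}'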

In this case the event payload will be different from the event generated by the API Gateway but, for our use case, we don’t care because we don’t even read the event passed into the container. We only care about the trigger and not the event content itself.

After some time with no incoming requests, AWS Lambda shuts down the container above and there are no containers running behind the function. At this point, you hit the API Gateway endpoint with 100 simultaneous requests. AWS Lambda sees the 100 requests coming in and launches 100 containers in parallel to process them (that is, all of them go through a small initialization "cold start"). Once the requests have been processed and the site has been built and copied 100 times, the 100 containers continue to run for an indefinite amount of time, ready to grab more events via their running loop (until AWS Lambda decides again to shut them down).
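A quick (and crude) way to generate those 100 simultaneous requests from a terminal using only the shell could look like this (the endpoint is a placeholder):

# Fire 100 requests in parallel and wait for them all to complete
for i in $(seq 1 100); do
  curl -s -o /dev/null "https://<api-gateway-endpoint>" &
done
wait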

Running the container image outside of Lambda

If you have been following along, you may have noticed that the Dockerfile we used is no stranger to the traditional Dockerfiles you see in the wild. The biggest peculiarity is in how the startup.sh script initializes the container and in how it interacts with the AWS Lambda runtime API (both for grabbing the event and for posting the results inside the loop). This part is extremely specific to the AWS Lambda programming model. That said, we built these scripts such that the business logic (businesscode.sh) is separated from the programming model (startup.sh). Because of this, it is easy to take the same container image and run it somewhere else by bypassing the AWS Lambda specifics and launching the business logic directly. An easy way to accomplish this is to run it locally with this Docker command:

docker run -v /tmp:/tmp --rm --entrypoint /businesscode.sh <container_image:tag> 

We only had to tweak our entrypoint and point it to the businesscode.sh script.

The astute reader may have spotted that we are mapping a local folder onto the /tmp folder of the container and may be wondering why. Because we have bypassed the initialization phase, the container does not install the hugo binary at startup. Instead, we are providing it dynamically from an existing binary we have in /tmp on our laptop. In a real-life scenario you might build the Hugo binary into the container image, or split the download code out of startup.sh so that it can be run outside of the AWS Lambda context. Again, this example is for demonstration purposes and translating it to the real world depends on your use case.
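If you instead wanted to exercise the Lambda-specific loop itself outside of the service, one option is the AWS Lambda Runtime Interface Emulator (RIE), which stands up a local endpoint that emulates the runtime API. A rough sketch, assuming you have downloaded the aws-lambda-rie binary into ~/.aws-lambda-rie on your machine:

# Run the image with the emulator fronting the runtime API
docker run -p 9000:8080 \
  -v ~/.aws-lambda-rie:/aws-lambda \
  --entrypoint /aws-lambda/aws-lambda-rie \
  <container_image:tag> /startup.sh

# From another terminal, send a test event to the emulator
curl -X POST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'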

But wait, this is not the AWS Lambda we know and love!

Right. As promised, this was a tour into the low-level mechanics of AWS Lambda, where the execution model meets the programming model. If you know or have used AWS Lambda already, you have been abstracted away from all of these details. It’s interesting to note how AWS Lambda started at the highest end of these abstractions when it launched, and slowly introduced support for full visibility into what we have discussed in this blog post. How do we reconcile what we have described here with the high-level abstractions you know and hear about? Let’s build up from what we described in this post, all the way to the AWS Lambda that you know.

Most developers have no desire to deal with loops and HTTP GETs and POSTs while they are writing their business code. This is where the abstractions and conventions you often see in Lambda come in, starting with the AWS Lambda Runtime Interface Client (RIC). The RIC is a utility (binary or library) provided by AWS for specific programming languages that implements the loop that intercepts events. The way these events flow into your code is via objects passed to a program function. The RIC grabs the HEADERS and the BODY mentioned above, parses the event content and the context of the execution environment, and passes these as objects into your function. In other words, the container launches with the RIC as the main program and, on every event, the RIC calls a function with the event information. Following this convention, the developer finds the event directly inside the function without having to call an endpoint or parse HEADERS and BODY.

We have effectively built, in a bash script (startup.sh), part of the logic that the RIC implements. Note that we deliberately did not mimic a "function" convention in our example, because we wanted to err more on the side of "this is a regular container with some peculiarities" than on the side of "this is how you can re-implement a RIC in bash". On that note, this tutorial in the Lambda documentation (which inspired this post) does exactly that and shows how you can build a bash function that you import into your main script!
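To make that convention concrete, a minimal sketch in bash could look like the following: the business logic becomes a function in its own file, and the loop sources it and calls it with the event payload (the file and function names here are our own, not the tutorial’s):

# function.sh: the "handler" receives the event payload as its first argument
handler () {
  EVENT_DATA=$1
  echo "processing event: $EVENT_DATA"
}

# In the loop of startup.sh, instead of calling /businesscode.sh directly:
source /function.sh
RESPONSE=$(handler "$EVENT_DATA")

The loop, the curl calls, and the request ID bookkeeping stay exactly the same; only the way the business code receives the event changes.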

Yes, despite AWS Lambda being Function as a Service (FaaS), the whole notion of a function in the context of AWS Lambda is just a convention, built on top of a loop and two curl operations in a container, that we abstract away for a clean developer experience.

Back to the topic of the RIC: we ship the RIC standalone (for selected languages) if you want to build your own container image, and we provide AWS Lambda managed base images (that include the RIC and more) that you can build upon. Regardless of which you pick, when you use a container image you are responsible for its maintenance. In other words, you need to take care of deploying up-to-date images for your function.

An alternative mechanism, and a higher level of abstraction, is to package a custom runtime and business logic as a ZIP file and let AWS manage the operating system your function runs on.

For the ultimate level of abstraction and managed experience, you can package only your business logic as a ZIP file and let AWS curate and manage the entire runtime on your behalf. As discussed previously, this is where AWS Lambda started and there’s been a long process of adding more flexibility and control into the stack. We started adding support for layers and eventually we added support for container images.

Over the years, the Lambda community has built additional abstractions over the Lambda programming model described above. One such abstraction is the Lambda Web Adapter, which allows customers to run traditional web applications on Lambda. You can think of this Web Adapter as a custom runtime that interfaces between the Lambda programming model and a traditional web application framework that listens on a port. Using this model, the event-driven nature of Lambda is abstracted away, virtually decoupling the infrastructure from the programming model.
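As a rough sketch of what adopting it can look like (the adapter image tag and the webapp.sh script are our assumptions; check the adapter’s documentation for current versions), wiring it in can be a single extra line in a Dockerfile:

FROM public.ecr.aws/amazonlinux/amazonlinux:2023
# The adapter runs as a Lambda extension and proxies events to a local web server
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.8.4 /lambda-adapter /opt/extensions/lambda-adapter
ADD webapp.sh /webapp.sh
# webapp.sh starts an ordinary web server (the adapter forwards to port 8080 by default)
ENTRYPOINT /webapp.sh

The adapter then does what our startup.sh loop did: it polls the runtime API, translates each event into a plain HTTP request against the local port, and posts the web server’s HTTP response back to the runtime API.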

Test this prototype yourself

For the curious minds that want to get their hands dirty, we have created a GitHub repository with all the code and setup instructions required to re-create this prototype. Please visit this link if you want to deploy this exercise yourself.

Conclusion

In this post, we described AWS Lambda from a different perspective than usual. While the specific example and use case we used aren’t conventional and may not map to a real-life AWS Lambda scenario, we hope this post helped you better appreciate the inner workings of the service. We also hope we provided additional clarity on the differences between AWS Lambda and traditional container systems. Working through the differences isn’t as exotic as it may feel at first.