Windows Containers on AWS Fargate: Launch time improvements

We launched AWS Fargate support for Windows Server containers on Amazon Elastic Container Service (ECS) in October 2021 to remove the undifferentiated heavy lifting of managing the underlying host operating system (OS). This has enabled customers to run Windows containers without having to patch, scale, and harden the Windows OS, using the serverless, pay-as-you-go compute engine of AWS Fargate for running their Microsoft Windows applications.

Our customers often ask us: “What can I do to improve the launch times for my Windows tasks?” Windows task launch times on Fargate consist largely of the time for the Fargate infrastructure to be ready and the time needed to pull the container images. The Windows Server OS and Windows Server container images are much larger and have more complex dependencies than those for Linux, so Windows containers typically take longer to launch. Over the last couple of years, we have continued to offer features and improvements to reduce launch times for Windows containers on Fargate. With engineering improvements for Windows tasks on Fargate launched last week, we have reduced the infrastructure ready time by up to 42% for Windows Server 2022 Core.

In this post, we give you a rundown of the improvements we have made on the Fargate service, as well as share some suggestions that you can follow to optimize your Windows task launch times on Fargate. Before we jump into that, let’s first break down the Windows task launch workflow on Fargate.

Background

On Fargate, each Amazon ECS task is deployed onto a single-use, single-tenant compute instance, and the container images are downloaded from the container image registry for every task. Fargate Windows launch times can be divided broadly into the following segments:

  • Infrastructure ready time
  • Container image pull time
  • Task startup time, which is the time needed to bring the containers to the required state (running, or finished execution)
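
As a rough illustration, these segments can be approximated from the timestamps that Amazon ECS reports for a task (`createdAt`, `pullStartedAt`, `pullStoppedAt`, and `startedAt` in the `DescribeTasks` response). The mapping of timestamps to segments below is an assumption made for this sketch rather than an official definition, and the sample values are invented.

```python
from datetime import datetime, timedelta, timezone


def launch_breakdown(task: dict) -> dict:
    """Split a task's launch time into three approximate segments, in seconds."""
    created = task["createdAt"]
    pull_started = task["pullStartedAt"]
    pull_stopped = task["pullStoppedAt"]
    started = task["startedAt"]
    return {
        # Assumption: everything before the first image pull is infrastructure time.
        "infrastructure_ready": (pull_started - created).total_seconds(),
        "image_pull": (pull_stopped - pull_started).total_seconds(),
        "task_startup": (started - pull_stopped).total_seconds(),
        "total": (started - created).total_seconds(),
    }


# Hypothetical timestamps, shaped like the datetime values an AWS SDK returns.
t0 = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
sample_task = {
    "createdAt": t0,
    "pullStartedAt": t0 + timedelta(seconds=90),
    "pullStoppedAt": t0 + timedelta(seconds=150),
    "startedAt": t0 + timedelta(seconds=165),
}

breakdown = launch_breakdown(sample_task)
for segment, seconds in breakdown.items():
    print(f"{segment}: {seconds:.0f}s")
```

Tracking these three numbers separately makes it easier to tell whether a slow launch comes from the image pull, which you can influence, or from the infrastructure phase, which Fargate manages.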

Infrastructure ready time refers to the time required for Fargate to provision and bootstrap the underlying compute, and to complete the prerequisites needed to start pulling the container image. At a high level, this includes provisioning an Amazon Elastic Compute Cloud (Amazon EC2) instance for the task based on the Fargate-specific Windows Server Amazon Machine Image (AMI), bootstrapping the instance to set up the networking stack, setting up the Fargate agent, and starting the required Windows services. It also includes creating the task network namespace and completing prerequisite steps, such as pulling AWS Identity and Access Management (IAM) resources and/or registry credentials, so that the Fargate agent can start the initial container image pull in the next phase. Additionally, various other Fargate worker processes start at this point to perform actions such as creating and validating the resources specified in the container log configuration, and pulling the environment variables from AWS Secrets Manager or Parameter Store.

Container image pull time refers to the part of the workflow where containerd, the container runtime, pulls and extracts the container image(s). Container images are typically built using a layered file system, with each layer representing a specific component or change. These layers are stacked on top of each other, allowing for efficient storage and sharing of common components between images. The base layers would typically include the OS specific components, such as essential shared libraries and dependencies. During image pull, these layers are downloaded in parallel, extracted, and then re-layered to recreate the container image on the host.
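
The following is a minimal sketch of that pull pattern: fetch layers concurrently, then stack them in order so that later layers override earlier ones. It is a toy model with hypothetical layer names and file contents, and it does not reflect how containerd is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layers, ordered from the base OS layer to the application layer.
LAYERS = [
    ("base-os", {"system.dll": "base", "config.ini": "base"}),
    ("runtime", {"runtime.dll": "runtime"}),
    ("app", {"app.exe": "app", "config.ini": "app"}),
]


def fetch_layer(layer: tuple) -> dict:
    """Stand-in for downloading and extracting one layer (no network access)."""
    _name, files = layer
    return dict(files)


def pull_image(layers: list) -> dict:
    """Fetch layers concurrently, then stack them in their original order."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        # map() yields results in input order even if fetches finish out of order.
        fetched = list(pool.map(fetch_layer, layers))
    root_fs = {}
    for files in fetched:
        # A later layer overwrites files with the same name from earlier layers.
        root_fs.update(files)
    return root_fs


image_fs = pull_image(LAYERS)
print(sorted(image_fs))
print(image_fs["config.ini"])
```

Because the base layers are shared across images, any layer that is already present on the host can be skipped entirely, which is why base image caching matters so much for large Windows images.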

Task startup time refers to the last part of the startup workflow, where the Fargate agent collaborates with containerd to create and start the task containers with the required configurations. This also includes initiating the logger processes that capture the container logs and route them to the specified log destination. Users can additionally specify dependencies among the containers in a task, known as Container Dependency. The Fargate agent orchestrates the task containers based on these configurations. Therefore, the time taken during this part of the workflow varies with the task configuration.
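
To show why container dependencies affect startup time, the sketch below derives one valid start order from the `dependsOn` entries of a task definition. The container names are hypothetical, the dependency conditions (such as `START` or `SUCCESS`) are ignored for simplicity, and this is not how the Fargate agent itself is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical container definitions; only the fields relevant here are shown.
container_definitions = [
    {
        "name": "app",
        "dependsOn": [
            {"containerName": "init", "condition": "SUCCESS"},
            {"containerName": "log-router", "condition": "START"},
        ],
    },
    {"name": "init", "dependsOn": []},
    {"name": "log-router"},
]


def start_order(definitions: list) -> list:
    """Return one valid start order in which dependencies come first."""
    graph = {
        d["name"]: {dep["containerName"] for dep in d.get("dependsOn", [])}
        for d in definitions
    }
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


order = start_order(container_definitions)
print(order)
```

In this example `app` can only start after both `init` and `log-router`, so a longer dependency chain directly lengthens the task startup segment.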

Service enhancements for improving launch times on Fargate Windows

This section walks through the enhancements we’ve made in Fargate to improve launch times for Windows Server tasks:

  • Optimized Windows Server AMIs: AMIs are pre-configured virtual machine (VM) images that provide the OS, software, and customized settings for your computing environment in AWS cloud. Fargate instances are launched with lean, efficient, and secure Windows Server AMIs, built and optimized for the Fargate environment. We refactored and pre-baked a number of configurations and required Windows services into these AMIs, eliminating the need to perform extensive set up steps during the instance bootstrap and minimizing potential latency or delays in the deployment process.
  • Fargate Windows launches are powered by EC2 fast launch: The standard Windows OS launch process can involve multiple reboots and lengthy initialization steps, which can significantly increase the time required to provision a new instance. The EC2 fast launch capability uses pre-provisioned snapshots to complete some of those steps, such as Windows Sysprep, in advance so that instance launches are faster. We enable EC2 fast launch for Fargate Windows task launches. Because the resource-intensive Windows setup tasks are executed ahead of time, the instances launch more efficiently, reducing the overall time required for the Fargate environment to become operational. We make sure that sufficient fast launch snapshots are available at all times to meet demand. To learn more about how EC2 fast launch can help you speed up your other Windows launches, see the AWS documentation.
  • Recent changes to reduce latencies: Fargate Windows previously relied on a networking proxy that ran as a sidecar container within the task network namespace and tunneled network requests through the task Elastic Network Interface (ENI). The networking proxy container was built using Windows Server Core images, and it had a few latency issues. We have now switched to an alternate mechanism that runs Fargate worker processes from within the task network namespace, eliminating the need for a network proxy altogether. Fargate also now initiates instance bootstrap immediately after the Windows Server OS boot is completed, instead of waiting for EC2 Launch Agent to initiate the instance bootstrap through Amazon EC2 launch user data. These recent enhancements have reduced the Fargate Windows infrastructure ready time by up to 42% for Windows Server 2022 Core.

Suggested mechanisms for improving launch times on Fargate Windows

Here are a few tips on what you can do to further optimize the launch performance for your Windows Server tasks running on Fargate:

  • Keep your application image up to date: Microsoft releases new base container images on the second Tuesday of each month, which include the latest security patches along with various Windows OS patches. The base container images for Windows are large, usually many gigabytes, so pulling and extracting the base layers contributes significantly to container image pull time. Fargate Windows caches the latest and previous month’s Windows Server Core base images provided by Microsoft so that those base layers are not pulled from the registry every time a task starts. To use the Fargate Windows cache and cut down on base image layer pull and extraction time, make sure that your containers are built on top of a recent Windows Server base image. This also helps keep your container images free from known security vulnerabilities and OS bugs. For more information about updating your container images through EC2 Image Builder, see the AWS documentation.
  • Use Windows Server Core images: We recommend using Windows Server 2022 Core for the fastest launch times on Windows. The Windows Server Core image has a significantly smaller footprint than the Windows Server Full image, and it includes the essential components to run most applications (see the Microsoft documentation). This smaller size leads to faster image pulls, reduced storage requirements, and faster container startup times. Therefore, unless your application needs a GUI or relies heavily on Windows desktop features, we recommend using the Server Core image.
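
As a sketch of the second tip, the task definition fragment below selects Windows Server 2022 Core through the `runtimePlatform` setting. The family name, task size, account ID, and image URI are placeholders, and the `uses_server_core` helper is a hypothetical check written for this example. The container image itself is assumed to be built on a recent Server Core base such as `mcr.microsoft.com/windows/servercore:ltsc2022`.

```python
# Hypothetical task definition fragment; names, sizes, and the image URI are placeholders.
task_definition = {
    "family": "windows-sample",
    "requiresCompatibilities": ["FARGATE"],
    "networkMode": "awsvpc",
    "cpu": "1024",
    "memory": "2048",
    "runtimePlatform": {
        "operatingSystemFamily": "WINDOWS_SERVER_2022_CORE",
        "cpuArchitecture": "X86_64",
    },
    "containerDefinitions": [
        {
            "name": "app",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/sample-app:latest",
            "essential": True,
        }
    ],
}


def uses_server_core(task_def: dict) -> bool:
    """Return True if the task targets a Windows Server Core OS family."""
    os_family = task_def.get("runtimePlatform", {}).get("operatingSystemFamily", "")
    return os_family.startswith("WINDOWS_SERVER") and os_family.endswith("_CORE")


is_core = uses_server_core(task_definition)
print(is_core)
```

A check like this could run in a deployment pipeline to flag task definitions that still target a Full image without needing one.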

Conclusion

In this post, we’ve shown the significant improvements we’ve made to Fargate launch times for Windows tasks. Note that the launch times for Windows tasks on Fargate should not be directly compared to those of Linux containers due to the larger footprint of Windows images. The key is to optimize the Windows container launch process, rather than expecting parity with Linux.

By using the fast launch capability of Amazon EC2, caching recent Windows Server base images, and optimizing bootstrap steps, we’ve been able to substantially reduce the overall time it takes to get a Windows container up and running. By combining these improvements with the optimization strategies suggested previously, you can make sure that your Windows containers launch as quickly and efficiently as possible, enabling you to deliver your applications to your users faster.