Update: Spring Boot has been updated to version 3, which also means that Amazon Corretto 17 is used as the JDK for all versions.
Fast startup times are key to reacting quickly to disruptions and demand peaks, and they can increase resource efficiency. With AWS Fargate, you don’t need to take care of the underlying container hosts; however, some changes are often needed to shorten the time it takes to bootstrap your container and application.
This post describes optimization techniques to be applied to Java applications that run on Fargate. We specifically look at Java Spring Boot applications in this post, but these optimizations can also be applied to other types of containerized Java applications.
You can find the demonstration application code on GitHub that shows you the different implementations.
Solution overview
Our example application is a simple REST-based Create, Read, Update, Delete (CRUD) service that implements basic customer management functionality. All data is persisted in an Amazon DynamoDB table accessed using the AWS SDK for Java V2.
The REST functionality is located in the class CustomerController, which uses the Spring Boot RestController annotation. This class invokes the CustomerService, which uses the Spring Data repository implementation, CustomerRepository. This repository implements the functionality to access an Amazon DynamoDB table with the AWS SDK for Java V2. All user-related information is stored in a Plain Old Java Object (POJO) called Customer.
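To make the data model concrete, the Customer POJO might look like the following minimal sketch. The field names (id, name, email) are assumptions for illustration; the actual class in the demonstration repository may differ (for example, it may use Lombok annotations instead of hand-written accessors):

```java
// Minimal sketch of the Customer POJO; the field names are assumptions,
// not taken from the demonstration repository.
public class Customer {
    private String id;
    private String name;
    private String email;

    public Customer() {}

    public Customer(String id, String name, String email) {
        this.id = id;
        this.name = name;
        this.email = email;
    }

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }

    public static void main(String[] args) {
        Customer c = new Customer("1", "Jane Doe", "jane@example.com");
        System.out.println(c.getName());
    }
}
```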
The following architecture diagram presents an overview of the solution.
Figure 1. Architecture diagram of the solution
For our tests, we created seven different versions of our application:
- Version 1, not optimized, running on x86_64
- Version 2, not optimized, running on ARM64
- Version 3, custom Java runtime environment (JRE) and additional optimizations running on x86_64
- Version 4, custom JRE and additional optimizations running on ARM64
- Version 5, Spring Native (GraalVM AOT compilation) running on x86_64 with Ubuntu 22 parent image
- Version 6, Spring Native (GraalVM AOT compilation) running on ARM64 with Ubuntu 22 parent image
- Version 7, Spring Native (GraalVM AOT compilation) running on x86_64 with distroless parent image
Prerequisites
You will need the following to complete the steps in this post:
Walkthrough
Multi-arch container images
Multi-arch (or multi-architecture) refers to container images for different processor architectures built from the same code. There are multiple ways to create multi-arch images. In this post, we use QEMU emulation to create them quickly. If you plan to use multi-arch images for more than testing purposes, then consider building your images with a proper CI/CD pipeline. The first step is to install the Docker Buildx CLI plugin. This step isn’t necessary if you’ve installed Docker Desktop, which includes buildx and the emulators out of the box.
export DOCKER_BUILDKIT=1
docker build --platform=local -o . https://github.com/docker/buildx.git
mkdir -p ~/.docker/cli-plugins
mv buildx ~/.docker/cli-plugins/docker-buildx
chmod a+x ~/.docker/cli-plugins/docker-buildx
We install the emulators to build and run containers for ARM64 on an Amazon EC2 or AWS Cloud9 instance:
docker run --privileged --rm tonistiigi/binfmt --install all
In the next step, we start with a new builder:
docker buildx create --name SpringBootBuild --use
docker buildx inspect --bootstrap
Now we build a multi-arch image with the buildx parameter. In the following command, you can see that we specify two different architectures: amd64 and arm64. A multi-arch manifest is generated and pushed to an Amazon ECR registry. Note that for our measurements, the versions are strictly separated, so each version’s directory is used to build and push the images for the corresponding architectures.
docker buildx build --platform linux/amd64,linux/arm64 --tag <account-id>.dkr.ecr.<region>.amazonaws.com/<your-repo>:latest --push .
In the task definition for Amazon Elastic Container Service (Amazon ECS) with Fargate, we specify the parameter cpuArchitecture (valid values are X86_64 and ARM64) to run the task on the desired CPU architecture. We go into this in more detail in a later section.
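In task-definition JSON, the architecture is set via the runtimePlatform object. The following is a hedged fragment; the family name is a placeholder, and the cpu/memory values match the 1 vCPU and 2 GB RAM used later in this post:

```json
{
  "family": "customer-service",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "runtimePlatform": {
    "cpuArchitecture": "ARM64",
    "operatingSystemFamily": "LINUX"
  }
}
```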
Setting up the infrastructure
In the previous steps, we compiled the application to a native image and built a container image that has been stored in an Amazon ECR repository. Now, we set up the basic infrastructure consisting of an Amazon Virtual Private Cloud (Amazon VPC), an Amazon ECS cluster with Fargate launch type, a DynamoDB table, and an Application Load Balancer (ALB).
Codifying your infrastructure allows you to treat it just like application code. In this post, we use the AWS CDK, an open-source software development framework, to model and provision cloud application resources using familiar programming languages. The code for the AWS CDK application can be found in the demonstration application’s repository under cdkapp/lib/cdkapp-stack.ts.
The following commands set up the infrastructure in the AWS Region eu-west-1 for the first version of our application:
$ npm install -g aws-cdk # Install the CDK if this hasn’t been installed already
$ cd cdkapp
$ npm install # retrieves dependencies for the CDK stack
$ npm run build # compiles the TypeScript files to JavaScript
$ cdk bootstrap
$ cdk deploy CdkappStack --parameters containerImage=<your_repo/your_image:tag> --context cpuType=X86_64
As shown in the last AWS CDK command, it is possible to define the CPU architecture for the Amazon ECS task definition, with the possible values X86_64 and ARM64.
The output of the AWS CloudFormation stack is the ALB’s Domain Name System (DNS) record. The heart of our infrastructure is an Amazon ECS cluster with the AWS Fargate launch type, which this AWS CDK application sets up. Depending on the context (X86_64 or ARM64), a task definition for the Amazon ECS task is created with the right CPU architecture, 1 vCPU, and 2 GB RAM. In addition, we create an AWS Fargate service, which is exposed with an ALB. This service also offers a health check that is implemented using Spring Boot Actuator.
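The Actuator-based health check needs little configuration beyond adding the spring-boot-starter-actuator dependency. A sketch of the relevant application.properties entries, assuming the ALB target group's health check path points at /actuator/health:

```properties
# Expose only the health endpoint over HTTP (served at /actuator/health).
management.endpoints.web.exposure.include=health
```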
Performance considerations
Let’s investigate the impact of using the different optimizations in comparison to the regular build of our sample Java application.
Our references for the performance measurement are the first and second versions of the application. Both versions implement the same logic with the same dependencies; only the CPU architecture differs. The dependencies include the full AWS SDK for Java, the DynamoDB enhanced client, and Lombok. Project Lombok is a code generation library for Java that minimizes boilerplate code. The DynamoDB enhanced client is a high-level library that is part of the AWS SDK for Java version 2 and offers a straightforward way to map client-side classes to DynamoDB tables. It allows you to intuitively perform create, read, update, and delete (CRUD) operations on tables and items in DynamoDB. More information about the DynamoDB enhanced client, including examples, can be found in the AWS SDK for Java documentation.
In addition, we use Tomcat as the web container and Java 11. In our Dockerfile, we use Ubuntu 22.04 as the parent image and install a full Amazon Corretto 11 Java Development Kit (JDK). These conditions result in a container image of considerable size (in our case, about 900 MB), which has a negative effect on the pull time of the image from the registry as well as on the startup time of the application.
In the second iteration of the application (version three and four), we apply several optimizations to the application. We reduce the number of dependencies by just using the required AWS SDK dependencies. In addition, we replaced Tomcat with Undertow, which is a more lightweight and performant web container. For access to Amazon DynamoDB, we remove the DynamoDB enhanced client and just use the standard client.
For this version, we use Amazon Corretto 17 and build our own runtime using jdeps and jlink as part of the multi-stage build of the container image:
RUN jdeps --ignore-missing-deps \
--multi-release 17 --print-module-deps \
--class-path target/BOOT-INF/lib/* \
target/CustomerService-0.0.1.jar > jre-deps.info
RUN export JAVA_TOOL_OPTIONS="-Djdk.lang.Process.launchMechanism=vfork" && \
jlink --verbose --compress 2 --strip-java-debug-attributes \
--no-header-files --no-man-pages --output custom-jre \
--add-modules $(cat jre-deps.info)
With jdeps we generate a list of the JDK modules necessary to run the application and write this list to jre-deps.info. This file is used as input for jlink, a tool that creates a custom JRE from a list of modules. In our Dockerfile, we use Ubuntu 22.04 as the parent image and copy our custom JRE into the target container image. By limiting the number of dependencies and building a custom JRE, we reduce the size of our target image significantly (to about 200 MB). We start our application with the parameters -XX:TieredStopAtLevel=1 and -noverify. Tiered compilation stopping at level 1 reduces the time the JVM spends profiling and optimizing your code, which improves startup time; however, it has a negative impact if the application is called many times, because the code isn’t fully optimized. The -noverify flag disables bytecode verification, which has a security implication: the class loader won’t check the behavior of the bytecode.
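Putting these pieces together, the final stage of our Dockerfile might look like the following sketch. The build-stage name and file paths are assumptions; only the JAR name, parent image, and JVM flags come from this post:

```dockerfile
# Sketch of the final stage; the stage name "build" and the paths are assumptions.
FROM ubuntu:22.04
# Copy the jlink-generated runtime and the application JAR from the build stage.
COPY --from=build /app/custom-jre /opt/jre
COPY --from=build /app/target/CustomerService-0.0.1.jar /opt/app.jar
# Start with reduced tiered compilation and without bytecode verification.
ENTRYPOINT ["/opt/jre/bin/java", "-XX:TieredStopAtLevel=1", "-noverify", "-jar", "/opt/app.jar"]
```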
In the third iteration (150 MB–200 MB image size) of our application (versions five, six, and seven), we introduce GraalVM with Spring Native. With this change, you can compile Spring applications to native executables using the GraalVM native-image compiler. GraalVM is a high-performance distribution of the JDK that transforms bytecode into machine code. This is done using static analysis of the code, which means that all the information must be available at compile time; consequently, you can’t generate code at runtime. For x86_64 and ARM64, we chose Ubuntu 22.04 as the parent image because we want comparable results. To minimize the resulting container image, we create one additional configuration with quay.io/quarkus/quarkus-distroless-image as the parent image for x86_64 (this image isn’t available for ARM64 at the moment).
Measurement and results
To evaluate the optimizations, we measure the task readiness duration of the AWS Fargate task, shown below. This duration can be calculated from the timestamp of the runTask API call in AWS CloudTrail and the timestamp of the ApplicationReadyEvent in our Spring Boot application.
To measure the startup time, we use a combination of data from the task metadata endpoint and API calls to the control plane of Amazon ECS. Among other things, this endpoint returns the task ARN and the cluster name.
We need this data to send describeTasks calls to the Amazon ECS control plane in order to receive the following metrics:
- PullStartedAt: The timestamp for when the first container image pull began.
- PullStoppedAt: The timestamp for when the last container image pull finished.
- CreatedAt: The timestamp for when the container was created. This parameter is omitted if the container has not been created yet.
- StartedAt: The timestamp for when the container started. This parameter is omitted if the container has not started yet.
The logic for pulling the necessary metrics is implemented in the EcsMetaDataService class.
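The durations themselves are simple timestamp differences. The following minimal sketch shows the calculation; the method names and example timestamps are ours, not taken from EcsMetaDataService:

```java
import java.time.Duration;
import java.time.Instant;

public class StartupMetrics {

    // Image pull time: PullStoppedAt - PullStartedAt.
    static Duration pullDuration(Instant pullStartedAt, Instant pullStoppedAt) {
        return Duration.between(pullStartedAt, pullStoppedAt);
    }

    // Task readiness: runTask API call until the ApplicationReadyEvent.
    static Duration taskReadiness(Instant runTaskCalledAt, Instant applicationReadyAt) {
        return Duration.between(runTaskCalledAt, applicationReadyAt);
    }

    public static void main(String[] args) {
        // Made-up example timestamps for illustration only.
        Instant runTask = Instant.parse("2022-01-01T12:00:00Z");
        Instant pullStart = Instant.parse("2022-01-01T12:00:10Z");
        Instant pullStop = Instant.parse("2022-01-01T12:00:25Z");
        Instant appReady = Instant.parse("2022-01-01T12:00:40Z");

        System.out.println("pull: " + pullDuration(pullStart, pullStop).toSeconds() + "s");
        System.out.println("readiness: " + taskReadiness(runTask, appReady).toSeconds() + "s");
    }
}
```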
The different states of our Fargate tasks are shown in the following diagram.
Figure 2: Different states of our Fargate tasks
And what effect did the changes have on our application? The following list is ordered by effectiveness and ease of implementation:
- Reduce the image size: The container image size has the biggest impact on the task readiness time. The smaller the image is, the faster it gets pulled from the Amazon ECR repository and the faster the application starts. Our image for the unoptimized version of our Spring Boot application is over 900 MB (iteration 1), the optimized version with a custom JRE and minimized dependencies is about 200 MB (iteration 2), and the third iteration with Spring Native is 200 MB with the Ubuntu image and 150 MB with the distroless image (iteration 3). The effect on the pull time is surprisingly high: from iteration 1 to iteration 2, about 75 percent less time was used; from iteration 2 to iteration 3, the improvement is smaller, at 38 percent (distroless-based image) and 12 percent (Ubuntu-based image); and from iteration 1 to iteration 3, it is 85 percent (distroless-based image) and 80 percent (Ubuntu-based image).
- Use a custom JRE: For the raw start time of the Java application (ApplicationReadyEvent and JVM startup time), we see a huge performance impact across the versions: from iteration 1 to iteration 2, about 78 percent less time was used.
- Use Spring Native: Using GraalVM and native image has a tremendous impact on startup time. From iteration 2 to iteration 3, the startup time improved by 96 percent, which means we achieve a 99 percent improvement from iteration 1 to iteration 3.
When we take a closer look at the complete starting time, beginning with the runTask API call and ending with the ApplicationReadyEvent, we see a performance gain of 58 percent from iteration 1 to iteration 2, and a further 28 percent from iteration 2 to iteration 3. We observed an overall improvement of almost 70 percent from iteration 1 to iteration 3.
The startup duration results of our Spring Boot application are shown in the following diagram.
Figure 3. Startup duration results of our Spring Boot application
Tradeoffs
Some legacy libraries and dependencies don’t support the Java 9 module system, which means it isn’t possible to build a custom JRE with jdeps and jlink. In such situations, it’s necessary to migrate to a library that supports the module system, which requires additional development effort.
GraalVM assumes that all code is known at the build time of the image, which means that no new code is loaded at runtime. Consequently, not all applications can be optimized using GraalVM. For more information, read about the limitations in the GraalVM documentation. If the native image build for your application fails, then a fallback image is created that requires a full JVM to run. In addition, native-image compilation with GraalVM takes considerably longer than a regular build, which impacts developer productivity.
Cleaning up
After you are finished, you can destroy these resources with a single command (cdk destroy) to save costs.
Conclusion
In this post, we demonstrated the impact of different optimization steps on the startup time of a Spring Boot application running on Amazon ECS with Fargate. We started with a typical implementation with several unnecessary dependencies, a full JDK, and a huge parent image. We reduced the dependencies and built our own JRE using jdeps and jlink. We then adopted Spring Native and GraalVM to reduce the startup time further and switched to a distroless parent image. For many developers, the variant with the custom JRE and the minimized dependencies is the best solution in terms of the complexity of the changes and the performance gains. This is especially true when the ARM instruction set with AWS Graviton2 is used. Graviton2 processors are custom built by AWS using 64-bit Arm Neoverse cores and are available for Fargate. Fargate powered by Graviton2 processors delivers up to 40 percent better price performance at 20 percent lower cost over comparable Intel x86-based Fargate for containerized applications.
We hope we’ve given you some ideas on how you can optimize your existing Java application to reduce startup time and memory consumption. Feel free to submit enhancements to the sample application in the source repository.