AWS DevOps & Developer Productivity Blog
Building a serverless Jenkins environment on AWS Fargate
Jenkins is a popular open-source automation server that enables developers around the world to reliably build, test, and deploy their software. Jenkins uses a controller-agent architecture in which the controller serves the web UI, stores the configurations and related data on disk, and delegates jobs to worker agents, whose primary responsibility is to run those jobs.
Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestration service, and has been a popular choice to deploy a highly available, fault tolerant, and scalable Jenkins environment. Traditionally, the Amazon Elastic Compute Cloud (Amazon EC2) launch type was the preferred choice, with Amazon Elastic Block Store (Amazon EBS) as the storage for the configurations and data. That changed with the introduction of AWS Fargate support for Amazon EFS.
The objective of this post is to walk you through how to set up a completely serverless Jenkins environment on AWS Fargate using Terraform.
Overview of the solution
The following diagram illustrates the solution architecture.
The objective is to create a fully automated deployment of highly available, production-ready Jenkins in a serverless environment on AWS. We use the following components and services:
- The Jenkins controller URL is backed by an Application Load Balancer. We recommend using AWS Certificate Manager (ACM) to provision an SSL certificate to associate with the load balancer. The SSL termination happens at the load balancer.
- We use a VPC with at least two public and two private subnets.
- The Jenkins controller runs as a service in Amazon ECS using Fargate as the launch type. We use Amazon Elastic File System (Amazon EFS) as the persistent backing store for the Jenkins controller task. The Jenkins controller and Amazon EFS are launched in private subnets.
- Jenkins uses the Amazon ECS Fargate plugin to delegate to Amazon ECS to run the builds on Docker-based agents. Each Jenkins build is run on a dedicated Docker container that is wiped out at the end of the build. Jenkins agents discover the Jenkins controller task using AWS Cloud Map for service discovery.
- You can use the aforementioned plugin to create multiple definitions of the Amazon EC2 Container Service Cloud configuration. This allows you to use an ECS cluster with a Fargate Spot capacity provider to run Jenkins jobs that can tolerate interruptions. This provides cost optimizations to run, for example, large test suites and build jobs.
- The Jenkins controller can assume AWS Identity and Access Management (IAM) roles based on the ARN provided with the Fargate task definition. You can define multiple ECS agent templates within the ECS plugin with different permission scopes, in line with security best practices.
- AWS Backup ensures that you can restore all important data from the Jenkins controller in case of an accident.
- You can additionally configure user notifications using Amazon Simple Notification Service (Amazon SNS) to alert users of any failures in the build jobs or in the environment.
Prerequisites
For this post, we developed a Terraform module to perform the deployment. The module and deployment example scripts are available in the GitHub repo. Refer to the README for a list of all module variables. The following are required to deploy this Terraform module:
- Terraform 0.13+.
- A VPC with at least two public and two private subnets.
- An SSL certificate to associate with the Application Load Balancer. It’s recommended to use an ACM certificate. This is not done by the main Terraform module. However, the example in the example directory uses the public ACM module to create the ACM certificate and pass it to the serverless Jenkins module. You can do it this way or explicitly pass the ARN of a certificate that you previously created or imported into ACM.
- An admin password for Jenkins must be stored in AWS Systems Manager Parameter Store. This parameter must be of type SecureString and have the name jenkins-pwd.
- Terraform must be bootstrapped. This means that a state Amazon Simple Storage Service (Amazon S3) bucket and a state locking Amazon DynamoDB table must be initialized.
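The Parameter Store prerequisite can be met from the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and configured for the target account and Region. The function name store_jenkins_password is our own helper for illustration and is not part of the module.

```shell
# Store the Jenkins admin password as a SecureString parameter named jenkins-pwd.
# Assumption: the AWS CLI is configured for the account and Region you deploy to.
# store_jenkins_password is a hypothetical helper, not something the repo provides.
store_jenkins_password() {
  password="$1"
  if [ -z "$password" ]; then
    echo "usage: store_jenkins_password <password>" >&2
    return 1
  fi
  aws ssm put-parameter \
    --name "jenkins-pwd" \
    --type "SecureString" \
    --value "$password"
}
```

After the function is defined, calling store_jenkins_password with the password as its only argument creates the parameter. Reading the value from a file or a prompt rather than typing it inline keeps it out of your shell history.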
Deployment
All the required resources and configurations are packaged as a Terraform module, which means it’s not directly deployable. However, an example deployment is in the example directory. To deploy the example, complete the following steps:
- Clone the GitHub repo.
- Change your working directory to the bootstrap directory.
Included in this directory is sample Terraform code to bootstrap the initial Terraform state management resources. A best practice is to use a state backend such as Amazon S3 and a locking mechanism such as DynamoDB when using Terraform. For more information, see State Storage and Locking. Because this bootstrap code creates Terraform state management resources, special care must be taken to save the resultant Terraform state file. Be aware that this state is only saved to a local file named terraform.tfstate. Make sure to save this state file if you want to maintain the S3 state bucket and DynamoDB lock table using Terraform.
- Replace my-state-bucket and my-lock-table with your preferred names:
terraform init
terraform apply \
-var="state_bucket_name=my-state-bucket" \
-var="state_lock_table_name=my-lock-table"
- To deploy the module, change your working directory to example.
- Copy vars.sh.example to vars.sh.
- Edit the variables in vars.sh as necessary, providing the details specific to your environment (VPC, subnets, Route 53 zone ID and domain name, state bucket, state locking table, and the Region in which you intend to deploy).
- Run deploy_example.sh.
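Before running deploy_example.sh, it can help to confirm that the state bucket and lock table created by the bootstrap step are reachable with your current credentials. The following is a minimal sketch of such a preflight check, assuming the AWS CLI is configured. The function check_state_backend is a hypothetical helper and is not part of the repo.

```shell
# Preflight check (hypothetical helper, not part of the repo): confirm that the
# S3 state bucket and the DynamoDB lock table from the bootstrap step exist.
check_state_backend() {
  bucket="$1"
  table="$2"
  # head-bucket succeeds only if the bucket exists and is accessible.
  if ! aws s3api head-bucket --bucket "$bucket" >/dev/null 2>&1; then
    echo "state bucket not found or not accessible: $bucket" >&2
    return 1
  fi
  # describe-table fails if the table does not exist in the current Region.
  if ! aws dynamodb describe-table --table-name "$table" >/dev/null 2>&1; then
    echo "lock table not found: $table" >&2
    return 1
  fi
  echo "state backend ready: $bucket / $table"
}
```

With the names used in the bootstrap step, the call would be check_state_backend my-state-bucket my-lock-table. A non-zero return code indicates that one of the two resources is missing or that the credentials or Region differ from the ones used for bootstrapping.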
After you deploy all the resources, you can open a browser of your choice and enter the Route 53 alias record name. This opens the Jenkins web UI. Sign in with the user ecsuser and the password stored in Parameter Store. All the plugins listed in plugins.txt are also installed, and the Fargate container service cloud configurations are applied. Two clouds are created: one for Fargate and the other for the Fargate Spot cluster. The following screenshot shows the Fargate cloud.
You’re presented with two preconfigured jobs.
Before we dive into those jobs, let’s take a look at capacity providers.
Amazon ECS on Fargate capacity providers enable you to use both Fargate and Fargate Spot capacity with your Amazon ECS tasks. For more information about capacity providers, see Amazon ECS capacity providers. With Fargate Spot, you can run interruption-tolerant Amazon ECS tasks at a discounted rate compared to the Fargate price. Fargate Spot runs tasks on spare compute capacity. When AWS needs the capacity back, your tasks are interrupted with a 2-minute warning.
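When a Fargate Spot task is interrupted, the containers in the task receive a SIGTERM signal before they are stopped, and a job running on Spot capacity can use that window to save partial work. The following is a minimal sketch of an interruption-aware job script. It assumes the script itself is the process that receives the signal (inside a Jenkins sh step the signal may be delivered to a parent process instead). The step loop and messages are placeholders.

```shell
# Minimal sketch of an interruption-aware job script.
# Assumption: this script is the process that receives SIGTERM in the container.
interrupted=0

on_term() {
  # Record the interruption; a real job would flush caches or upload partial results here.
  interrupted=1
  echo "SIGTERM received: saving partial results before the task stops"
}
trap on_term TERM

run_steps() {
  # Hypothetical work loop: each iteration stands in for one unit of a longer job.
  # The shell runs the trap handler between commands, so the flag is checked per step.
  for step in 1 2 3; do
    if [ "$interrupted" -eq 1 ]; then
      echo "stopping early at step $step"
      return 143
    fi
    echo "running step $step"
  done
}
```

Calling run_steps runs the placeholder steps in order. Splitting a long job into short steps matters here, because the shell only runs the trap handler after the current command finishes.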
As part of the module that we just deployed, two ECS Fargate clusters are provisioned: one with a FARGATE capacity provider and the other with a FARGATE_SPOT capacity provider. The Jenkins controller service is configured to run on the cluster with the FARGATE capacity provider.
The two ECS container service cloud configurations created earlier determine which agent the jobs run on. For example, you can configure all the workloads that can tolerate interruptions (such as build jobs and test suites) to run on fargate-cloud-spot, which is backed by an ECS Fargate cluster with the FARGATE_SPOT capacity provider, and all the other workloads (such as Terraform apply runs) to run on fargate-cloud, which is backed by an ECS Fargate cluster with the FARGATE capacity provider. The Fargate Spot capacity provider helps optimize costs by using the spare compute capacity in the AWS Cloud. We call the jobs scheduled to run on the Spot capacity provider non-critical tasks and those running on the Fargate capacity provider critical tasks due to the nature of those jobs. The Jenkins declarative pipeline for the critical task job looks like the following code:
pipeline {
    agent {
        ecs {
            inheritFrom 'build-example'
        }
    }
    stages {
        stage('Test') {
            steps {
                script {
                    sh "echo this was executed on non spot instance"
                }
                sh 'sleep 120'
                sh 'echo sleep is done'
            }
        }
    }
}
Test the solution
To test the solution, let’s first run the Simple Job Critical Task by choosing Build Now. This creates a task in the cluster with the FARGATE capacity provider.
When the task is in the Running state, it registers itself as an agent and the Jenkins job runs on this agent.
You can see the console output in Jenkins.
When the job is successful, the task is stopped.
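You can also follow the agent task lifecycle from the command line instead of the console. The following is a minimal sketch using the AWS CLI, assuming it is configured for the Region you deployed to. The function name list_running_agent_tasks is our own, and the cluster name you pass in depends on your deployment, because this post does not state the names the module assigns.

```shell
# List the ARNs of tasks currently running in an ECS cluster.
# list_running_agent_tasks is a hypothetical helper; the cluster name is supplied
# by the caller because the names created by the module are not given in this post.
list_running_agent_tasks() {
  cluster="$1"
  aws ecs list-tasks \
    --cluster "$cluster" \
    --desired-status RUNNING \
    --query 'taskArns' \
    --output text
}
```

Run the function against the cluster with the FARGATE capacity provider while the job is building and you should see the agent task listed. Run it again after the job finishes and the agent task should no longer appear.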
Cleanup
To clean up all the resources that were created to build the serverless Jenkins environment, navigate to the example directory and run the following command:
terraform destroy -auto-approve
To clean up the DynamoDB lock table and the S3 state bucket, navigate to the example/bootstrap directory and run the following command:
terraform destroy -auto-approve
Limitations
Because Amazon EFS is network-attached storage, in its default configuration it may not be performant enough for some Jenkins use cases. Proper research and testing are advised before using Amazon EFS for production workloads. For more information, see Amazon EFS performance.
If necessary, set Amazon EFS to maximum I/O mode. You can do this in the Terraform module by setting the variable efs_performance_mode to maxIO. For more information, see What are the differences between General Purpose and Max I/O performance modes in Amazon EFS?
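One way to pass this setting is on the Terraform command line, in the same style as the bootstrap step. The following is a minimal sketch, assuming the root configuration you apply exposes efs_performance_mode as an input variable (the example deployment may set it differently, for instance through vars.sh). The function apply_with_efs_mode is a hypothetical wrapper. Because the performance mode of an EFS file system cannot be changed after creation, it is best decided before the first deployment.

```shell
# Apply the Terraform configuration with an explicit EFS performance mode.
# Assumption: the configuration being applied declares efs_performance_mode.
# apply_with_efs_mode is a hypothetical wrapper, not part of the repo.
apply_with_efs_mode() {
  mode="$1"
  # generalPurpose and maxIO are the two performance modes Amazon EFS offers.
  case "$mode" in
    generalPurpose|maxIO) ;;
    *)
      echo "unsupported EFS performance mode: $mode" >&2
      return 1
      ;;
  esac
  terraform apply -var="efs_performance_mode=$mode"
}
```

Calling apply_with_efs_mode maxIO runs terraform apply with the variable set. Any other value is rejected before Terraform is invoked.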
Conclusion
In this post, we went through how to deploy a highly available, scalable, fault tolerant, production-ready Jenkins environment in a completely serverless environment on Fargate. We also showed how to use the Terraform module to automate the deployment of all the necessary resources and associated configurations. The post also introduced how to use FARGATE and FARGATE_SPOT capacity providers.
When we deploy each job in its own Docker container, we can control the CPU and memory requirements for each job, the base container that defines the environment, and the tooling necessary for the job. More importantly, we avoid loading the controller with job runs.