A self-service patching solution for multi-account organizations

Patch management is a critical operation that every organization wants to prioritize. It becomes tedious and challenging when an enterprise operates on a platform-consumer or hub-and-spoke model, for example a multi-account environment with hundreds of accounts and thousands of users running applications hosted in AWS.

Different application owners have different requirements for patching operations, the timing and frequency of patching, and the flexibility to test patches in lower environments such as Dev or UAT. It is difficult for a central platform team to accommodate all of these requirements while maintaining the right compliance posture across the entire organization.

Solution Overview

This post shows how application owners can create their own patch maintenance windows by using the organization's AWS Service Catalog portfolio, which a central platform team controls. The central platform team can enforce settings through the Service Catalog product. Consumers can choose between launching their own custom maintenance window or using the enforced default maintenance window. During the chosen window, AWS Systems Manager orchestrates patching across multiple Regions according to the parameters the consumer set, covering both standalone instances and instances in Auto Scaling groups. The solution also gives the central platform team a dashboard, built with Amazon Athena and Amazon QuickSight, for observing patch compliance status. In an emergency or a zero-day vulnerability situation, the platform team can use the automation to intervene and force patches onto Amazon Elastic Compute Cloud (Amazon EC2) workloads across member accounts.

Note that this post doesn't cover patch management of Amazon Elastic Kubernetes Service (Amazon EKS) and Amazon Elastic Container Service (Amazon ECS) managed nodes, because patching them requires planning for other related upgrades, which adds significant automation complexity.

Solution Architecture

The following is an architecture diagram of the solution described in this post:

Figure 1. Architecture Diagram

The solution architecture can be divided into four different components:

Normal patching
Emergency patching
Tag monitoring
Compliance reporting

Each of these components is explained in detail later in this post.

This post also refers to a central account: the account that hosts this solution and applies patches to EC2 instances in the organization's member accounts. If you follow our recommended organization structure, this can be your organization management account or an account in your Infrastructure OU that hosts shared services.

Technical Steps

Prerequisites:

An organization set up using AWS Organizations.
AWS Config enabled in the member accounts. Instructions for enabling Config in a multi-account organization can be found here.
A delegated administrator account for AWS CloudFormation (if you are deploying this solution in an account other than the management account).

Note that if AWS Control Tower is enabled in your organization, AWS Config is enabled automatically. For more information, refer to the AWS Control Tower documentation.

Deployment Steps:

1. Clone the GitHub repository.
2. Upload the zipped versions of the .py files from the Lambdas folder in the repository, patching_window.yml, and the crhelper.zip file to an existing or new Amazon Simple Storage Service (Amazon S3) bucket. Make sure that you update the bucket policy according to the policy JSON provided in the code repository.
3. Deploy the CloudFormation template in the central account that you have designated for the patching solution, following the instructions below. This can be a delegated administrator account for CloudFormation.

a. Navigate to CloudFormation in the AWS Console.
b. Select Create stack and select “With new resources (standard)”.
c. Under the Specify template section, select Upload a template file.
d. Select Choose file and select patching-stack.yml.
e. Provide a stack name (for example: “self-service-patching-stack”).
f. Specify the bucket that contains the source code from step 2.
g. Specify the Organization ID. To find it, navigate to AWS Organizations in the AWS Console; the Organization ID is shown on the organization dashboard.


Figure 2. Launching the CloudFormation Stack

h. Select Next.
i. Select Next again, and select the check box “I acknowledge that AWS CloudFormation might create IAM resources with custom names”.
j. Select “Create Stack”.
k. Once the stack is in the CREATE_COMPLETE state, select the stack and open the Outputs tab. Note all of the keys and values, because you will need them when deploying the StackSet.
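If you prefer to script step 3 instead of using the console, the following is a minimal sketch that builds the keyword arguments for boto3's CloudFormation `create_stack` call and flattens the stack outputs into a dictionary you can reuse in step 4. It only builds payloads and does not call AWS. The parameter keys `ArtifactBucket` and `OrganizationId` are hypothetical placeholders; use the parameter names actually declared in patching-stack.yml.

```python
from typing import Any


def build_create_stack_request(stack_name: str, template_body: str,
                               artifact_bucket: str, org_id: str) -> dict[str, Any]:
    """Keyword arguments for boto3 cloudformation.create_stack(**request)."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # Hypothetical parameter keys: use the names declared in patching-stack.yml.
        "Parameters": [
            {"ParameterKey": "ArtifactBucket", "ParameterValue": artifact_bucket},
            {"ParameterKey": "OrganizationId", "ParameterValue": org_id},
        ],
        # Same effect as ticking "I acknowledge that AWS CloudFormation might
        # create IAM resources with custom names" in the console.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def outputs_as_dict(describe_stacks_response: dict[str, Any]) -> dict[str, str]:
    """Flatten the Outputs of the first stack in a describe_stacks response."""
    stack = describe_stacks_response["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}
```

With boto3, you would pass the request to `create_stack`, wait with the `stack_create_complete` waiter, and then hand the `describe_stacks(StackName=...)` response to `outputs_as_dict`.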

4. Deploy patching-stackset.yml to the member accounts as a StackSet.

a. Navigate to CloudFormation in the AWS Console and select StackSets.
b. Select Create StackSet and select “Service-managed permissions”.
c. Under the Specify template section, select Upload a template file.
d. Select Choose file and select patching-stackset.yml.
e. Enter a name for the StackSet (for example: self-service-patching-stackset).


Figure 3. Launching the CloudFormation StackSet

f. Provide the requested parameters, using the values that you noted in deployment step 3.k. For Artifact bucket, enter the name of the bucket to which you uploaded the zip files in deployment step 2.


Figure 4. Configure StackSet Options

g. For WorkloadRegions, specify a comma-separated list of the Regions where you run EC2 workloads.
h. Select Next.
i. For Execution configuration, choose Active so that StackSets performs non-conflicting operations concurrently and queues conflicting operations. After the conflicting operations finish, StackSets starts the queued operations in request order. Select Next to proceed.
j. For Deployment targets, select “Deploy to organization”.

In Set deployment options, choose Deploy new stacks under Add stacks to stack set, select Deploy to organization as the deployment target, enable automatic deployment, and choose Delete stacks as the account removal behavior.

Figure 5. Set Deployment Options

k. Select all the Regions you want to cover (make sure to include the Region where you deployed the CloudFormation stack in deployment step 3).
l. Select Next.
m. Scroll down and select the check box “I acknowledge that AWS CloudFormation might create IAM resources with custom names”.
n. Select Submit and navigate to the Stack instances tab in the StackSet.
o. Monitor the Status column and wait until the status of all stack instances changes to CURRENT.
  For the list of resources that the CloudFormation stack creates, check the readme file in the code repository.
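Step 4 can also be expressed through the CloudFormation StackSets API. The following minimal sketch builds payloads for boto3's `create_stack_set` and `create_stack_instances` that mirror the console choices above: service-managed permissions, Active execution, automatic deployment with stacks deleted on account removal, and deployment to the whole organization. It assumes that "Deploy to organization" corresponds to targeting the organization root ID (`r-...`); the helper names are hypothetical, and only `WorkloadRegions` comes from this post.

```python
from typing import Any


def parse_regions(value: str) -> list[str]:
    """Split a comma-separated Region list, trimming blanks and dropping duplicates."""
    regions: list[str] = []
    for part in value.split(","):
        region = part.strip()
        if region and region not in regions:
            regions.append(region)
    return regions


def build_stack_set_requests(name: str, template_body: str, parameters: dict[str, str],
                             root_id: str, regions: list[str], central_region: str,
                             delegated_admin: bool = False
                             ) -> tuple[dict[str, Any], dict[str, Any]]:
    """Payloads for create_stack_set(**a) and create_stack_instances(**b)."""
    if central_region not in regions:
        # Mirrors step 4.k: the central stack's Region must be covered too.
        raise ValueError(f"{central_region} must be one of the StackSet Regions")
    call_as = "DELEGATED_ADMIN" if delegated_admin else "SELF"
    create_stack_set = {
        "StackSetName": name,
        "TemplateBody": template_body,
        "Parameters": [{"ParameterKey": k, "ParameterValue": v}
                       for k, v in parameters.items()],
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
        "PermissionModel": "SERVICE_MANAGED",
        # Automatic deployment on; delete stacks when an account leaves.
        "AutoDeployment": {"Enabled": True, "RetainStacksOnAccountRemoval": False},
        # "Active" execution configuration from step 4.i.
        "ManagedExecution": {"Active": True},
        "CallAs": call_as,
    }
    create_stack_instances = {
        "StackSetName": name,
        # Targeting the organization root deploys to every account under it.
        "DeploymentTargets": {"OrganizationalUnitIds": [root_id]},
        "Regions": regions,
        "CallAs": call_as,
    }
    return create_stack_set, create_stack_instances
```

For example, `parameters={"WorkloadRegions": "us-east-1,eu-west-1"}` together with `regions=parse_regions("us-east-1,eu-west-1")`. Set `delegated_admin=True` when running from a CloudFormation delegated administrator account rather than the management account.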

Now that you have deployed the solution, the next section walks you through using it to create a patch maintenance window.

Usage

Prerequisites:

Active EC2 instances in the member accounts with Systems Manager agent installed.
An instance profile with access to Systems Manager and the central S3 bucket, attached to the EC2 instances. The automation creates an instance profile named “InstanceProfileforPatching” with the necessary privileges in all of the member accounts.
The user guide for setting up EC2 instances as managed nodes can be found here.
An environment=<Dev/Test/Prod> tag attached to the EC2 instances or Auto Scaling groups, according to the workload.
The Service Catalog portfolio in the central account shared across the organization. If you're using an account other than the AWS Organizations management account, follow the steps in the readme file to set up a delegated administrator for AWS Service Catalog.

Steps:

Make sure that you're in the same Region in which you deployed the CloudFormation stack in deployment step 3.
Grant the necessary entities permission to use the Service Catalog portfolio in the member accounts.
Launch the Service Catalog product to create the patch maintenance window.
Detailed information about the Service Catalog product is available in the GitHub repository.
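Launching the product can also be scripted with the Service Catalog `provision_product` API. The following is a minimal sketch that validates a maintenance window schedule (a `cron()` or `rate()` expression), a duration, and a cutoff before building the request. The product name and the parameter keys `Environment`, `Schedule`, `Duration`, and `Cutoff` are hypothetical; the real keys are defined by the product described in the GitHub repository.

```python
import re
import uuid
from typing import Any

# Maintenance window schedules are cron or rate expressions,
# for example "cron(0 2 ? * SUN *)" or "rate(7 days)".
_SCHEDULE = re.compile(r"^(cron|rate)\(.+\)$")


def build_provision_request(product_name: str, artifact_name: str, window_name: str,
                            environment: str, schedule: str, duration_hours: int,
                            cutoff_hours: int) -> dict[str, Any]:
    """Keyword arguments for boto3 servicecatalog.provision_product(**request)."""
    if not _SCHEDULE.match(schedule):
        raise ValueError(f"schedule must be a cron() or rate() expression: {schedule!r}")
    if not 1 <= duration_hours <= 24:
        raise ValueError("duration must be between 1 and 24 hours")
    if not 0 <= cutoff_hours < duration_hours:
        raise ValueError("cutoff must be at least 0 and shorter than the duration")
    # Hypothetical parameter keys; use the ones the portfolio's product defines.
    params = {
        "Environment": environment,
        "Schedule": schedule,
        "Duration": str(duration_hours),
        "Cutoff": str(cutoff_hours),
    }
    return {
        "ProductName": product_name,
        "ProvisioningArtifactName": artifact_name,
        "ProvisionedProductName": window_name,
        "ProvisioningParameters": [{"Key": k, "Value": v} for k, v in params.items()],
        "ProvisionToken": str(uuid.uuid4()),  # idempotency token for retries
    }
```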

Patching Process Flow:

Figure 6. Patching Process Flow

When the maintenance window opens on its schedule, it invokes the task Lambda function that matches the target type (standalone instance or Auto Scaling group).
The Lambda function invokes the corresponding Systems Manager Automation document and passes it all the necessary parameters.
For standalone instances:

1. The Systems Manager Automation document navigates to each of the designated Regions.
2. It runs the AWS-RunPatchBaseline command on the instances that have the specified tags.
3. It reboots the instances, if the user selected reboot as the post-patching operation.
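The solution performs these steps inside its Automation document. As a rough illustration of the underlying call, the following minimal sketch builds one Run Command `send_command` payload per Region for the AWS-RunPatchBaseline document, targeting instances by their `Patch Group` tag. Mapping the user's reboot choice to `RebootIfNeeded` or `NoReboot`, and the `10%` concurrency and error limits, are assumptions made for this example.

```python
import copy
from typing import Any


def build_patch_commands(regions: list[str], patch_group: str,
                         reboot: bool) -> dict[str, dict[str, Any]]:
    """One ssm.send_command(**request) payload per Region.

    Each payload is meant for an SSM client created in that Region.
    """
    base = {
        "DocumentName": "AWS-RunPatchBaseline",
        # Target managed instances by their Patch Group tag.
        "Targets": [{"Key": "tag:Patch Group", "Values": [patch_group]}],
        "Parameters": {
            "Operation": ["Install"],
            # NoReboot skips the reboot; some patches stay pending until one happens.
            "RebootOption": ["RebootIfNeeded" if reboot else "NoReboot"],
        },
        # Illustrative rollout limits; tune them for your fleet.
        "MaxConcurrency": "10%",
        "MaxErrors": "10%",
    }
    # Deep copies so that editing one Region's payload leaves the others intact.
    return {region: copy.deepcopy(base) for region in regions}
```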

For instances that are part of an Auto Scaling group:

1. The Systems Manager Automation document navigates to each of the designated Regions.
2. It gets the Auto Scaling group names based on specific tags.
3. It fetches the AMI ID from the launch configuration/launch template.
4. It generates a patched AMI:
  a. Creates an intermediate instance from the AMI.
  b. Applies AWS-RunPatchBaseline to the instance.
  c. Stops the intermediate instance.
  d. Creates an AMI.
  e. Terminates the intermediate instance.
5. It updates the launch configuration/launch template with the patched AMI.
6. It initiates an instance refresh, based on the user input.
7. It waits for the instance refresh to complete and scans the instances.
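Steps 3, 5, and 6 map onto a small set of EC2 and Auto Scaling API calls. The following minimal sketch assumes the group uses a launch template (not a launch configuration): it reads the current AMI ID from a `describe_launch_template_versions` response, then builds payloads that create a new template version with the patched AMI and start a rolling instance refresh onto it. The AMI build (step 4) and the post-refresh scan are not shown, and the 90 percent minimum healthy value is an illustrative default.

```python
from typing import Any


def current_ami(describe_versions_response: dict[str, Any]) -> str:
    """AMI ID from an ec2.describe_launch_template_versions response (first version)."""
    versions = describe_versions_response["LaunchTemplateVersions"]
    return versions[0]["LaunchTemplateData"]["ImageId"]


def build_refresh_requests(asg_name: str, template_id: str, source_version: str,
                           patched_ami: str, min_healthy: int = 90
                           ) -> tuple[dict[str, Any], dict[str, Any]]:
    """Payloads for ec2.create_launch_template_version and
    autoscaling.start_instance_refresh."""
    new_version = {
        "LaunchTemplateId": template_id,
        # Copy every other setting from the current version; swap only the AMI.
        "SourceVersion": source_version,
        "VersionDescription": f"Patched AMI {patched_ami}",
        "LaunchTemplateData": {"ImageId": patched_ami},
    }
    refresh = {
        "AutoScalingGroupName": asg_name,
        "Strategy": "Rolling",
        # Roll the group onto the newest launch template version.
        "DesiredConfiguration": {
            "LaunchTemplate": {"LaunchTemplateId": template_id, "Version": "$Latest"}
        },
        # Keep at least this percentage of capacity healthy during the refresh.
        "Preferences": {"MinHealthyPercentage": min_healthy},
    }
    return new_version, refresh
```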

An AWS Step Functions state machine gives the central team a way to intervene in the patching process and deploy ad hoc patches for zero-day vulnerability fixes or emergency patching situations.

1. A central platform or operations team member initiates the emergency patching process by starting the Step Functions state machine and providing the necessary parameters, such as the target environment, the resources, and, for ad hoc patches, an Amazon S3 path-style URL to an external file that is passed to AWS-RunPatchBaseline through its InstallOverrideList parameter.
2. The state machine triggers a Lambda function that fetches the account details, assumes a role in the member accounts, and invokes the patching Lambda functions.
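The emergency flow is started with the Step Functions `start_execution` API, whose `input` argument is a JSON string. The following minimal sketch builds that input and checks that an optional install override list is given as an S3 path-style URL (`https://s3.amazonaws.com/<bucket>/<key>` or `https://s3.<region>.amazonaws.com/<bucket>/<key>`). The field names `environment`, `targets`, and `install_override_list` are hypothetical; match them to the input the solution's state machine expects.

```python
import json
import re
from typing import Any, Optional

# Path-style S3 URL: https://s3[.<region>].amazonaws.com/<bucket>/<key>
_PATH_STYLE_URL = re.compile(
    r"^https://s3(\.[a-z0-9-]+)?\.amazonaws\.com/[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]/.+$")


def build_emergency_input(environment: str, targets: list[str],
                          override_list_url: Optional[str] = None) -> str:
    """JSON for stepfunctions.start_execution(stateMachineArn=..., input=...).

    Field names are hypothetical; align them with the solution's state machine.
    """
    if not targets:
        raise ValueError("at least one target is required")
    payload: dict[str, Any] = {"environment": environment, "targets": targets}
    if override_list_url is not None:
        if not _PATH_STYLE_URL.match(override_list_url):
            raise ValueError(f"not a path-style S3 URL: {override_list_url}")
        # Destined for AWS-RunPatchBaseline's InstallOverrideList parameter.
        payload["install_override_list"] = override_list_url
    return json.dumps(payload)
```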

Note that you can add a manual approval stage to the state machine, as described in the AWS Step Functions user guide here. The state machine then pauses for approval and proceeds only after the flow is approved.
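One common way to build such an approval stage is a task that uses the `.waitForTaskToken` integration pattern. The following minimal sketch generates an Amazon States Language definition in which the first state invokes a notification Lambda function with a task token and waits; an approver later calls `SendTaskSuccess` (approve) or `SendTaskFailure` (reject) with that token. Both function ARNs, the state names, and the 24-hour timeout are hypothetical choices for this example, not the solution's actual definition.

```python
import json


def approval_state_machine(approval_fn_arn: str, patch_fn_arn: str,
                           timeout_seconds: int = 86400) -> str:
    """Amazon States Language definition with a manual approval gate."""
    definition = {
        "Comment": "Emergency patching with manual approval (sketch)",
        "StartAt": "RequestApproval",
        "States": {
            "RequestApproval": {
                "Type": "Task",
                # Execution pauses here until SendTaskSuccess or SendTaskFailure
                # is called with the task token passed to the approval function.
                "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
                "Parameters": {
                    "FunctionName": approval_fn_arn,
                    "Payload": {"taskToken.$": "$$.Task.Token", "request.$": "$"},
                },
                # Keep the original input and store the approval result beside it.
                "ResultPath": "$.approval",
                "TimeoutSeconds": timeout_seconds,
                "Next": "StartEmergencyPatching",
            },
            "StartEmergencyPatching": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": patch_fn_arn, "Payload.$": "$"},
                "End": True,
            },
        },
    }
    return json.dumps(definition, indent=2)
```

The returned string can be passed as the `definition` argument of `create_state_machine` or `update_state_machine`. A rejection or timeout fails the execution, so the patching state never runs.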

The solution also deploys a default maintenance window that covers the patching process for instances without custom maintenance window requirements. The default maintenance window patches both standalone EC2 instances and EC2 instances in Auto Scaling groups.
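Maintenance windows can select managed instances by tag. As an illustration only (this post doesn't describe how the solution registers targets internally), the following minimal sketch builds a `register_target_with_maintenance_window` payload that selects every instance whose `maintenance_window` tag is `Default_maintenance_window`, following the tag convention described in the Tag Monitoring Flow section. The window ID is a placeholder.

```python
from typing import Any


def window_target(window_id: str, environment: str = "Default") -> dict[str, Any]:
    """Payload for ssm.register_target_with_maintenance_window(**request)."""
    tag_value = f"{environment}_maintenance_window"
    return {
        "WindowId": window_id,
        "ResourceType": "INSTANCE",
        # Select every managed instance whose maintenance_window tag matches.
        "Targets": [{"Key": "tag:maintenance_window", "Values": [tag_value]}],
        "Name": tag_value,
    }
```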

Tag Monitoring Flow


Figure 7. Tag Monitoring Process Flow

Based on the “environment” tag on the EC2 instance or Auto Scaling group, the following patching tags are applied:

Patch Group = <Default/Dev/Test/Prod>
maintenance_window = <Default/Dev/Test/Prod>_maintenance_window
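The tag mapping above can be sketched as a small pure function. This sketch assumes that a missing or unrecognized environment value falls back to `Default` and that values are matched case-insensitively; both are assumptions, not documented behavior of the solution's Lambda function, which also checks for an existing custom maintenance window (not modeled here).

```python
from typing import Optional

ENVIRONMENTS = ("Dev", "Test", "Prod")


def patch_tags_for(environment: Optional[str]) -> dict[str, str]:
    """Patch tags derived from the value of the environment tag.

    Assumption: a missing or unrecognized value falls back to Default.
    """
    by_lower = {env.lower(): env for env in ENVIRONMENTS}
    group = by_lower.get((environment or "").strip().lower(), "Default")
    return {
        "Patch Group": group,
        "maintenance_window": f"{group}_maintenance_window",
    }
```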

An AWS Config rule checks tag compliance in each of the member accounts. An Amazon EventBridge rule listens for compliance changes and triggers a Lambda function, which does the following:

For EC2 instances:
1. It checks for the Auto Scaling tag key (aws:autoscaling:groupName), the EKS tag keys (alpha.eksctl.io/nodegroup-name, k8s.io/*, or kubernetes.io/*), the ECS tag key (AmazonECSManaged), and the exception tag (patch_install=no).
2. If any of these tags are found, the Lambda function takes no action.
3. If none of these tags are found, the Lambda function verifies the environment tag and the patch maintenance window, and applies the patch tags accordingly.
For Auto Scaling groups:
1. It checks for the EKS tag (k8s.io/cluster-autoscaler/enabled), the ECS tag (AmazonECSManaged), and the exception tag (patch_install=no).
2. If any of these tags are found, the Lambda function takes no action.
3. If none of these tags are found, the Lambda function verifies the environment tag and the patch maintenance window, and applies the patch tags accordingly.
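The skip checks above can be sketched as follows. The event pattern matches the AWS Config "Config Rules Compliance Change" events that EventBridge delivers, and `resource_from_event` reads the resource type, ID, and new compliance type from such an event. The skip lists copy the tag keys named in this section, with `*` treated as a wildcard; the function names are hypothetical, and this is not the solution's actual Lambda code.

```python
from fnmatch import fnmatchcase
from typing import Any

# EventBridge pattern for AWS Config compliance-change events.
EVENT_PATTERN = {
    "source": ["aws.config"],
    "detail-type": ["Config Rules Compliance Change"],
}

# Tag keys that mean "leave this resource alone", per resource type.
SKIP_KEYS = {
    "AWS::EC2::Instance": ("aws:autoscaling:groupName", "alpha.eksctl.io/nodegroup-name",
                           "k8s.io/*", "kubernetes.io/*", "AmazonECSManaged"),
    "AWS::AutoScaling::AutoScalingGroup": ("k8s.io/cluster-autoscaler/enabled",
                                           "AmazonECSManaged"),
}


def resource_from_event(event: dict[str, Any]) -> tuple[str, str, str]:
    """(resourceType, resourceId, complianceType) from a compliance-change event."""
    detail = event["detail"]
    return (detail["resourceType"], detail["resourceId"],
            detail["newEvaluationResult"]["complianceType"])


def should_skip(resource_type: str, tags: dict[str, str]) -> bool:
    """True when the resource must not receive patch tags."""
    if tags.get("patch_install") == "no":  # explicit exception tag
        return True
    patterns = SKIP_KEYS[resource_type]
    return any(fnmatchcase(key, pattern) for key in tags for pattern in patterns)
```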

Compliance Reporting

Individual member accounts can view the patch compliance status of their Amazon EC2 workloads in the Systems Manager Patch Manager dashboard. The solution also provides a central patching dashboard in the central account, built with Amazon Athena and Amazon QuickSight.

Systems Manager resource data sync collects detailed inventory of the EC2 instances in the member accounts and sends it to the central S3 bucket, where services such as AWS Glue, Athena, and QuickSight consume it to build the compliance dashboard.
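A resource data sync that writes to a central bucket is created with the Systems Manager `create_resource_data_sync` API, called in each member account and Region. The following minimal sketch builds that payload; the sync name and prefix are hypothetical defaults, and the central bucket's policy must allow Systems Manager to write to it (the solution's bucket policy in the repository covers its own setup).

```python
from typing import Any


def build_data_sync_request(bucket: str, bucket_region: str,
                            prefix: str = "inventory",
                            sync_name: str = "patching-inventory-sync") -> dict[str, Any]:
    """Payload for ssm.create_resource_data_sync(**request) in a member account."""
    return {
        "SyncName": sync_name,
        "S3Destination": {
            "BucketName": bucket,
            "Prefix": prefix,
            # JsonSerDe is currently the only supported sync format.
            "SyncFormat": "JsonSerDe",
            # Region of the central bucket, not of the calling client.
            "Region": bucket_region,
        },
    }
```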

Athena Query

The solution deploys a few saved Athena queries that extract patch compliance information across the member accounts and the Regions within them.

Steps:

Navigate to the Athena Console in the central account.
Select Saved queries and search for “PatchComplianceReport”.
Select the Database “managed_instances_database”.
Select Run.
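The same saved query can be run programmatically. The following minimal sketch picks the query named `PatchComplianceReport` out of an Athena `batch_get_named_query` response (whose entries include `Name`, `Database`, and `QueryString`) and builds a `start_query_execution` payload from it. The results location and the `primary` workgroup are placeholders for your own settings.

```python
from typing import Any


def find_named_query(batch_get_response: dict[str, Any], name: str) -> dict[str, Any]:
    """Return the saved query called `name` from a batch_get_named_query response."""
    for query in batch_get_response.get("NamedQueries", []):
        if query["Name"] == name:
            return query
    raise KeyError(f"no saved query named {name!r}")


def build_query_request(named_query: dict[str, Any], output_location: str,
                        workgroup: str = "primary") -> dict[str, Any]:
    """Payload for athena.start_query_execution(**request)."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": named_query["QueryString"],
        "QueryExecutionContext": {"Database": named_query["Database"],
                                  "Catalog": "AwsDataCatalog"},
        "ResultConfiguration": {"OutputLocation": output_location},
        "WorkGroup": workgroup,
    }
```

The query IDs to pass to `batch_get_named_query` come from `list_named_queries`.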

The query runs against the AWS Data Catalog data source and the managed_instances_database database.

Figure 8. Athena Query Execution

You can also download the report in CSV format.
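Once downloaded, the CSV report can be summarized locally in the same way the dashboards group it, for example by account ID or OS type. The following minimal sketch counts compliance statuses per group using only the standard library. The column names are parameters because the report's actual headers depend on the saved query; `accountid` and `status` in the usage example are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict


def compliance_counts(csv_text: str, group_column: str,
                      status_column: str) -> dict[str, dict[str, int]]:
    """Count compliance statuses per group (for example per account or OS)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row[group_column]][row[status_column]] += 1
    return {group: dict(statuses) for group, statuses in counts.items()}
```

For example, `compliance_counts(open("report.csv").read(), "accountid", "status")` returns a mapping from each account ID to its count of compliant and non-compliant rows.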

QuickSight Dashboard

Amazon QuickSight helps you visualize data in dashboards.

You can create compliance dashboards in QuickSight to view patch compliance status across your organization. Detailed instructions are provided in the GitHub repository.

Sample dashboards:

Patch Compliance status grouped by OS type.


Figure 9. QuickSight Dashboard Sample-1

Total managed instance count grouped by patch compliance status.


Figure 10. QuickSight Dashboard Sample-2

Patch Compliance status grouped by AWS account ID.


Figure 11. QuickSight Dashboard Sample-3

Tear Down Instructions

For removing resources from the member accounts:

In the central account, navigate to the CloudFormation console and select StackSets.
Select the StackSet that you created in deployment step 4.
Select Actions and “Delete stacks from StackSet”.
Provide the organization ID and the Regions that you selected in deployment step 4.k.
Select Next.
Select Submit.
On the Operations tab of the StackSet, wait for the operation status to be “SUCCEEDED”.
Select Actions and “Delete StackSet”.
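The member-account tear down can also be scripted. The following minimal sketch builds payloads for `delete_stack_instances` and `delete_stack_set`, plus a check on a `describe_stack_set_operation` response so the StackSet is deleted only after the instance deletion has succeeded. It assumes the stacks were deployed to the organization root, so the API target is the root ID (`r-...`) rather than the `o-...` Organization ID shown in the console.

```python
from typing import Any


def build_teardown_requests(stack_set_name: str, root_id: str, regions: list[str],
                            delegated_admin: bool = False
                            ) -> tuple[dict[str, Any], dict[str, Any]]:
    """Payloads for cloudformation.delete_stack_instances and delete_stack_set."""
    call_as = "DELEGATED_ADMIN" if delegated_admin else "SELF"
    delete_instances = {
        "StackSetName": stack_set_name,
        "DeploymentTargets": {"OrganizationalUnitIds": [root_id]},
        "Regions": regions,
        # False deletes the member-account stacks instead of orphaning them.
        "RetainStacks": False,
        "CallAs": call_as,
    }
    delete_set = {"StackSetName": stack_set_name, "CallAs": call_as}
    return delete_instances, delete_set


def operation_succeeded(describe_operation_response: dict[str, Any]) -> bool:
    """True once a describe_stack_set_operation response reports SUCCEEDED."""
    return describe_operation_response["StackSetOperation"]["Status"] == "SUCCEEDED"
```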

For removing resources from the Central account:

In the central account, empty the S3 buckets created as part of the deployment.
You can find the S3 bucket details on the Resources tab of the CloudFormation stack.
Navigate to the CloudFormation console and select Stacks.
Select the stack created in deployment step 3.
Select Delete and select Delete stack.

Conclusion

In this post, we described how the responsibility for patching EC2 workloads can be shared between the central platform team and the application or account owners, and how member accounts can use the organization's Service Catalog portfolio to create a patch maintenance window when they have custom requirements. The central platform team can also intervene for zero-day vulnerability fixes or emergency patching situations and enforce patches across all of the EC2 workloads in the organization.

With this, you can start using the solution and maintain a flexible patching strategy across the organization.


AWS Cloud Operations Blog
