How Capgemini used AWS Systems Manager and other AWS services to provide cloud-native, self-service patch management and automation
This post was written in collaboration with David Wansell, an Enterprise Cloud Architect at Capgemini with over 20 years of experience across multiple enterprise domains. He designs and builds automation and solutions that enable customers to deliver on their desired outcomes in their cloud adoption journey.
Customers need a way to do patch management in the AWS Cloud. Many customers leverage managed solutions providers to manage their AWS accounts, and they’re looking for AWS native solutions to solve their business problems.
As a certified AWS Managed Services Provider (MSP), an AWS Premier Consulting Partner with seven AWS Competencies, and a member of the AWS Well-Architected Partner Program, Capgemini has a proven record of creating solutions that fit the unique and evolving needs of customers.
Cloud Operation Services (COS) is the managed service offering for AWS Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) solutions, built on AWS best practices and tools. This post breaks down the components used to provide modern cloud managed services with cloud-native tooling. It illustrates the relationship between the components and describes the use and process flow of each one.
AWS Systems Manager is an operations hub for AWS that provides a universal user interface for tracking and resolving operational issues across your AWS applications and resources from a central location. AWS Systems Manager lets you automate operational tasks for servers running in a hybrid environment through a single interface. A hybrid environment includes on-premises servers and virtual machines (VMs) that have been configured for use with AWS Systems Manager, including VMs in other cloud environments. Furthermore, you can group resources by application, view operational data for monitoring and troubleshooting, implement pre-approved change workflows, and audit operational changes for your resource groups. AWS Systems Manager simplifies resource and application management, shortens the time to detect and resolve operational problems, and eases the operation and management of your infrastructure at scale.
AWS Systems Manager Agent (SSM Agent) is AWS software that can be installed and configured on an Amazon Elastic Compute Cloud (EC2) instance, an on-premises server, or a VM. SSM Agent lets AWS Systems Manager update, manage, and configure these resources. The agent processes requests from the AWS Systems Manager service in the AWS Cloud, and then runs them as specified in the request. Then, SSM Agent sends status and execution information back to the AWS Systems Manager service by using the Amazon Message Delivery Service (service prefix: ec2messages).
Patch Manager, a capability of AWS Systems Manager, automates the process of patching managed instances with both security-related and other types of updates. Use Patch Manager to apply patches for both operating systems (OS) and applications (on Windows Server, application support is limited to updates for applications released by Microsoft). Furthermore, use Patch Manager to install Service Packs on Windows instances and perform minor version upgrades on Linux instances. You can patch fleets of Amazon EC2 instances or your on-premises servers and VMs by OS type. This includes supported versions of Amazon Linux, Amazon Linux 2, CentOS, Debian Server, macOS, Oracle Linux, Red Hat Enterprise Linux (RHEL), SUSE Linux Enterprise Server (SLES), Ubuntu Server, and Windows Server. Moreover, you can scan instances to see only a report of missing patches, or you can scan and automatically install all of the missing patches.
For patching automation, Capgemini leverages built-in patch management features such as patch baselines and patch groups to control which patches are applied to which instances. This is enriched with Capgemini’s good practices regarding security and compliance. Maintenance windows control when these changes occur, and Systems Manager documents define the actions that AWS Systems Manager performs on your managed instances. All of these AWS Systems Manager features come together to provide a centralized source of data about a customer’s patching experience that can be used to provide customized views and reporting across their entire landscape.
How Capgemini made it work
AWS Systems Manager Patch Manager is the cloud-native service used to fulfill the patching function. For automated patching, Systems Manager Maintenance Windows are used to trigger Linux and Windows Patching Documents that target baselines created for each OS type. These baselines are created with a Patch Group key. Any EC2 instances that are tagged with the corresponding key will be targeted by the Linux and Windows patching tasks.
Ad hoc or manual patching can be performed with a Systems Manager Automation document that is prepopulated with the correct Amazon Simple Notification Service (SNS) topic and AWS Identity and Access Management (IAM) role. The remaining values can be filled in to suit the type of patching required.
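As a rough illustration of an ad hoc run, the sketch below builds the parameters for a Run Command invocation of the AWS-managed AWS-RunPatchBaseline document. It is not the Automation document described above, whose contents are not shown in this post. The function name, the patch group value, and the ARNs are hypothetical.

```python
from typing import Any, Dict, Optional


def adhoc_patch_request(
    patch_group: str,
    operation: str = "Scan",
    sns_topic_arn: Optional[str] = None,
    notification_role_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for an ad hoc patching run (illustrative only)."""
    # "Scan" only reports missing patches; "Install" also applies them.
    if operation not in ("Scan", "Install"):
        raise ValueError("operation must be 'Scan' or 'Install'")
    request: Dict[str, Any] = {
        "DocumentName": "AWS-RunPatchBaseline",
        # Target every managed instance tagged with the given patch group.
        "Targets": [{"Key": "tag:Patch Group", "Values": [patch_group]}],
        "Parameters": {"Operation": [operation]},
    }
    # Notifications need both a topic and a role that is allowed to publish to it.
    if sns_topic_arn and notification_role_arn:
        request["ServiceRoleArn"] = notification_role_arn
        request["NotificationConfig"] = {
            "NotificationArn": sns_topic_arn,
            "NotificationEvents": ["Failed", "TimedOut"],
            "NotificationType": "Invocation",
        }
    return request


# Usage (needs boto3 and AWS credentials, so it is left commented out):
# import boto3
# boto3.client("ssm").send_command(**adhoc_patch_request("cos-linux", "Install"))
```

Defaulting to a scan keeps an accidental invocation read-only; installing patches has to be requested explicitly.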
Monitoring is provided by an instant patch results Amazon SNS topic and Lambda function. Patching tasks are configured to use that SNS topic as a notification endpoint. The patching alert Lambda function enriches the patching results and sends them to the ServiceNow (SNOW) API Lambda function, which forwards key alarm event payloads to ServiceNow.
Reporting is handled by the patching report Lambda function, which is triggered monthly by an Amazon EventBridge rule.
Once triggered, the patching report Lambda function queries AWS Systems Manager Compliance, collates a report of patching compliance and noncompliance of the EC2 instances over the last 30 days, and stores it in Amazon Simple Storage Service (S3). Lastly, a URL link to the report is sent to the operations team.
Figure 1: LLD diagram of the patching solution
Systems Manager Patch Manager Patching Assignment
Assignment is handled in a cloud-native manner. A set of tags is used as the mechanism to patch EC2 instances: instances under support that have the appropriate tag key and value are automatically added to a patching regime. This tag-based policy provides guardrails that enable cloud-native self-service and prevents unmanaged, unpatched instances from existing.
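The tag check behind this kind of assignment can be sketched in a few lines. The example assumes the conventional Patch Manager tag key, Patch Group, and tags shaped the way the EC2 API returns them (a list of Key/Value pairs). The group names and helper functions are hypothetical and are not taken from the Capgemini solution.

```python
from typing import Dict, List, Optional

# Assumption: "Patch Group" is the tag key; the group names are made up.
PATCH_GROUP_TAG_KEY = "Patch Group"
SUPPORTED_PATCH_GROUPS = {"cos-linux", "cos-windows"}

Tags = List[Dict[str, str]]


def patch_group_for(tags: Tags) -> Optional[str]:
    """Return the instance's patch group, or None if it is not in a patching regime."""
    for tag in tags:
        if tag.get("Key") == PATCH_GROUP_TAG_KEY and tag.get("Value") in SUPPORTED_PATCH_GROUPS:
            return tag["Value"]
    return None


def find_unmanaged(instances: Dict[str, Tags]) -> List[str]:
    """Return the sorted IDs of instances that lack a recognized patch group tag."""
    return sorted(
        instance_id
        for instance_id, tags in instances.items()
        if patch_group_for(tags) is None
    )
```

A guardrail could run a check like find_unmanaged on a schedule and flag any instance it returns, so that untagged instances do not silently fall outside patching.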
Systems Manager Patch Manager Baseline
When the solution is deployed, a reference security baseline based on Capgemini good practices is created for Windows and for each Linux distribution supported by Systems Manager Patch Manager. These custom patch baselines allow for greater control over which patches are approved or rejected for your environment and can be amended to suit client requirements.
Patch groups help make sure that you’re deploying the appropriate patches, based on the associated patch baseline rules, to the correct set of instances.
The patch groups are targeted by the scheduled rules that are created by the solution. Furthermore, these patch groups can be targeted by ad hoc or scheduled patching operations – provided that the correct patch group is specified as the target.
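To make the relationship between a baseline and a patch group concrete, here is a minimal sketch that builds the request parameters for the Systems Manager CreatePatchBaseline and RegisterPatchBaselineForPatchGroup operations. The filter values, the seven-day approval delay, and the names are illustrative assumptions, not Capgemini's reference baseline.

```python
from typing import Any, Dict, List

# Patch filter keys differ by operating system; these values are examples only.
PATCH_FILTERS: Dict[str, List[Dict[str, Any]]] = {
    "AMAZON_LINUX_2": [
        {"Key": "CLASSIFICATION", "Values": ["Security"]},
        {"Key": "SEVERITY", "Values": ["Critical", "Important"]},
    ],
    "WINDOWS": [
        {"Key": "CLASSIFICATION", "Values": ["SecurityUpdates", "CriticalUpdates"]},
        {"Key": "MSRC_SEVERITY", "Values": ["Critical", "Important"]},
    ],
}


def baseline_request(name: str, operating_system: str, approve_after_days: int = 7) -> Dict[str, Any]:
    """Build keyword arguments for a custom patch baseline with one approval rule."""
    if operating_system not in PATCH_FILTERS:
        raise ValueError(f"no example filters defined for {operating_system}")
    return {
        "Name": name,
        "OperatingSystem": operating_system,
        "ApprovalRules": {
            "PatchRules": [
                {
                    "PatchFilterGroup": {"PatchFilters": PATCH_FILTERS[operating_system]},
                    # Patches are auto-approved this many days after release.
                    "ApproveAfterDays": approve_after_days,
                }
            ]
        },
    }


def patch_group_registration(baseline_id: str, patch_group: str) -> Dict[str, str]:
    """Build keyword arguments that bind a patch group to a baseline."""
    return {"BaselineId": baseline_id, "PatchGroup": patch_group}


# Usage (needs boto3 and AWS credentials, so it is left commented out):
# import boto3
# ssm = boto3.client("ssm")
# created = ssm.create_patch_baseline(**baseline_request("example-linux", "AMAZON_LINUX_2"))
# ssm.register_patch_baseline_for_patch_group(
#     **patch_group_registration(created["BaselineId"], "cos-linux")
# )
```

Once a patch group is registered to a baseline, instances tagged with that group are evaluated against that baseline's rules rather than the default baseline for their operating system.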
Systems Manager Patch Manager maintenance windows are deployed for Linux and Windows instances. Each maintenance window targets its respective patch group.
By default, two maintenance windows are created upon deployment: one scheduled for Windows instances and one for Linux instances. Each is triggered according to a cron schedule expression.
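A maintenance window and its patch group target might be described as follows. This is a sketch of the request parameters for the CreateMaintenanceWindow and RegisterTargetWithMaintenanceWindow operations. The window name, cron expression, durations, and patch group are hypothetical values chosen for illustration.

```python
from typing import Any, Dict


def window_request(name: str, schedule: str, duration_hours: int = 4, cutoff_hours: int = 1) -> Dict[str, Any]:
    """Build keyword arguments for a maintenance window on a cron schedule."""
    if not (schedule.startswith("cron(") and schedule.endswith(")")):
        raise ValueError("this sketch only accepts cron(...) schedule expressions")
    # The cutoff (hours before the end when no new tasks start) must be shorter
    # than the window itself, and a window lasts at most 24 hours.
    if not 0 <= cutoff_hours < duration_hours <= 24:
        raise ValueError("expected 0 <= cutoff_hours < duration_hours <= 24")
    return {
        "Name": name,
        "Schedule": schedule,
        "Duration": duration_hours,
        "Cutoff": cutoff_hours,
        "AllowUnassociatedTargets": False,
    }


def window_target_request(window_id: str, patch_group: str) -> Dict[str, Any]:
    """Build keyword arguments that point a window at one patch group."""
    return {
        "WindowId": window_id,
        "ResourceType": "INSTANCE",
        "Targets": [{"Key": "tag:Patch Group", "Values": [patch_group]}],
    }


# Example: a hypothetical Linux window at 02:00 every Sunday.
linux_window = window_request("example-linux-patching", "cron(0 2 ? * SUN *)")

# Usage (needs boto3 and AWS credentials, so it is left commented out):
# import boto3
# ssm = boto3.client("ssm")
# window_id = ssm.create_maintenance_window(**linux_window)["WindowId"]
# ssm.register_target_with_maintenance_window(**window_target_request(window_id, "cos-linux"))
```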
Systems Manager Patch Manager Maintenance Windows/Linux Tasks: Patching Tasks
Systems Manager maintenance window tasks for Windows and Linux are created upon deployment. Each task targets the patch groups specified in the corresponding Windows or Linux maintenance window targets, and it is triggered by the Windows or Linux maintenance window schedule.
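A patching task of this kind could look like the sketch below, which builds the parameters for the RegisterTaskWithMaintenanceWindow operation. It assumes the task runs the AWS-managed AWS-RunPatchBaseline document and notifies an SNS topic on failed or timed-out invocations. The concurrency and error thresholds, and every ID and ARN passed in, are hypothetical.

```python
from typing import Any, Dict


def patch_task_request(
    window_id: str,
    window_target_id: str,
    window_role_arn: str,
    notification_role_arn: str,
    sns_topic_arn: str,
) -> Dict[str, Any]:
    """Build keyword arguments for a Run Command patching task in a maintenance window."""
    return {
        "WindowId": window_id,
        # Reuse the targets already registered with the window.
        "Targets": [{"Key": "WindowTargetIds", "Values": [window_target_id]}],
        "TaskArn": "AWS-RunPatchBaseline",
        "TaskType": "RUN_COMMAND",
        # Role the maintenance window assumes to run the task.
        "ServiceRoleArn": window_role_arn,
        # Illustrative limits: patch 10% of targets at a time, stop at 5% errors.
        "MaxConcurrency": "10%",
        "MaxErrors": "5%",
        "TaskInvocationParameters": {
            "RunCommand": {
                "Parameters": {"Operation": ["Install"]},
                # Role Run Command uses to publish SNS notifications.
                "ServiceRoleArn": notification_role_arn,
                "NotificationConfig": {
                    "NotificationArn": sns_topic_arn,
                    "NotificationEvents": ["Failed", "TimedOut"],
                    "NotificationType": "Invocation",
                },
            }
        },
    }


# Usage (needs boto3 and AWS credentials, so it is left commented out):
# import boto3
# boto3.client("ssm").register_task_with_maintenance_window(**patch_task_request(...))
```

Setting the notification type to Invocation produces one message per instance rather than one per command, which suits per-instance alerting.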
By default, reports are generated monthly. These are initiated by an EventBridge rule that is set to trigger once a month. This cron expression schedule is parametrized and can be amended upon deployment or on the rule itself.
Once triggered, the rule invokes the associated Lambda function, which queries AWS Systems Manager Compliance, generates the report, and stores it in the Patching Report S3 bucket. Lastly, the Patching-Report-Lambda function sends a temporary URL to all of the recipients of the patch reporting SNS topic.
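The report-building step can be sketched without any AWS calls. The function below turns compliance summaries into CSV text, assuming the items are shaped like those returned by the Systems Manager ListResourceComplianceSummaries operation. The column names and the function itself are hypothetical. The actual Patching-Report-Lambda code is not shown in this post.

```python
import csv
import io
from typing import Any, Dict, Iterable


def build_patch_report(summaries: Iterable[Dict[str, Any]]) -> str:
    """Collate patch compliance summaries into CSV text, one row per instance."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["instance_id", "status", "compliant_count", "non_compliant_count"])
    for item in summaries:
        # Compliance data also covers other types (such as associations); keep patches only.
        if item.get("ComplianceType") != "Patch":
            continue
        writer.writerow(
            [
                item["ResourceId"],
                item["Status"],
                item.get("CompliantSummary", {}).get("CompliantCount", 0),
                item.get("NonCompliantSummary", {}).get("NonCompliantCount", 0),
            ]
        )
    return buffer.getvalue()


# Storing and sharing the report (needs boto3 and AWS credentials, left commented out;
# the bucket and key names are placeholders):
# import boto3
# s3 = boto3.client("s3")
# s3.put_object(Bucket="example-patching-reports", Key="report.csv", Body=report.encode())
# url = s3.generate_presigned_url(
#     "get_object",
#     Params={"Bucket": "example-patching-reports", "Key": "report.csv"},
#     ExpiresIn=3600,
# )
```

A presigned URL is one way to produce the temporary link: it expires after the chosen number of seconds, so the report bucket itself can stay private.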
Note from AWS: AWS Systems Manager now has a native method to generate CSV files of patch compliance as mentioned here.
SNS Topic – PatchReportsTopic
This SNS topic is deployed with an email subscription, and it is linked to the Patching-Report-Lambda function. Once a month, this Lambda function emails a URL that points to the patching report.
Whenever default patching has completed, any failed or timed-out events are sent to the PatchResults SNS topic. This topic triggers the associated Lambda function, which queries AWS Systems Manager Compliance, collates a list of noncompliant instances, and sends it to the COS-SNOW-Monitoring topic. That topic forwards the payload to the Listener Lambda function, which then sends the payload to ServiceNow.
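The first step of such an alerting function, reading the SNS event and keeping only the failures, might look like this. It is a simplified sketch: it assumes Run Command notification messages are JSON with instanceId, commandId, and status fields, and it leaves out the compliance lookup and the forwarding to the monitoring topic. The payload field names are hypothetical.

```python
import json
from typing import Any, Dict, List

# Run Command statuses that should raise an alert.
ALERT_STATUSES = {"Failed", "TimedOut"}


def extract_alerts(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one alert payload per failed or timed-out invocation in an SNS event."""
    alerts = []
    for record in event.get("Records", []):
        # SNS delivers the notification body as a JSON string.
        message = json.loads(record["Sns"]["Message"])
        if message.get("status") in ALERT_STATUSES:
            alerts.append(
                {
                    "source": "patching",
                    "instance_id": message.get("instanceId"),
                    "command_id": message.get("commandId"),
                    "status": message["status"],
                }
            )
    return alerts


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda entry point: collect alerts; enrichment and forwarding are omitted here."""
    return {"alerts": extract_alerts(event)}
```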
Capgemini now offers a solution for you to manage the patching of your Amazon EC2 instances with end-to-end automation, monitoring for that patching, and alerting if issues are found. To learn more about how Capgemini can assist with your business challenges related to management and governance, and to learn more about Capgemini, visit Capgemini Cloud Platform. To learn more about how AWS Systems Manager could be leveraged to manage instances in a hybrid environment, visit AWS Cloud Operations.