AWS Management Tools Blog

How Cloudticity Automates Security Patches for Linux and Windows using Amazon EC2 Systems Manager and AWS Step Functions

This guest post was written by Uri Katsir, AWS Architect at Cloudticity, and Thomas Zinn, Project Manager at Cloudticity.

As a provider of HIPAA-compliant solutions using AWS, Cloudticity always has security as the base of everything we do. HIPAA breaches would be an end-of-life event for most of our customers. Having been born in the cloud with automation in our DNA, Cloudticity embeds automation into all levels of infrastructure management including security, monitoring, and continuous compliance. As mandated by the HIPAA Security Rule (45 CFR Part 160 and Subparts A and C of Part 164), patches at the operating system and application level are required to prevent security vulnerabilities. As a result, patches are a major component of infrastructure management.

Cloudticity strives to provide consistent and reliable services to all of our customers. As such, we needed to create a custom patching solution that supports both Linux and Windows. The minimum requirements for such a solution were to read from a manifest file that contains instance names and a list of knowledge base articles (KBs) or security packages to apply to each instance. Below is a simplified, high-level process overview.

There were a few guidelines to be considered when designing the solution:

  1. Each customer has a defined maintenance window, and the solution must complete all updates within that window.
  2. The solution must be able to patch one or many instances and still finish within the maintenance window.
  3. The solution should use as many AWS services as possible to reduce time-to-market and take advantage of the built-in scaling that many AWS services provide.
  4. Code reusability is essential.

In this post, we walk through our solution and discuss how combining Amazon EC2 Systems Manager and AWS Step Functions is ideal for this particular use case.

Systems Manager for Patch Management

We have been using Systems Manager to remotely manage instances since its inception, so it was natural to consider it as part of the solution. We use Systems Manager in many ways:

  1. Installing New Relic and Trend Micro agents for every new instance that is launched (whether manually or via Auto Scaling).
  2. Ensuring that EC2Config and the SSM agents are running the latest versions.
  3. Deploying software updates to multiple remote instances.

We began with the Systems Manager predefined documents “AWS-InstallWindowsUpdates” for Windows and “AWS-RunShellScript” for Linux. To achieve maximum code reusability, we created several AWS Lambda functions to perform individual tasks, such as identifying operating system type (Linux or Windows) or applying patches using the SSM agents. Each process (rectangle) in the flowchart above is implemented using a single Lambda function. The Systems Manager document is smart enough to apply multiple patches provided by a manifest file stored in Amazon S3 and only perform a single reboot, if a reboot is required to complete the patch process. Below is a code snippet from the Lambda function that runs Linux patches:

// Build the yum command based on the entry in the manifest file.
var command;
if (IncludedKB === '') {
    // No entry: apply all available updates.
    command = 'sudo yum update -y';
} else if (IncludedKB.toUpperCase() === 'SECURITY') {
    // "security": apply all security-related updates.
    command = 'yum update --security -y';
} else {
    // Otherwise, apply security updates for the listed packages only.
    command = 'yum --security update ' + IncludedKB + ' -y';
}

var commands = [];
commands.push(command);

// Prepare the parameters for the Systems Manager command. Specify the
// output folder to hold the run results.
var params = {
    DocumentName: 'AWS-RunShellScript',
    InstanceIds: [instanceId],
    Comment: 'OS Patch command',
    OutputS3BucketName: bucket,
    OutputS3KeyPrefix: OutputFolder,
    Parameters: {
        commands: commands
    },
    // Delivery timeout: the command is not run if it has not started
    // on the instance within this many seconds.
    TimeoutSeconds: 60
};
console.log('Sending command to InstanceID: ' + instanceId);
console.log('Command is: ' + command);
ssm.sendCommand(params, function(err, data) {
    if (err) {
        callback(command + ', Error, ' + err.code);
    } else {
        console.log(command + ', ' + data.Command.CommandId + ', ' + data.Command.Status);
        callback(null);
    }
});

Step Functions for workflow orchestration

Given the nature of the solution (microservices that perform unique and distinct tasks), we needed a centralized service to manage and drive the process end-to-end, as well as pass parameters between Lambda functions. In other words, we needed a state machine of Lambda functions. AWS Step Functions, released at AWS re:Invent 2016, allowed us to spin up multiple concurrent processes for completing the patches (one for each row in the manifest file) without worrying about scale, deadlocks, timeouts, or race conditions. By running concurrent processes, we can complete the patch process in the minimum time required for the patches to be applied, without introducing any overhead to the process.
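As a rough illustration of this fan-out, the sketch below starts one state machine execution per manifest row. The row shape (instanceID, kB), the startPatchExecutions helper, and the state machine ARN in the usage comment are assumptions made for illustration and are not taken from our production code. The client argument is expected to be an AWS.StepFunctions instance from the AWS SDK for JavaScript (v2); it is passed in so the function can also be exercised with a stub.

```javascript
// Hypothetical helper: starts one Step Functions execution per manifest row.
// `client` is assumed to be `new AWS.StepFunctions()` (aws-sdk v2).
function startPatchExecutions(client, stateMachineArn, rows) {
    var starts = rows.map(function (row) {
        var params = {
            stateMachineArn: stateMachineArn,
            // Each execution receives its own manifest row as JSON input.
            input: JSON.stringify(row)
        };
        return client.startExecution(params).promise();
    });
    // Resolves once every execution has been started (not finished).
    return Promise.all(starts);
}

// Example usage (assumed row shape and ARN):
// var AWS = require('aws-sdk');
// startPatchExecutions(
//     new AWS.StepFunctions(),
//     'arn:aws:states:us-east-1:123456789012:stateMachine:OS-Patch',
//     [{ instanceID: 'i-0123456789abcdef0', kB: 'security' }]
// ).then(function (results) {
//     console.log(results.length + ' executions started');
// });
```

Because each row becomes an independent execution, a slow or failing instance does not hold up the others.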

Step Functions provides some additional benefits including visualization of the overall state machine, passing input/output parameters between Lambda functions, and simplified debugging for each step in the process.

The image below is a visual preview of the patching process generated by the Step Functions user interface in the AWS Management Console. Notice how the resulting state machine mimics the initial workflow. The state machine definition, written in Amazon States Language, follows.

{
  "Comment": "Cloudticity-Oxygen-OS-Patch-SF.",
  "StartAt": "GetInstancePlatform",
  "States": {
    "GetInstancePlatform": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:Cloudticity-Oxygen-GetInstancePlatform",
      "Next": "ChoiceState"
    },
    "ChoiceState": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.PlatformType",
          "StringEquals": "Windows",
          "Next": "WindowsMatchState"
        },
        {
          "Variable": "$.PlatformType",
          "StringEquals": "Linux",
          "Next": "LinuxMatchState"
        }
      ],
      "Default": "NoMatchFound"
    },
    "WindowsMatchState": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:Cloudticity-Oxygen-Windows-OS-Patching",
      "Retry": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "IntervalSeconds": 6,
          "MaxAttempts": 2,
          "BackoffRate": 3
        }
      ],
      "End": true
    },
    "LinuxMatchState": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:Cloudticity-Oxygen-Linux-OS-Patching",
      "End": true
    },
    "NoMatchFound": {
      "Type": "Fail",
      "Cause": "No Matches!"
    }
  }
}
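To make the Retry block on WindowsMatchState concrete: Step Functions waits IntervalSeconds before the first retry and multiplies the wait by BackoffRate for each subsequent retry, up to MaxAttempts retries. The sketch below only illustrates that arithmetic; retryDelays is a hypothetical helper and not part of any AWS SDK.

```javascript
// Hypothetical helper: lists the wait (in seconds) before each retry
// attempt described by a Step Functions Retry block.
function retryDelays(retrier) {
    var delays = [];
    var wait = retrier.IntervalSeconds;
    for (var attempt = 0; attempt < retrier.MaxAttempts; attempt++) {
        delays.push(wait);
        wait = wait * retrier.BackoffRate;
    }
    return delays;
}

// Values taken from the WindowsMatchState definition above.
console.log(retryDelays({ IntervalSeconds: 6, MaxAttempts: 2, BackoffRate: 3 })); // [ 6, 18 ]
```

With these values, a failed Windows patch task is retried after 6 seconds and, if it fails again, after a further 18 seconds. LinuxMatchState has no Retry block in this definition, so a Lambda error there fails the execution immediately.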

The first state (GetInstancePlatform) executes the Cloudticity-Oxygen-GetInstancePlatform Lambda function. As you can see in the code snippet below, the Lambda function then invokes the ssm.describeInstanceInformation method to retrieve the platform type of the instance (Linux or Windows).

var aws = require('aws-sdk');

var getInstancePlatform = function(instanceID, event, IncludeKbs, isReboot, bucket, OutputFolder, callback) {
    var ssm = new aws.SSM();
    // Prepare the parameters for the Systems Manager call. The instance ID
    // is an input parameter from the manifest file.
    var params = {
        InstanceInformationFilterList: [{
            key: "InstanceIds",
            valueSet: [
                instanceID
            ]
        }]
    };
    ssm.describeInstanceInformation(params, function(err, instanceData) {
        if (err) {
            callback(err);
        } else {
            let instanceInformation = instanceData.InstanceInformationList[0];
            if (instanceInformation) {
                console.log('Running GetInstancePlatform. PlatformType = ' + instanceInformation.PlatformType);

                // Return the instance information to the next step in the state machine.
                callback(null, {
                    PlatformType: instanceInformation.PlatformType,
                    kB: IncludeKbs,
                    instanceID: instanceID,
                    isReboot: isReboot,
                    bucket: bucket,
                    OutputFolder: OutputFolder
                }, event);
            } else {
                callback('Unknown instance platform');
            }
        }
    });
};

The platform information is passed on to the built-in Choice state (ChoiceState), which adds branching logic to a state machine. ChoiceState uses the platform information to route the execution to one of three states:

  • WindowsMatchState
  • LinuxMatchState
  • NoMatchFound

When patching Windows instances, the state is transferred to the Cloudticity-Oxygen-Windows-OS-Patching Lambda function, which invokes the ssm.sendCommand method with “AWS-InstallWindowsUpdates” as the document name. The document receives, from the manifest file, the Windows Knowledge Base (KB) article numbers to apply to the instance.

Below is a snippet from the Lambda function that patches Windows instances.

// Prepare the parameters for the SSM command. The instance ID and the
// included KBs are input from the manifest file.
var params = {
    DocumentName: 'AWS-InstallWindowsUpdates',
    Comment: 'OS Patch command',
    InstanceIds: [instanceId],
    OutputS3BucketName: bucket,
    OutputS3KeyPrefix: OutputFolder,
    Parameters: {
        "Action": ["Install"],
        "IncludeKbs": [IncludeKbs]
    },
    TimeoutSeconds: 60
};
console.log("Sending InstallWindowsUpdates ssm command with the following InstanceId: " + instanceId);

// Send the SSM command.
ssm.sendCommand(params, function(err, data) {
    if (err) {
        callback(params.DocumentName + ', Error, ' + err.code);
    } else {
        // Log the command ID and status before handing control back.
        console.log("Sending InstallWindowsUpdates, " + data.Command.CommandId + ', ' + data.Command.Status);
        callback(null);
    }
});
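Manifest entries are typed by hand, so it can help to normalize the KB list before it is passed as IncludeKbs. The sketch below is one possible approach, on the assumption that the document accepts a comma-separated string of KB IDs. The normalizeKbList helper and the separators it tolerates are hypothetical and not part of our Lambda function.

```javascript
// Hypothetical helper: turns a free-form manifest entry such as
// "kb4012212, 4012215" into "KB4012212,KB4012215".
// Tokens that do not look like KB IDs are silently dropped.
function normalizeKbList(entry) {
    var seen = {};
    return entry
        .split(/[\s,;]+/)
        .map(function (kb) { return kb.trim().toUpperCase(); })
        .filter(function (kb) { return /^(KB)?\d+$/.test(kb); })
        .map(function (kb) { return kb.indexOf('KB') === 0 ? kb : 'KB' + kb; })
        .filter(function (kb) {
            // Drop duplicates while keeping the first occurrence.
            if (seen[kb]) { return false; }
            seen[kb] = true;
            return true;
        })
        .join(',');
}

console.log(normalizeKbList('kb4012212, 4012215;KB4012212')); // KB4012212,KB4012215
```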

When patching a Linux instance, the state is transferred to the Cloudticity-Oxygen-Linux-OS-Patching Lambda function that invokes the ssm.sendCommand method with “AWS-RunShellScript” as the document name. Our Linux patching implementation provides two options:

  1. Security packages that need to be applied are entered into a manifest file. This file is referenced as an input parameter to the “AWS-RunShellScript” document.
  2. In the manifest file, “security” is specified in place of a list of packages.

In the latter case, all security-related packages are applied to the specified instance according to the manifest file.
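The mapping from a manifest entry to a yum command can be sketched as a small pure function. The three branches mirror the Linux patching snippet shown earlier. The buildYumCommand name and the character check are assumptions added for illustration: because the entry is concatenated into a shell command, the sketch rejects anything that does not look like a list of package names.

```javascript
// Sketch only: mirrors the three branches of the Linux patching snippet.
// The character check is an added assumption, not part of the original
// code; it guards against shell metacharacters in the manifest entry.
function buildYumCommand(includedKB) {
    var entry = (includedKB || '').trim();
    if (entry === '') {
        return 'sudo yum update -y';
    }
    if (entry.toUpperCase() === 'SECURITY') {
        return 'yum update --security -y';
    }
    if (!/^[A-Za-z0-9._+\- ]+$/.test(entry)) {
        throw new Error('Unexpected characters in manifest entry: ' + entry);
    }
    return 'yum --security update ' + entry + ' -y';
}

console.log(buildYumCommand(''));               // sudo yum update -y
console.log(buildYumCommand('security'));       // yum update --security -y
console.log(buildYumCommand('openssl kernel')); // yum --security update openssl kernel -y
```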

Looking Ahead

Based on our customer feedback and internal reviews, we are planning to add the following features to the above implementation:

  • Add AMI patching as part of the overall process: Replace an AMI in an Auto Scaling Launch Configuration with the patched AMI.
  • Add additional exception handling for process failures and the patching product: If an SSM command is sent successfully but the patch process fails on the instance, a notification or support ticket is created. In some cases, a single patch process can take longer than five minutes to run. Because five minutes is the maximum runtime for Lambda functions, we plan to introduce a polling mechanism to query the patch progress, following the guidelines outlined in this post.
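The polling idea in the last item could look roughly like the sketch below: a short Lambda task that the state machine calls repeatedly (for example, after a Wait state) until the command reaches a terminal status. This is not the planned implementation. The checkPatchStatus name and the shape of the returned object are assumptions, and ssm is expected to be an AWS.SSM instance from the AWS SDK for JavaScript (v2).

```javascript
// Statuses after which the command will not change state again.
var TERMINAL_STATUSES = ['Success', 'Failed', 'Cancelled', 'TimedOut'];

// Hypothetical status check. `ssm` is assumed to be `new AWS.SSM()`;
// it is passed in so the function can be exercised with a stub.
function checkPatchStatus(ssm, commandId, instanceId) {
    var params = { CommandId: commandId, InstanceId: instanceId };
    return ssm.getCommandInvocation(params).promise().then(function (data) {
        return {
            commandId: commandId,
            instanceID: instanceId,
            status: data.Status,
            // `done` lets a Choice state decide whether to wait and poll again.
            done: TERMINAL_STATUSES.indexOf(data.Status) !== -1
        };
    });
}
```

Each poll finishes in well under the Lambda time limit, and the waiting itself happens in the state machine rather than inside a running function.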

Summary

Using Systems Manager predefined documents and AWS Step Functions together allowed this solution to be implemented within a short time frame. It also enabled the release of a scalable patching service to multiple customers. Beyond the automation and scalability, our customers see additional benefits like the ability to decrease the length of maintenance windows.

About the Authors

Uri Katsir is an AWS Architect at Cloudticity, where he spends his days building really cool HIPAA-compliant services for the healthcare industry.

 

Thomas Zinn is a Project Manager at Cloudticity, where he provides HIPAA-compliant infrastructure solutions using AWS.

About Cloudticity

Cloudticity helps healthcare organizations design, build, migrate, and manage HIPAA-compliant systems using AWS. We are an audited AWS Managed Service Provider (MSP), AWS Healthcare Competency Partner and AWS DevOps Competency Partner. In addition, as an AWS Service Catalog Partner and Public Sector Partner, Cloudticity is well positioned to help providers, payers, and healthcare services organizations use AWS securely and effectively. For further reading, check out How Cloudticity Uses Automation to Scale Healthcare Solutions and AWS Partner Story: Cloudticity.


AWS is not responsible for the content or accuracy of this post. The content and opinions in this blog are solely those of the third party author. It is each customer’s responsibility to determine whether they are subject to HIPAA, and if so, how best to comply with HIPAA and its implementing regulations. Before using AWS in connection with protected health information, customers must enter an AWS Business Associate Addendum (BAA) and follow its configuration requirements.