Documenting the use of Amazon EC2 Auto Scaling groups in DoD

Many Amazon Web Services (AWS) customers in regulated environments such as the U.S. Department of Defense (DoD) struggle to gain security approval to use the Auto Scaling capabilities of Amazon Elastic Compute Cloud (Amazon EC2). The difficulty is often attributed to concerns about configuration management, total asset inventory, compliance with agency third-party security tools, and agency authorization documentation.

The use of Auto Scaling in AWS enables customers to maximize their return on investment (ROI) after migrating to AWS. Auto Scaling of Amazon EC2 instances allows customers both to minimize their costs during periods of lower system usage and to surge, dynamically or systematically, to meet peak demand.

Amazon EC2 Auto Scaling results in instances being added to and removed from the customer virtual private cloud (VPC) network. This concept in itself is not new to the DoD. Consider end-user compute devices that come and go with a user, virtual desktop services, or even Citrix-based application presentation. All of these technologies have used a form of scaling in the past. The key to security and compliance is proper documentation and a secured baseline template.

This post provides AWS recommended best practices for implementing EC2 Auto Scaling in DoD environments.

First, let’s review some foundational Auto Scaling terminology.

Amazon Machine Image (AMI) – This is an image preconfigured with the selected operating system that provides information required to launch an instance. AMIs can be supported and maintained directly from Amazon, or you can create and maintain your own AMI (often installing custom software).

Auto Scaling launch template – This specifies the type of Amazon EC2 instance that the Amazon EC2 Auto Scaling group creates for you. The Auto Scaling launch template includes the AMI, instance type, key pair, and security groups that any launched instance will be configured with.

Auto Scaling group – This is the collection of Amazon EC2 instances that are treated as a logical group for the purposes of automatic scaling and management. Within the Auto Scaling group, you set parameters such as what launch template to use, scaling policies, health checks, and Amazon EC2 purchase type (for example, On-Demand or Spot).

Documenting Auto Scaling in authorization

DoD customers are required to document their system configuration and demonstrate compliance with security requirements in a system authorization that is reviewed by an authorizing official (AO). AWS recommends customers develop a document as a part of their overall Authority to Operate (ATO) package that addresses the following topics regarding utilizing Auto Scaling groups.

Securing the base template
Asset scans with Auto Scaling
Host-based security with Auto Scaling

The following sections present best practices for documenting the use of Auto Scaling for the AO and for implementing it in a compliant deployment.

The best way to establish the boundaries, and to show where in the overall system Amazon EC2 Auto Scaling is used, is to depict it within the authorization boundary diagram. A best practice is to ensure the diagram contains, at a minimum, the following:

Subnets that contain the scaling resources, clearly depicted
Availability Zones that the Auto Scaling group can launch into
Security groups that are assigned to the Auto Scaling group
Network access control lists (network ACLs) assigned to the associated subnets
Load balancer (typically Application Load Balancer) that is fronting the Auto Scaling group
Ports permitted through both the security groups and network ACLs
A note within the diagram that describes the use of Auto Scaling and a variable number of instances

The following is an example diagram depicting the use of Auto Scaling.

Figure 1. Example architecture diagram for using Auto Scaling.

Note 1: In the preceding architectural diagram, Auto Scaling Group 1 is used to scale compute resources up and down within Subnet A and Subnet B. Two instances are shown for diagram purposes, but this number will change based on resource demand. All instances are automatically launched from a secure template and are identical. Only instances launched through the Auto Scaling Group configuration will be deployed into the Subnet A and Subnet B networks, and dedicated network scanning is aligned to those subnets.

In addition to authorization boundary diagrams, Mission Owners are often required to list each asset in the system in a static hardware/software list that maps the machine hostname to the network IP address. When using Auto Scaling, a recommended approach is to list the Auto Scaling group name as the asset and the subnet range as the IP address.
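As a rough illustration of the asset-list approach described above, the following Python sketch builds one inventory row for an Auto Scaling group and checks whether an observed IP address falls inside the documented range. The group name, subnet CIDR ranges, and field names are hypothetical placeholders; the actual list format is dictated by your agency's ATO templates.

```python
import ipaddress


def asset_entry(group_name, subnet_cidrs):
    """Build one hardware/software list row for an Auto Scaling group.

    The Auto Scaling group name stands in for the hostname, and the
    subnet CIDR ranges stand in for the individual IP address.
    """
    # ip_network() raises ValueError on a malformed CIDR, which catches typos early.
    networks = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    return {
        "asset": group_name,
        "ip_ranges": [str(net) for net in networks],
        "note": "Variable number of identical instances launched by Auto Scaling",
    }


def ip_in_asset(entry, ip):
    """Return True if an observed IP address belongs to the documented ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in entry["ip_ranges"])


# Hypothetical values for illustration only.
entry = asset_entry("asg-web-tier", ["10.0.1.0/24", "10.0.2.0/24"])
print(entry)
print(ip_in_asset(entry, "10.0.1.25"))
```

A check like `ip_in_asset` could help reconcile scanner findings against the static list, since any address inside the documented ranges maps back to the Auto Scaling group rather than to a named host.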

Securing the base template

Part of securing the Auto Scaling group is ensuring that the template used to launch new instances is hardened and under configuration management.

Auto Scaling launch templates are configured to launch instances with a selected AMI. The configuration of the Auto Scaling launch template ensures that each EC2 instance that is created within the group utilizes the same base image, with the same software and security profile.

Configuring the launch template and selecting the AMI to use should be restricted to the individuals or team tasked with operating system and application configuration and security. AWS Identity and Access Management (IAM) controls access to the launch template and supports granular, least-privilege permissions.

The concept here is to ensure change control on the baseline image utilized with the Auto Scaling group by controlling who can update the AMI.
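One possible way to express that restriction is an IAM policy scoped to a single launch template. The Python sketch below only assembles such a policy document and prints it as JSON. The partition, Region, account ID, and launch template ID are hypothetical placeholders, and the action list is an assumption about what a baseline-management team might need; a real deployment would likely require additional statements (for example, permissions related to instance profiles).

```python
import json


def launch_template_admin_policy(partition, region, account_id, template_id):
    """Assemble a policy that allows changes to one launch template only."""
    # ARN of the single launch template this team is allowed to modify.
    arn = f"arn:{partition}:ec2:{region}:{account_id}:launch-template/{template_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Change control: only this template can be versioned or modified.
                "Effect": "Allow",
                "Action": [
                    "ec2:CreateLaunchTemplateVersion",
                    "ec2:ModifyLaunchTemplate",
                ],
                "Resource": arn,
            },
            {
                # Read-only visibility into launch templates.
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeLaunchTemplates",
                    "ec2:DescribeLaunchTemplateVersions",
                ],
                "Resource": "*",
            },
        ],
    }


# Hypothetical identifiers for illustration only.
policy = launch_template_admin_policy(
    "aws-us-gov", "us-gov-west-1", "111122223333", "lt-0123456789abcdef0"
)
print(json.dumps(policy, indent=2))
```

Attaching a policy like this only to the team that owns the baseline image keeps AMI changes inside the existing change-control process.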

Additionally, within a single account, you can scope permissions not only to specific Auto Scaling API calls but also to particular Auto Scaling groups by using tags. The following permissions policy is an example that grants a user permission to create, update, and delete Auto Scaling groups in the AWS account, but only for groups tagged with the key purpose and the value testing. If you do not wish to limit based on tags, you can omit the Condition element.

{
   "Version": "2012-10-17",
   "Statement": [{
      "Effect": "Allow",
      "Action": [
          "autoscaling:CreateAutoScalingGroup",
          "autoscaling:UpdateAutoScalingGroup",
          "autoscaling:DeleteAutoScalingGroup"
      ],
      "Resource": "*",
      "Condition": {
          "StringEquals": { "autoscaling:ResourceTag/purpose": "testing" }
      }
   },
   {
      "Effect": "Allow",
      "Action": "autoscaling:Describe*",
      "Resource": "*"
   }]
}

AWS best practice is to manage environments through infrastructure as code (IaC). To align with that approach, maintain an up-to-date base image (AMI) for the Auto Scaling group. To keep everything on standardized templates, do not patch or apply security configuration settings to individual Amazon EC2 instances created as part of the Auto Scaling group. Instead, make updates and configuration changes to the base image (AMI). AWS Systems Manager can patch individual EC2 instances as well as instances in an Auto Scaling group, and it can streamline updates to the AMI used for the Auto Scaling group.

Example Automation runbooks are available to patch or update an Auto Scaling AMI and update the Auto Scaling launch template. See the AWS documentation for detailed instructions on setting up this automated patching capability. Additionally, customers can use AWS Systems Manager Parameter Store to store a parameter in their AWS account that references an AMI ID. Using this approach, you can update your Auto Scaling groups to use new AMI IDs without needing to create new launch templates or new versions of launch templates each time an AMI ID changes.
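To make the Parameter Store approach more concrete, the following Python sketch builds launch template data whose ImageId is a Systems Manager parameter reference in the resolve:ssm:parameter-name form rather than a literal AMI ID. The parameter name, instance type, and security group ID are hypothetical. The sketch only constructs the data structure; passing it as the LaunchTemplateData of an EC2 CreateLaunchTemplate request is left as an assumption about how you would apply it.

```python
def launch_template_data(ami_parameter_name, instance_type, security_group_ids):
    """Build launch template data that resolves its AMI from Parameter Store."""
    return {
        # The AMI ID is looked up from the named parameter at launch time,
        # so updating the parameter value changes the AMI without a new
        # launch template version.
        "ImageId": f"resolve:ssm:{ami_parameter_name}",
        "InstanceType": instance_type,
        "SecurityGroupIds": list(security_group_ids),
    }


# Hypothetical values for illustration only.
data = launch_template_data("golden-ami", "t3.medium", ["sg-0123456789abcdef0"])
print(data)
```

With this arrangement, the patching automation would update the parameter value after producing a new hardened AMI, and instances launched afterward would pick up the new image.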

Asset scans with Auto Scaling

Often in DoD environments, security tools are used to scan for the presence of assets and ensure those assets comply with security requirements. An example is the requirement to ensure 100 percent asset scan coverage through the DoD Assured Compliance Assessment Solution (ACAS). These scans are often conducted on a schedule, whereas in AWS the Auto Scaling group creates and removes assets in the preconfigured designated subnets as demand changes, so an instance may launch and terminate between scans. Security administrators have to factor in that difference when meeting compliance requirements. Customers can mitigate the risk of individual assets (Amazon EC2 instances) not having a scan by ensuring that only the Auto Scaling group is permitted to deploy into the designated Auto Scaling subnets and that each of those instances is identical.

One way to address asset scans in the ATO documentation for the AO is to configure the customer-provided scanning tool with a dedicated scan profile. This scan profile is designated for the subnets supporting Auto Scaling groups and is clearly identified as a scan profile for Auto Scaling. Clearly documenting this scan profile within the ATO helps provide a concise picture to the AO. These scan profiles include ACAS asset scans, discovery scans, and rogue system detection as well.
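One way to gather evidence for that mitigation is to periodically confirm that every instance in the designated subnets was launched by the expected Auto Scaling group. The Python sketch below is a minimal example that works on instance records shaped like a simplified EC2 DescribeInstances result and relies on the aws:autoscaling:groupName tag that Auto Scaling applies to the instances it launches. The subnet IDs, instance IDs, and group name are hypothetical, and retrieving the records from AWS is outside the sketch.

```python
ASG_TAG = "aws:autoscaling:groupName"


def find_unmanaged_instances(instances, scaling_subnets, group_name):
    """Return IDs of instances in the scaling subnets not launched by the group."""
    unmanaged = []
    for inst in instances:
        # Instances outside the designated subnets are out of scope here.
        if inst["SubnetId"] not in scaling_subnets:
            continue
        tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
        if tags.get(ASG_TAG) != group_name:
            unmanaged.append(inst["InstanceId"])
    return unmanaged


# Hypothetical records for illustration only.
instances = [
    {"InstanceId": "i-aaa", "SubnetId": "subnet-a",
     "Tags": [{"Key": ASG_TAG, "Value": "asg-web-tier"}]},
    {"InstanceId": "i-bbb", "SubnetId": "subnet-b", "Tags": []},
    {"InstanceId": "i-ccc", "SubnetId": "subnet-other", "Tags": []},
]
print(find_unmanaged_instances(instances, {"subnet-a", "subnet-b"}, "asg-web-tier"))
```

An empty result supports the claim in the ATO documentation that only identical, template-launched instances exist in the Auto Scaling subnets.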

Vulnerability scans can and should be run against the Amazon EC2 instances in the Auto Scaling group. However, to ensure a clean scan of an always-on instance, customers can utilize the same AMI that is utilized for the Auto Scaling group to launch an Amazon EC2 instance in an isolated network environment in AWS. That stand-alone Amazon EC2 instance outside of the Auto Scaling group can then be scanned by the customer-preferred vulnerability scanning tools (such as ACAS) and evaluated from a security perspective. This optional approach does have additional cost because an Amazon EC2 instance is running for the sole purpose of obtaining constant vulnerability scans.

Host-based security with Auto Scaling

Customers within AWS can choose their own Host Based Security System (HBSS). This post describes a McAfee/Trellix configuration, but other products can be used as well. It is important to ensure that the Amazon EC2 instances created do not all share the same unique identifier. Additionally, it is helpful to have instances remove themselves from the McAfee ePolicy Orchestrator (ePO) console when they are deleted by the Auto Scaling group. McAfee Agent version 5.6.x and later addresses this when the agent is installed in virtual desktop infrastructure (VDI) mode, which ensures a new McAfee Agent Global Unique Identifier (GUID) is created each time the system is started. Installation in VDI mode can be automated by running framepkg.exe /Install=agent /enableVDImode as part of the user data configured in the launch template, so that it runs when new instances launch.
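As a sketch of how that command might be placed in launch template user data, the following Python example wraps it in a Windows PowerShell user data script and base64-encodes the result, which is the encoding launch template user data expects. The installer directory is a hypothetical location where the AMI build is assumed to have staged framepkg.exe; only the command itself comes from the text above.

```python
import base64

# Command taken from the guidance above.
AGENT_COMMAND = "framepkg.exe /Install=agent /enableVDImode"


def build_user_data(installer_dir):
    """Return base64-encoded Windows user data that installs the agent in VDI mode."""
    script = (
        "<powershell>\n"
        # Hypothetical directory where the installer was staged in the AMI.
        f"Set-Location '{installer_dir}'\n"
        f".\\{AGENT_COMMAND}\n"
        "</powershell>\n"
    )
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


user_data = build_user_data("C:\\Installers")
print(base64.b64decode(user_data).decode("utf-8"))
```

The encoded string would be supplied as the user data value of the launch template, so every instance the Auto Scaling group launches runs the installation on first boot.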

Conclusion

Utilizing Amazon EC2 Auto Scaling as a part of a system’s overall configuration is key to optimizing costs as well as configuring high availability. The key to gaining approvals through AOs is often clear and concise documentation within the ATO, diagrams, and security tools.

Customers should review the user guide for Amazon EC2 Auto Scaling, consider how Auto Scaling can improve their availability while reducing costs, and begin conversations with their AOs on the recommendations in this post.

AWS Public Sector Blog