
Use Ansible to bootstrap external container instances with Amazon ECS Anywhere

Amazon Elastic Container Service (ECS) is a fully managed container orchestration service that helps you deploy, manage, and scale containerized applications. Within Amazon ECS there is a concept known as Capacity, which is the infrastructure where your containers run. Amazon ECS provides multiple options: Amazon Elastic Compute Cloud (EC2) instances in the AWS Cloud, AWS Fargate, and External. This post focuses on the third option, External, which is referred to as Amazon ECS Anywhere. In this design you can use servers from outside of AWS to provide the infrastructure that Amazon ECS uses to run your containers.

ECS Anywhere is a feature of Amazon ECS that lets you run and manage container workloads on user-managed infrastructure. This feature can help users with a variety of use cases such as:

  • Supporting modernization and migration of legacy on-premises applications to the cloud
  • Deploying data processing workloads at the edge
  • Running services closer to their end users for low-latency connectivity

When using ECS Anywhere, users need to register their on-premises compute as external instances to their ECS cluster. When running at scale, users want an automated process that registers and de-registers external instances as part of on-premises machine provisioning. In this post, we show you how to use Ansible to register and de-register your on-premises servers (outside of AWS) with an ECS cluster. Since there is no built-in module for ECS Anywhere, we have to be creative in how we make the solution idempotent. We primarily use the AWS Command Line Interface (AWS CLI) with the built-in shell module, adding checks and conditions for when each task executes.

Note that this solution is focused on RHEL-based or Debian-based Linux distributions. To use it with Windows, you would need to adjust the playbooks.

Solution overview

The solution consists of two Ansible playbooks. It handles the prerequisites for the most common use cases and patterns, such as a standard Docker configuration with no custom pre-installed software.

The register.yml playbook runs Operating System (OS) updates, installs the AWS CLI, jq, and curl packages (used later by the playbook), and uses the provided AWS Identity and Access Management (IAM) credentials to register and join each server to AWS Systems Manager and your ECS cluster. To automatically and securely establish trust between the on-premises server and the ECS control plane, the ECS agent relies on the Systems Manager Agent that is deployed on the on-premises server. Using a hardware fingerprint, the Systems Manager Agent rotates IAM credentials every 30 minutes. If the external instance loses connectivity, the Systems Manager Agent automatically updates the credentials when the instance reconnects to AWS. Before registering, the playbook first checks whether the host has already been registered. If it has, then the playbook skips the remaining steps. If not, then it continues with registration.
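
As a rough sketch of that registration check, a task along the following lines could look up the host in Systems Manager by its local IPv4 address and return an empty result when nothing matches. The task body is our assumption rather than the repository's exact code: it relies on the gathered fact ansible_default_ipv4.address, on jq being installed, and on AWS credentials being available to the AWS CLI on the node. The registered name instanceId matches the variable that the later tasks in this post test.

```yaml
# Sketch only: the lookup strategy below is an assumption, not the repository's exact task.
- name: "Check whether this host is already registered in Systems Manager"
  args:
    executable: /bin/bash
  shell: |
    # List managed instances, keep the one whose IP address matches this host,
    # and print its instance ID (or nothing at all when there is no match).
    aws ssm describe-instance-information --region "{{ region }}" --output json \
      --query "InstanceInformationList[?IPAddress=='{{ ansible_default_ipv4.address }}'].InstanceId" \
      | jq -r '.[0] // empty'
  register: instanceId
  changed_when: false   # read-only query, so it never reports a change
```

Because the task prints nothing for an unregistered host, a condition such as instanceId.stdout == '' can gate the registration steps, which is what keeps repeated runs from registering the same host twice.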

The deregister.yml playbook checks to make sure that the host is registered before continuing. If it is, then it locates the instance ID from the local IP address of the instance, and then it finds the Container Instance Amazon Resource Name (ARN) in Amazon ECS from the instance ID. Then, it drains the instance being removed to make sure the containers have been rescheduled on other instances before removing the instance from both Systems Manager and Amazon ECS. The drain stage loops for up to 10 minutes waiting for the node to be fully drained. If after 10 minutes the node isn’t drained, then the playbook sets the status of the host back to ACTIVE and skips the remaining steps. This is shown in the following diagram.

Figure 1: ECS Anywhere cluster bootstrapping using Ansible


Prerequisites

The following prerequisites are necessary to continue with this post:

  • An AWS Account
  • An existing ECS cluster
  • An ECS Anywhere IAM Role
  • An Ansible Control node (where you run the playbooks)
  • A physical or virtual machine (VM) running a supported OS to set up as an ECS Anywhere external container instance
  • An Ansible inventory file that includes connection information for the machine being added to the cluster
  • This post assumes you are familiar with Ansible, AWS CLI, and basic shell scripting
  • Please review this Amazon ECS Developer Guide to understand additional design considerations for ECS Anywhere. You should review the available agent parameters before registration, as they cannot be changed after registration is complete.

Getting started

  1. Clone the solution on GitHub to your Ansible Control node
    1. Run the command git clone https://github.com/aws-samples/ecs-anywhere-ansible.git </path/to/folder>
  2. Navigate to the folder where you cloned the repository
    1. Run the command cd </path/to/folder>
  3. Copy the file vars/vars.yml.sample to vars/vars.yml
    1. Run the command cp vars/vars.yml.sample vars/vars.yml
  4. Edit the vars.yml document in your preferred editor, replacing the placeholder values
    1. key: '<AWS Access Key>'
    2. secret: '<AWS Secret Access Key>'
    3. cluster: '<Name of your Amazon ECS Cluster>'
    4. region: '<Region your Amazon ECS Cluster resides in>'
    5. role: '<Name of your Amazon ECS Anywhere IAM Role>'
  5. Edit the register-ecsa-node.yml document in your preferred editor
    1. Replace default as the host with the Ansible host name configured for your node
  6. Run the command ansible-playbook -i /path/to/ansible/hosts register-ecsa-node.yml -v
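
For reference, a minimal YAML inventory for the steps above might look like the following. The host name ecsa-node-01, the address 192.0.2.10 (taken from a documentation-only range), and the SSH user are placeholders of ours, not values from the repository.

```yaml
# Hypothetical inventory file; every value here is a placeholder.
all:
  hosts:
    ecsa-node-01:                 # use this name as the host in the playbook
      ansible_host: 192.0.2.10    # address the control node connects to over SSH
      ansible_user: ansible       # SSH user with sudo rights on the node
```

With a file like this saved on the control node, its path is what you pass to the -i option of ansible-playbook.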

Walkthrough

After the OS updates and prerequisites have been applied, we need to create an activation in Systems Manager. Since there are currently no ECS Anywhere or Systems Manager modules for Ansible, we use the shell module to execute our commands. The following task uses the AWS CLI Systems Manager (ssm) commands to create an activation. This assumes you have followed the prerequisite step of creating an IAM role for the on-premises node.

  - name: "Retrieve SSM activation data"
    shell: "aws ssm create-activation --iam-role {{ role }} --region {{ region }}"
    register: ecs_activation_output

Next, we need to take the data from our activation response and set variables for later use in the Amazon ECS registration. We take the JSON response captured in ecs_activation_output.stdout and use the set_fact module to set the variables.

  - name: "Setting the ecs_activation_id and ecs_activation_code variables"
    set_fact:
      ecs_activation_id: "{{ (ecs_activation_output.stdout | from_json).ActivationId }}"
      ecs_activation_code: "{{ (ecs_activation_output.stdout | from_json).ActivationCode }}"
    # Keep the activation code out of the Ansible output
    no_log: yes

Now we have the components we need to register our node. Next, we need to download the latest registration script from AWS.

  - name: "Download Latest Registration Script"
    get_url:
      url: https://amazon-ecs-agent.s3.amazonaws.com/ecs-anywhere-install-latest.sh
      dest: /tmp/ecs-anywhere-install.sh
      mode: '0440'
    # Only download when the host is not yet registered
    when: instanceId.stdout == ''

Then we register the node:

  - name: "Register the Server"
    shell: "bash /tmp/ecs-anywhere-install.sh --region {{ region }} --cluster {{ cluster }} --activation-id {{ ecs_activation_id }} --activation-code {{ ecs_activation_code }}"

Next, we need to make sure that the instance is registered. This stage of the registration validates that the instance appears in Amazon ECS. If it doesn't, then the playbook reboots the node.

  - name: "Find the Container Instance Arn based on EC2 Instance Id"
    args:
      executable: /bin/bash
    shell: |
      instances=$(aws ecs list-container-instances --cluster "{{ cluster }}" --region "{{ region }}" | jq -c '.containerInstanceArns')
      aws ecs describe-container-instances --cluster "{{ cluster }}" --region "{{ region }}" --container-instances $instances | \
        jq ".containerInstances[] | select(.ec2InstanceId==\"{{ instanceId.stdout }}\") | .containerInstanceArn" | tr -d '"'
    register: ecsInstanceArn
    when: instanceId.stdout != ''

  - name: "Reboot Server"
    async: 1
    poll: 0
    shell: "sleep 2 && shutdown -r now 'reboot initiated by ansible'"
    when: instanceId.stdout == '' or ecsInstanceArn.stdout == ''

  - name: "Wait for Host to come back up"
    become: false
    local_action: "wait_for host={{ inventory_hostname }} port=22 delay=20"
    when: instanceId.stdout == '' or ecsInstanceArn.stdout == ''
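
On Ansible 2.7 or later, the reboot and wait tasks could also be collapsed into a single task with the built-in reboot module, which issues the reboot and then waits for the connection to return. This is an alternative sketch, not what the sample playbook does. The 600-second timeout is an arbitrary choice of ours, and the task assumes the play already runs with privilege escalation.

```yaml
# Alternative sketch: replaces the separate reboot and wait_for tasks.
- name: "Reboot the server and wait for it to come back up"
  reboot:
    msg: "reboot initiated by ansible"
    reboot_timeout: 600   # seconds to wait for the host to return (arbitrary value)
  when: instanceId.stdout == '' or ecsInstanceArn.stdout == ''
```

The condition is unchanged, so the node is still rebooted only when it was not registered or could not be found in Amazon ECS.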

At this point, you can go into the AWS Management Console and validate that the node has joined the cluster. Navigate to the Amazon ECS console and choose your cluster. Choose the Infrastructure tab and validate that you can see the instance under Container Instances at the bottom of the page. If you have deployed the sample service provided in the AWS CloudFormation section, then you can now scale up that service and validate that the containers are scheduled on the server.

Cleaning up

Now that the registration is complete, what happens when we need to remove one of our hosts? To do this with Ansible, we need to identify the host we are removing, drain it (a process in which running containers are moved to another host), deregister it from both Systems Manager and Amazon ECS, and finally clean up the Systems Manager and Amazon ECS agents that were installed.

  1. Edit the deregister-ecsa-node.yml document in your preferred editor
    1. Replace default as the host with the Ansible host name configured for your node
  2. Run the command ansible-playbook -i /path/to/ansible/hosts deregister-ecsa-node.yml -v

The playbook uses the host’s local IPv4 address to find it in Systems Manager. We can get the Instance ID from the Systems Manager response. This Instance ID can be used to find the instance in Amazon ECS. Both pieces of data are required for deregistration. Now that we have identifiers in both Systems Manager and Amazon ECS, we’re ready to deregister.
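
The drain and deregistration flow could be sketched as follows. The AWS CLI commands are standard, but the task layout, the block and rescue structure, and the retry values (60 checks, 10 seconds apart, for roughly 10 minutes) are our assumptions rather than the repository's exact code. The sketch also assumes that instanceId and ecsInstanceArn were registered by earlier lookup tasks, and it leaves out the final cleanup of the installed agents.

```yaml
# Sketch only: structure and retry values are assumptions, not the repository's code.
- name: "Drain and deregister the node"
  block:
    - name: "Set the container instance to DRAINING"
      shell: >
        aws ecs update-container-instances-state
        --cluster "{{ cluster }}" --region "{{ region }}"
        --container-instances "{{ ecsInstanceArn.stdout }}"
        --status DRAINING

    - name: "Wait up to about 10 minutes for running tasks to reach zero"
      shell: >
        aws ecs describe-container-instances
        --cluster "{{ cluster }}" --region "{{ region }}"
        --container-instances "{{ ecsInstanceArn.stdout }}"
        --query "containerInstances[0].runningTasksCount" --output text
      register: running_tasks
      until: running_tasks.stdout == "0"
      retries: 60     # 60 checks ...
      delay: 10       # ... 10 seconds apart
      changed_when: false

    - name: "Deregister the container instance from Amazon ECS"
      shell: >
        aws ecs deregister-container-instance
        --cluster "{{ cluster }}" --region "{{ region }}"
        --container-instance "{{ ecsInstanceArn.stdout }}"

    - name: "Deregister the managed instance from Systems Manager"
      shell: >
        aws ssm deregister-managed-instance
        --instance-id "{{ instanceId.stdout }}" --region "{{ region }}"

  rescue:
    # Reached when any task in the block fails, including a drain that times out.
    - name: "Set the container instance back to ACTIVE"
      shell: >
        aws ecs update-container-instances-state
        --cluster "{{ cluster }}" --region "{{ region }}"
        --container-instances "{{ ecsInstanceArn.stdout }}"
        --status ACTIVE
```

Because rescue runs on any failure inside the block, a production playbook would likely separate a drain timeout from errors in the deregistration calls themselves.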

Conclusion

In this post, we demonstrated how to use Ansible to register and de-register on-premises servers with Amazon ECS Anywhere. Ansible is a configuration management tool that can help with tasks such as bootstrapping and remotely administering your Amazon ECS Anywhere external container instances. We also showed how to keep these playbooks idempotent and how to validate their success.