AWS Cloud Operations & Migrations Blog

FINRA Gatekeeper: Amazon EC2 Access Management System Using Amazon EC2 Systems Manager

By Daniel Koo, Senior Director at FINRA, and Stephen Mele, Software Developer at FINRA

Introduction

Moving from a traditional data center to the cloud can raise many questions about compliance and security. FINRA took these concerns very seriously throughout our cloud migration journey to AWS. As a regulatory organization that oversees up to 75 billion market transactions every day, we must establish proper governance to ensure that compliance requirements are met and that the right level of security is in place. To achieve this, we looked at building solutions on top of existing AWS services for managing human access. We wanted to control who has access to which resources, provide visibility into who is trying to access which resources, and add an approval process when access is requested. The goal was a self-service solution that the development and operations community would find easy to adopt. Amazon EC2 Systems Manager (SSM) and its Run Command capability provided exactly what we needed to build the right solution.

Solution Approach

Our approach was to grant people temporary access using the credentials and permissions they already use. We also wanted to grant that access in a timely and responsive manner. In a traditional data center, requesting access to a particular server and finally receiving it takes a significant amount of time, typically through a paper process. Because most servers in the cloud are transient, such a lengthy process would not work, so we needed to automate it and achieve a fast turnaround. Ultimately, we wanted to discourage people from accessing servers at all, to promote cloud-centric approaches such as redeployment and self-healing. However, this solution was necessary to maintain the control we wanted when access was absolutely needed. With this approach, and with our compliance and security goals in mind, we developed an application called Gatekeeper, which uses Run Command as its core technology.

Gatekeeper Application

Gatekeeper is a web application built around a request lifecycle management system. A user searches for and selects one or more EC2 resources in a specified environment (such as DEV, QA, or Production) and requests temporary access. The user can name the people who need the temporary access and the number of hours it is needed. Once the request is submitted, there are two possible outcomes, depending on the target environment:

  1. The temporary access is immediately granted to the user(s).
  2. The request requires review and approval before temporary access is granted. If the request is approved, access is immediately granted to the user(s); otherwise, access is denied.
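The two outcomes above amount to a small state transition. The Java sketch below is illustrative only, not Gatekeeper's code: the enum and class names are hypothetical, and treating only production as requiring approval is an assumption, since the post does not say which environments need review.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names; the post lists DEV, QA, and Production as example environments.
enum Environment { DEV, QA, PROD }

enum RequestStatus { PENDING_APPROVAL, GRANTED, DENIED }

final class AccessPolicy {
    // ASSUMPTION: only production requests need a reviewer.
    private static final Set<Environment> APPROVAL_REQUIRED = EnumSet.of(Environment.PROD);

    /** Status a request enters as soon as it is submitted (outcome 1 or outcome 2). */
    static RequestStatus onSubmit(Environment env) {
        return APPROVAL_REQUIRED.contains(env) ? RequestStatus.PENDING_APPROVAL : RequestStatus.GRANTED;
    }

    /** Reviewer decision for outcome 2: approval grants access at once, otherwise it is denied. */
    static RequestStatus onReview(RequestStatus current, boolean approved) {
        if (current != RequestStatus.PENDING_APPROVAL) {
            throw new IllegalStateException("Only pending requests can be reviewed, got " + current);
        }
        return approved ? RequestStatus.GRANTED : RequestStatus.DENIED;
    }

    public static void main(String[] args) {
        for (Environment env : Environment.values()) {
            System.out.println(env + " -> " + onSubmit(env));
        }
        System.out.println("PROD approved -> " + onReview(onSubmit(Environment.PROD), true));
    }
}
```

Keeping the approval rule in a single set makes "which environments need review" a configuration decision rather than conditionals scattered through the request handling.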

Once the request is live, our lifecycle management system tracks the time elapsed since the request was fulfilled. As soon as the time allotted to a request expires, the system automatically revokes the users' access to the resources specified in the request.
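The expiry rule can be expressed with `java.time`: the clock starts when the request is fulfilled and runs for the requested number of hours. This is a minimal sketch; the `ActiveGrant` class and its fields are hypothetical and only illustrate the calculation the lifecycle system has to make.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of a fulfilled request; field names are illustrative.
final class ActiveGrant {
    private final String requestId;
    private final Instant fulfilledAt;
    private final Duration allotted;

    ActiveGrant(String requestId, Instant fulfilledAt, int hours) {
        if (hours <= 0) throw new IllegalArgumentException("hours must be positive");
        this.requestId = requestId;
        this.fulfilledAt = fulfilledAt;
        this.allotted = Duration.ofHours(hours);
    }

    String requestId() { return requestId; }

    /** The window starts when the request is fulfilled, not when it was submitted. */
    Instant expiresAt() { return fulfilledAt.plus(allotted); }

    /** Expired once "now" reaches the end of the allotted window. */
    boolean isExpired(Instant now) { return !now.isBefore(expiresAt()); }

    /** Time left before revocation, clamped at zero. */
    Duration remaining(Instant now) {
        Duration left = Duration.between(now, expiresAt());
        return left.isNegative() ? Duration.ZERO : left;
    }

    public static void main(String[] args) {
        ActiveGrant grant = new ActiveGrant("req-42", Instant.now(), 4);
        System.out.println(grant.requestId() + " expires at " + grant.expiresAt());
        System.out.println("Remaining: " + grant.remaining(Instant.now()));
    }
}
```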

Creating the Users

Gatekeeper needs to create an account for each user on every instance included in the Access Request. To do this, Gatekeeper uses documents staged in EC2 Systems Manager that create users on running instances. Each document wraps a simple shell script that sets up the user for temporary access. Gatekeeper invokes these documents through EC2 Systems Manager using the AWS SDK for Java. Taking this route means we do not have to connect directly to each instance to create the users; we simply make an AWS API call and let Systems Manager do the heavy lifting. Once the users are created, the system distributes a private key to each user specified in the Access Request. We currently have create and delete documents for Amazon Linux, Ubuntu, and Windows, and we could easily add documents for other operating systems should the need arise.
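Because each operating system has its own document, a request spanning several instances has to be split by document before calling Systems Manager. The plain-Java sketch below prepares those inputs without depending on the SDK. The OS labels, the document names, and the `SsmCommandPlanner` class are hypothetical. The resulting document name, instance IDs, and `Map<String, List<String>>` parameters are the values that would go into a `SendCommandRequest` (for example via `withDocumentName`, `withInstanceIds`, and `withParameters` in the AWS SDK for Java 1.x) and be sent with the Systems Manager client's `sendCommand` method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical OS labels and document names; the real document names are not given in the post.
enum TargetOs { AMAZON_LINUX, UBUNTU, WINDOWS }

final class SsmCommandPlanner {
    private static final Map<TargetOs, String> CREATE_DOCUMENTS = Map.of(
            TargetOs.AMAZON_LINUX, "Gatekeeper-CreateUser-AmazonLinux",
            TargetOs.UBUNTU, "Gatekeeper-CreateUser-Ubuntu",
            TargetOs.WINDOWS, "Gatekeeper-CreateUser-Windows");

    /** Picks the create document that matches an instance's operating system. */
    static String createDocumentFor(TargetOs os) {
        String doc = CREATE_DOCUMENTS.get(os);
        if (doc == null) throw new IllegalArgumentException("No create document for " + os);
        return doc;
    }

    /** Groups instance IDs by document, since each SendCommand call names a single document. */
    static Map<String, List<String>> instancesByDocument(Map<String, TargetOs> osByInstance) {
        Map<String, List<String>> byDocument = new TreeMap<>();
        for (Map.Entry<String, TargetOs> entry : osByInstance.entrySet()) {
            byDocument.computeIfAbsent(createDocumentFor(entry.getValue()), k -> new ArrayList<>())
                      .add(entry.getKey());
        }
        return byDocument;
    }

    /** Document parameters in the Map<String, List<String>> shape that SendCommand accepts. */
    static Map<String, List<String>> createUserParameters(String userName, String publicKey) {
        return Map.of(
                "userName", List.of(userName),
                "publicKey", List.of(publicKey),
                "executionTimeout", List.of("300"));
    }

    public static void main(String[] args) {
        Map<String, TargetOs> request = new TreeMap<>(Map.of(
                "i-0aaa", TargetOs.AMAZON_LINUX,
                "i-0bbb", TargetOs.UBUNTU,
                "i-0ccc", TargetOs.AMAZON_LINUX));
        Map<String, List<String>> params = createUserParameters("gk-alice", "<public key>");
        instancesByDocument(request).forEach((doc, ids) ->
                System.out.println(doc + " -> " + ids + " with " + params.keySet()));
    }
}
```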

Example Create Document

{
  "schemaVersion":"1.2",
  "description":"Script for GateKeeper to create temp user.",
  "parameters":{
    "userName":{
      "type":"String",
      "description":"(Required) The username to create.",
      "allowedPattern":"gk-.*",
      "maxChars":64
    },
    "publicKey":{
      "type":"String",
      "description":"(Required) The public key string for the user.",
      "maxChars":4096
    },
    "executionTimeout":{
      "type":"String",
      "default":"300",
      "description":"(Optional) The time in seconds for a command to be completed before it is considered to have failed. Default is 300 (5 minutes). Maximum is 28800 (8 hours).",
      "allowedPattern":"([1-9][0-9]{0,3})|(1[0-9]{1,4})|(2[0-7][0-9]{1,3})|(28[0-7][0-9]{1,2})|(28800)"
    }
  },
  "runtimeConfig":{
    "aws:runShellScript":{
      "properties":[
        {
          "id":"0.aws:runShellScript",
          "runCommand":[
            "useradd -e `date -d '+2 days' '+%Y-%m-%d'` {{ userName }} -m",
            "mkdir /home/{{ userName }}/.ssh",
            "echo '{{ publicKey }}' >> /home/{{ userName }}/.ssh/authorized_keys",
            "chown -R {{ userName }}:{{ userName }} /home/{{ userName }}",
            "chmod -R go-rwx /home/{{ userName }}/.ssh",
            "echo '{{ userName }}  ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/{{ userName }}",
            "usermod -p '*' {{ userName }}"
          ],
          "workingDirectory":"/root",
          "timeoutSeconds":"{{ executionTimeout }}"
        }
      ]
    }
  }
}

At a high level, the document takes three parameters and uses them to run a shell script. The parameters are:

  1. userName – the username that the script will set up
  2. publicKey – the public key that will be used for this user
  3. executionTimeout – how much time to wait for the script to successfully complete
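Gatekeeper can reject bad input before making any API call by checking the same constraints the document declares. The sketch below copies the `allowedPattern` and `maxChars` values from the create document; the class name is hypothetical, and treating each pattern as a whole-string match (Java's `Matcher.matches()`) is an assumption about how Systems Manager applies `allowedPattern`.

```java
import java.util.regex.Pattern;

// Mirrors the create document's parameter constraints.
// ASSUMPTION: each allowedPattern is treated as a whole-string match.
final class CreateUserParams {
    private static final Pattern USER_NAME = Pattern.compile("gk-.*");
    private static final int USER_NAME_MAX_CHARS = 64;
    private static final int PUBLIC_KEY_MAX_CHARS = 4096;
    // Copied verbatim from the document; as a whole-string match it accepts 1 through 28800.
    private static final Pattern EXECUTION_TIMEOUT = Pattern.compile(
            "([1-9][0-9]{0,3})|(1[0-9]{1,4})|(2[0-7][0-9]{1,3})|(28[0-7][0-9]{1,2})|(28800)");

    static boolean isValidUserName(String value) {
        return value != null && value.length() <= USER_NAME_MAX_CHARS && USER_NAME.matcher(value).matches();
    }

    static boolean isValidPublicKey(String value) {
        return value != null && !value.isEmpty() && value.length() <= PUBLIC_KEY_MAX_CHARS;
    }

    static boolean isValidExecutionTimeout(String value) {
        return value != null && EXECUTION_TIMEOUT.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String timeout : new String[] {"300", "28800", "28801", "0"}) {
            System.out.println(timeout + " valid? " + isValidExecutionTimeout(timeout));
        }
        System.out.println("gk-alice valid? " + isValidUserName("gk-alice"));
    }
}
```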

Gatekeeper is responsible for supplying the Systems Manager document with the userName and the public SSH key associated with that user. Gatekeeper generates a new SSH key pair for every request so that no key is ever reused.
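A fresh key pair per request can be produced with the JDK alone. The post does not say which key type or size Gatekeeper uses, so the sketch below assumes 2048-bit RSA: it generates the pair with `KeyPairGenerator` and encodes the public half in the OpenSSH `ssh-rsa` format from RFC 4253, which is the single-line form the create document appends to `authorized_keys`. The `SshKeys` class is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.NoSuchAlgorithmException;
import java.security.interfaces.RSAPublicKey;
import java.util.Base64;

final class SshKeys {
    /** Generates a brand-new RSA key pair (ASSUMPTION: RSA-2048); call once per request. */
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("RSA is not available in this JVM", e);
        }
    }

    /** Encodes an RSA public key as one authorized_keys line: "ssh-rsa <base64> <comment>". */
    static String toOpenSshPublicKey(RSAPublicKey key, String comment) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeSshString(out, "ssh-rsa".getBytes(StandardCharsets.US_ASCII));
            // BigInteger.toByteArray() is big-endian two's complement, matching the SSH mpint encoding.
            writeSshString(out, key.getPublicExponent().toByteArray());
            writeSshString(out, key.getModulus().toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return "ssh-rsa " + Base64.getEncoder().encodeToString(bytes.toByteArray()) + " " + comment;
    }

    /** SSH wire "string": a 4-byte big-endian length followed by the raw bytes. */
    private static void writeSshString(DataOutputStream out, byte[] data) throws IOException {
        out.writeInt(data.length);
        out.write(data);
    }

    public static void main(String[] args) {
        KeyPair pair = newKeyPair();
        // The public line is passed to the create document; the private key goes to the user.
        System.out.println(toOpenSshPublicKey((RSAPublicKey) pair.getPublic(), "gk-alice"));
    }
}
```

How the private half is packaged and delivered to each user is not described in the post, so it is left out of this sketch.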

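The script creates the user with the useradd command and sets the account to expire after two days, as a safety net in case the system is unable to remove the user. It then adds the public key to the user's ~/.ssh/authorized_keys file, grants passwordless sudo through a file in /etc/sudoers.d, and sets the password field to '*' so that the user can log in to the instance only with their private key.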
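The two-day safety net can be illustrated with `java.time`. This sketch (class name hypothetical) computes the same date string that `date -d '+2 days' '+%Y-%m-%d'` produces in the create document, plus the days-since-1970 value that `/etc/shadow` stores for account expiration. Note that `date` on the instance uses the instance's local time zone, while `LocalDate.now()` uses the JVM's default zone.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Mirrors `date -d '+2 days' '+%Y-%m-%d'` from the create document.
final class AccountExpiry {
    static final int SAFETY_NET_DAYS = 2;

    /** The YYYY-MM-DD string passed to `useradd -e`. */
    static String useraddExpireDate(LocalDate today) {
        return today.plusDays(SAFETY_NET_DAYS).format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    /** The same date as days since 1970-01-01, the unit /etc/shadow uses for account expiry. */
    static long shadowExpireDays(LocalDate today) {
        return today.plusDays(SAFETY_NET_DAYS).toEpochDay();
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        System.out.println("useradd -e " + useraddExpireDate(today)
                + " (shadow expire field: " + shadowExpireDays(today) + ")");
    }
}
```

Because the window is fixed at two days regardless of the hours requested, it only bounds how long a leftover account can survive; the normal revocation path remains the primary control.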

Revoking the User(s) Access

Gatekeeper keeps track of every active Access Request and determines when it is time to revoke access. When the access period for a request expires, the system invokes a separate Systems Manager document that runs a shell script to remove the user from the instance(s) they requested access to.
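The tracking loop described above can be sketched as a periodic sweep over active requests. The `RevocationSweeper` class, its nested `ActiveRequest` type, and the `revoke` callback are hypothetical; in a real system the callback would run the remove document against the request's instances, and requests would be persisted rather than held in memory.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory sweeper, for illustration only.
final class RevocationSweeper {
    static final class ActiveRequest {
        final String id;
        final String userName;
        final List<String> instanceIds;
        final Instant expiresAt;

        ActiveRequest(String id, String userName, List<String> instanceIds, Instant expiresAt) {
            this.id = id;
            this.userName = userName;
            this.instanceIds = List.copyOf(instanceIds);
            this.expiresAt = expiresAt;
        }
    }

    private final List<ActiveRequest> active = new ArrayList<>();
    // Would invoke the remove document for request.userName on request.instanceIds.
    private final Consumer<ActiveRequest> revoke;

    RevocationSweeper(Consumer<ActiveRequest> revoke) { this.revoke = revoke; }

    void track(ActiveRequest request) { active.add(request); }

    int activeCount() { return active.size(); }

    /**
     * Hands every request whose window has ended to the revoke callback and stops tracking it.
     * If the callback throws, the exception propagates and that request stays tracked.
     */
    void sweep(Instant now) {
        Iterator<ActiveRequest> it = active.iterator();
        while (it.hasNext()) {
            ActiveRequest request = it.next();
            if (!now.isBefore(request.expiresAt)) {
                revoke.accept(request);
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        RevocationSweeper sweeper = new RevocationSweeper(
                r -> System.out.println("Revoking " + r.userName + " on " + r.instanceIds));
        Instant now = Instant.now();
        sweeper.track(new ActiveRequest("req-1", "gk-alice", List.of("i-0aaa"), now.minusSeconds(60)));
        sweeper.track(new ActiveRequest("req-2", "gk-bob", List.of("i-0bbb"), now.plusSeconds(3600)));
        sweeper.sweep(now);
        System.out.println("Still active: " + sweeper.activeCount());
    }
}
```

In practice `sweep` could be driven by a `ScheduledExecutorService` through `scheduleAtFixedRate`, in which case access outlives its window by at most one sweep interval.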

Example Remove Document

{
    "schemaVersion":"1.2",
    "description":"Script for GateKeeper to cleanup expired users.",
    "parameters":{
        "userName":{
            "type":"String",
            "description":"(Required) The username to delete.",
            "allowedPattern":"gk-.*",
            "maxChars":64
        },
        "executionTimeout":{
            "type":"String",
            "default":"300",
            "description":"(Optional) The time in seconds for a command to be completed before it is considered to have failed. Default is 300 (5 minutes). Maximum is 28800 (8 hours).",
            "allowedPattern":"([1-9][0-9]{0,3})|(1[0-9]{1,4})|(2[0-7][0-9]{1,3})|(28[0-7][0-9]{1,2})|(28800)"
        }
    },
    "runtimeConfig":{
        "aws:runShellScript":{
            "properties":[
                {
                    "id":"0.aws:runShellScript",
                    "runCommand":[
                        "cut -f1 -d':' /etc/passwd | grep -Fx '{{ userName }}' > /dev/null && (userdel -rf {{ userName }} ; echo 'user deleted' ) || echo 'no user to delete'",
                        "ls /etc/sudoers.d/ | grep -Fx '{{ userName }}' > /dev/null && (rm -f /etc/sudoers.d/{{ userName }} ; echo 'sudo file deleted' ) || echo 'no sudo file to delete'"
                    ],
                    "workingDirectory":"/root",
                    "timeoutSeconds":"{{ executionTimeout }}"
                }
            ]
        }
    }
}

The arguments for this script are:

  1. userName – the username that the script will delete
  2. executionTimeout – how much time to wait for the script to successfully complete

The script uses the userName parameter to delete the user, along with the user's sudoers file, from the instance(s). If the removal fails for any reason, the system retries it a set number of times; if none of the attempts succeeds, the system notifies the Ops team to investigate and remove the user.
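The retry-then-escalate behavior can be sketched as a bounded loop. The post does not give the retry count or the notification channel, so `maxAttempts` and the `notifyOps` callback are placeholders, and `RetryingRevoker` is a hypothetical class. The `runRemoveDocument` supplier stands in for one run of the remove document plus a success check (for example, confirming that the command invocation finished with a `Success` status). This sketch retries immediately; a real implementation would likely wait between attempts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Hypothetical retry wrapper around one run of the remove document.
final class RetryingRevoker {
    private final int maxAttempts;
    private final Consumer<String> notifyOps;

    RetryingRevoker(int maxAttempts, Consumer<String> notifyOps) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be at least 1");
        this.maxAttempts = maxAttempts;
        this.notifyOps = notifyOps;
    }

    /**
     * Runs the removal up to maxAttempts times and returns true on the first success.
     * An exception from an attempt counts as a failure. If every attempt fails, Ops is notified.
     */
    boolean revoke(String userName, BooleanSupplier runRemoveDocument) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (runRemoveDocument.getAsBoolean()) {
                    return true;
                }
            } catch (RuntimeException e) {
                // Treat API or transport errors like a failed run and try again.
            }
        }
        notifyOps.accept("Could not remove " + userName + " after " + maxAttempts
                + " attempts; please investigate and remove the user.");
        return false;
    }

    public static void main(String[] args) {
        List<String> alerts = new ArrayList<>();
        RetryingRevoker revoker = new RetryingRevoker(3, alerts::add);
        boolean removed = revoker.revoke("gk-alice", () -> false);
        System.out.println("removed=" + removed + ", alerts=" + alerts);
    }
}
```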

Conclusion

By leveraging Amazon EC2 Systems Manager and other services such as Amazon EC2 and AWS Identity and Access Management (IAM), FINRA was able to build a solution to manage temporary access to our resources running in AWS across multiple environments. Systems Manager is very easy to adopt, and it is extremely reliable and fast. It is a great tool for performing ad-hoc execution of scripts on running instances.

The content and opinions in this blog are those of the third-party author and AWS is not responsible for the content or accuracy of this post.