AWS Security Blog

How to centralize findings and automate deletion for unused IAM roles

Maintaining AWS Identity and Access Management (IAM) resources is similar to keeping your garden healthy over time. Having visibility into your IAM resources, especially the resources that are no longer used, is important to keep your AWS environment secure. Proactively detecting and responding to unused IAM roles helps you prevent unauthorized entities from gaining access to your AWS resources. In this post, I will show you how to apply resource tags on IAM roles and deploy serverless technologies on AWS to detect unused IAM roles and to require the owner of the IAM role (identified through tags) to take action.

You can use this solution to check for unused IAM roles in a standalone AWS account. As you grow your workloads in the cloud, you can run this solution for multiple AWS accounts by using AWS Organizations. In this solution, you use AWS Control Tower to create an AWS Organizations organization with a Security organizational unit (OU), and a Security account in this OU. In this blog post, you deploy the solution in the Security account belonging to a Security OU of an organization.

For more information and recommended best practices, see the blog post Managing the multi-account environment using AWS Organizations and AWS Control Tower. Following this best practice, you can create a Security OU, in which you provision one or more Security and Audit accounts that are dedicated for security automation and audit activities on behalf of the entire organization.

Solution architecture

The architecture diagram in Figure 1 demonstrates the solution workflow.

Figure 1: Solution workflow for standalone account or member account of an AWS Organization.

Figure 1: Solution workflow for standalone account or member account of an AWS Organization.

The solution is triggered periodically by an Amazon EventBridge scheduled rule and invokes a series of actions. You specify the frequency (in number of days) when you create the EventBridge rule. There are two options to run this solution, based on the needs of your organization.

Option 1: For a standalone account

Choose this option if you would like to check for unused IAM roles in a single AWS account. This AWS account might or might not belong to an organization or OU. In this blog post, I refer to this account as the standalone account.

Prerequisites

  1. You need an AWS account specifically for security automation. For this blog post, I refer to this account as the standalone Security account.
  2. You should deploy the solution to the standalone Security account, which has appropriate admin permission to audit other accounts and manage security automation.
  3. Because this solution uses AWS CloudFormation StackSets, you need to grant self-managed permissions to create stack sets in standalone accounts. Specifically, you need to establish a trust relationship between the standalone Security account and the standalone account by creating the AWSCloudFormationStackSetAdministrationRole IAM role in the standalone Security account, and the AWSCloudFormationStackSetExecutionRole IAM role in the standalone account.
  4. You need to have AWS Security Hub enabled in your standalone Security account, and you need to deploy the solution in the same AWS Region as your Security Hub dashboard.
  5. You need a tagging enforcement in place for IAM roles. This solution uses an IAM tag key Owner to identify the email address of the owner. The value of this tag key should be the email address associated with the owner of the IAM role. If the Owner tag isn’t available, the notification email is sent to the email address that you provided in the parameter ITSecurityEmail when you provisioned the CloudFormation stack.
  6. This solution uses Amazon Simple Email Service (Amazon SES) to send emails to the owner of the IAM roles. The destination address needs to be verified with Amazon SES. With Amazon SES, you can verify identity at the individual email address or at the domain level.

An EventBridge rule triggers the AWS Lambda function LambdaCheckIAMRole in the standalone Security account. The LambdaCheckIAMRolefunction assumes a role in the standalone account. This role is named after the Cloudformation stack name that you specify when you provision the solution. Then LambdaCheckIAMRole calls the IAM API action GetAccountAuthorizationDetails to get the list of IAM roles in the standalone account, and parses the data type RoleLastUsed to retrieve the date, time, and the Region in which the roles were last used. If the last time value is not available, the IAM role is skipped. Based on the CloudFormation parameter MaxDaysForLastUsed that you provide, LambdaCheckIAMRole determines if the last time used is greater than the MaxDaysForLastUsed value. LambdaCheckIAMRole also extracts tags associated with the IAM roles, and retrieves the email address of the IAM role owner from the value of the tag key Owner. If there is no Owner tag, then LambdaCheckIAMRole sends an email to a default email address provided by you from the CloudFormation parameter ITSecurityEmail.

Option 2: For all member accounts that belong to an organization or an OU

Choose this option if you want to check for unused IAM roles in every member account that belongs to an AWS Organizations organization or OU.

Prerequisites

  1. You need to have an AWS Organizations organization with a dedicated Security account that belongs to a Security OU. For this blog post, I refer to this account as the Security account.
  2. You should deploy the solution to the Security account that has appropriate admin permission to audit other accounts and to manage security automation.
  3. Because this solution uses CloudFormation StackSets to create stack sets in member accounts of the organization or OU that you specify, the Security account in the Security OU needs to be granted CloudFormation delegated admin permission to create AWS resources in this solution.
  4. You need Security Hub enabled in your Security account, and you need to deploy the solution in the same Region as your Security Hub dashboard.
  5. You need tagging enforcement in place for IAM roles. This solution uses the IAM tag key Owner to identify the owner email address. The value of this tag key should be the email address associated with the owner of the IAM role. If the Owner tag isn’t available, the notification email will be sent to the email address that you provided in the parameter ITSecurityEmail when you provisioned the CloudFormation stack.
  6. This solution uses Amazon SES to send emails to the owner of the IAM roles. The destination address needs to be verified with Amazon SES. With Amazon SES, you can verify identity at the individual email address or at the domain level.

An EventBridge rule triggers the Lambda function LambdaGetAccounts in the Security account to collect the account IDs of member accounts that belong to the organization or OU. LambdaGetAccounts sends those account IDs to an SNS topic. Each account ID invokes the Lambda function LambdaCheckIAMRole once.

Similar to the process for Option 1, LambdaCheckIAMRole in the Security account assumes a role in the member account(s) of the organization or OU, and checks the last time that IAM roles in the account were used.

In both options, if an IAM role is not currently used, the function LambdaCheckIAMRole generates a Security Hub finding, and performs BatchImportFindings for all findings to Security Hub in the Security account. At the same time, the Lambda function starts an AWS Step Functions state machine execution. Each execution is for an unused IAM role following this naming convention:
[target-account-id]-[unused IAM role name]-[time the execution created in Unix format]

You should avoid running this solution against special IAM roles, such as a break-glass role or a disaster recovery role. In the CloudFormation parameter RolePatternAllowedlist, you can provide a list of role name patterns to skip the check.

Use a Step Functions state machine to process approval

Figure 2 shows the state machine workflow for owner approval.

Figure 2: Owner approval state machine workflow

Figure 2: Owner approval state machine workflow

After the solution identifies an unused IAM role, it creates a Step Functions state machine execution. Figure 2 demonstrates the workflow of the execution. After the execution starts, the first Lambda task NotifyOwner (powered by the Lambda function NotifyOwnerFunction) sends an email to notify the IAM role owner. This is a callback task that pauses the execution until a taskToken is returned. The maximum pause for a callback task is 1 year. The execution waits until the owner responds with a decision to delete or keep the role, which is captured by a private API endpoint in Amazon API Gateway. You can configure a timeout to avoid waiting for callback task execution.

With a private API endpoint, you can build a REST API that is only accessible within your Amazon Virtual Private Cloud (Amazon VPC), or within your internal network connected to your VPC. Using a private API endpoint will prevent anyone from outside of your internal network from selecting this link and deleting the role. You can implement authentication and authorization with API Gateway to make sure that only the appropriate owner can delete a role.

If the owner denies role deletion, then the role remains intact until the next automation cycle runs, and the state machine execution stops immediately with a Fail status. If the owner approves role deletion, the next Lambda task Approve (powered by the function ApproveFunction) checks again if the role is not currently used. If the role isn’t in use, the Lambda task Approve attaches an IAM policy DenyAllCheckUnusedIAMRoleSolution to deny the role to perform any actions, and waits for 30 days. During this wait time, you can restore the IAM role by removing the IAM policy DenyAllCheckUnusedIAMRoleSolution from the role. The Step Functions state machine execution for this role is still in progress until the wait time expires.

After the wait time expires, the state machine execution invokes the Validate task. The Lambda function ValidateFunction checks again if the role is not in use after the amount of time calculated by adding MaxDaysForLastUsed and the preceding wait time. It also checks if the IAM policy DenyAllCheckUnusedIAMRoleSolution is attached to the role. If both of these conditions are true, the Lambda function follows a process to detach the IAM policies and delete the role permanently. The role can’t be recovered after deletion.

Note: To restore a role that has been marked for deletion, detach the DenyAll IAM policy from the role.

To deploy the solution using the AWS CLI

  1. Clone git repo from AWS Samples to get source code and CloudFormation templates.
    git clone https://github.com/aws-samples/aws-blog-automate-iam-role-deletion 
    cd /aws-blog-automate-iam-role-deletion
  2. Run the AWS CLI command below to upload CloudFormation templates and Lambda code to a S3 bucket in the Security Account. The S3 bucket needs to be in the same Region where you will deploy the solution.
    • To deploy the solution for a single account, use the following commands. Be sure to replace <YOUR_BUCKET_NAME> and <PATH_TO_UPLOAD_CODE> with your own values.
      #Deploy solution for a single target AWS Account
      aws cloudformation package \
      --template-file solution_scope_account.yml \
      --s3-bucket <YOUR_BUCKET_NAME> \
      --s3-prefix <PATH_TO_UPLOAD_CODE> \
      --output-template-file solution_scope_account.template
    • To deploy the solution for an organization or OU, use the following commands. Be sure to replace <YOUR_BUCKET_NAME> and <PATH_TO_UPLOAD_CODE> with your own values.
      #Deploy solution for an Organization/OU
      aws cloudformation package \
      --template-file solution_scope_organization.yml \
      --s3-bucket <YOUR_BUCKET_NAME> \
      --s3-prefix <PATH_TO_UPLOAD_CODE> \
      --output-template-file solution_scope_organization.template
  3. Validate the template generated by the CloudFormation package.
    • To validate the solution for a single account, use the following commands.
      #Deploy solution for a single target AWS Account
      aws cloudformation validate-template —template-body file://solution_scope_account.template
    • To validate the solution for an organization or OU, use the following commands.
      #Deploy solution for an Organization/OU
      aws cloudformation validate-template —template-body file://solution_scope_organization.template
  4. Deploy the solution in the same Region that you use for Security Hub. The stack takes 30 minutes to complete deployment.
    • To deploy the solution for a single account, use the following commands. Be sure to replace all of the placeholders with your own values.
      #Deploy solution for a single target AWS Account
      aws cloudformation deploy \
      --template-file solution_scope_account.template \
      --stack-name <UNIQUE_STACK_NAME> \
      --region <REGION> \
      --capabilities CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND \
      --parameter-overrides AccountId='<STANDALONE ACCOUNT ID>' \
      Frequency=<DAYS> MaxDaysForLastUsed=<DAYS> \
      ITSecurityEmail='<YOUR IT TEAM EMAIL>' \
      RolePatternAllowedlist='<ALLOWED PATTERN>'
    • To deploy the solution for an organization, run the following commands to create CloudFormation stack in the Security Account of the organization.
      #Deploy solution for an Organization
      aws cloudformation deploy \
      --template-file solution_scope_organization.template \
      --stack-name <UNIQUE_STACK_NAME> \
      --region <REGION> \
      --capabilities CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND \
      --parameter-overrides Scope=Organization \
      OrganizationId='<o-12345abcde>' \
      OrgRootId='<r-1234>'  \
      Frequency=<DAYS> MaxDaysForLastUsed=<DAYS> \
      ITSecurityEmail='<security-team@example.com>' \
      RolePatternAllowedlist='<ALLOWED PATTERN>'
    • To deploy the solution for an OU, run the following commands to create CloudFormation stack in the Security Account of the organization.
      #Deploy solution for an OU
      aws cloudformation deploy \
      --template-file solution_scope_organization.template \
      --stack-name <UNIQUE_STACK_NAME> \
      --region <REGION> \
      --capabilities CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND \
      --parameter-overrides Scope=OrganizationalUnit \
      OrganizationId='<o-12345abcde>' \
      OrganizationalUnitId='<ou-1234-1234abcd>'  \
      Frequency=<DAYS> MaxDaysForLastUsed=<DAYS> \
      ITSecurityEmail=’<security-team@example.com>’ \
      RolePatternAllowedlist=’<ALLOWED PATTERN>

Test the solution

The solution is triggered by an EventBridge scheduled rule, so it doesn’t perform the checks immediately. To test the solution right away after the CloudFormation stacks are successfully created, follow these steps.

To manually trigger the automation for a single account

  1. Navigate to the AWS Lambda console and choose the function
    <CloudFormation stackname>-LambdaCheckIAMRole.
  2. Choose Test.
  3. Choose New event.
  4. For Name, enter a name for the event, and provide the current time in UTC Date Time format YYYY-MM-DDTHH:MM:SSZ. For example {“time”: “2022-01-22T04:36:52Z”}. The Lambda function uses this value to calculate how much time has passed since the last time that a role was used. Figure 5 shows an example of configuring a test event.
    Figure 5: Configure test event for standalone account

    Figure 5: Configure test event for standalone account

  5. Choose Test.

To manually trigger the automation for an organization or OU

  1. Choose the function
    [CloudFormation stackname]-LambdaGetAccounts.
  2. Choose Test.
  3. Choose New event.
  4. For Name, enter a name for the event. Leave the default values for the remaining fields.
  5. Choose Test.

Respond to unused IAM roles

After you’ve triggered the Lambda function, the automation runs the necessary checks. For each unused IAM role, it creates a Step Functions state machine execution.

To see the list of Step Functions state machine executions

  1. Navigate to the AWS Step Functions console.
  2. Choose state machine [CloudFormation stackname]OnwerApprovalStateMachine.
  3. Under the Executions tab, you will see the list of executions in running state following this naming convention: [target-account-id]-[unused IAM role name]-[time the execution created in Unix format]. Figure 6 shows an example list of executions.
    Figure 6: Each unused IAM role generates an execution in the Step Functions state machine

    Figure 6: Each unused IAM role generates an execution in the Step Functions state machine

Each execution sends out an email notification to the IAM role owner (if available through the Owner tag) or to the IT security email address that you provided in the CloudFormation stack parameter ITSecurityEmail. The email content is:

Subject: Please take action on this unused IAM Role
 
Hello!
 
This IAM Role arn:aws:iam::<AWS account>:role/<role name> is not in use for
more than 60 days.
 
Can you please delete the role by following this link: Approve link
 
Or keep this role by following this link: Deny Link

In the email, the Approve link and Deny link is the hyperlink to a private API endpoint with a parameter taskToken. If you try to access these links publicly, they won’t work. When you access the link, the taskToken is provided to the private API endpoint, which updates the Step Functions state machine.

To test the approval action using an API Gateway test

  1. Navigate to the AWS Step Functions console. Under State machines, choose the state machine that has the name [CloudFormation stackname]OwnerApprovalStateMachine
  2. On the Executions tab, there is a list of executions. Each execution represents a workflow for one IAM role, as shown in Figure 6. Choose the execution name that includes the IAM role name in the email that you received earlier.
  3. Scroll down to Execution event history.
  4. Expand the Step Notify Owner, enter TaskScheduled, find the item taskToken, and copy its value to a notepad, as shown in Figure 7.
    Figure 7: Retrieve taskToken from execution

    Figure 7: Retrieve taskToken from execution

  5. Navigate to the API Gateway console.
  6. Choose the API that has a name similar to [CloudFormation stackname]-PrivateAPIGW-[unique string]-ApprovalEndpoint.
  7. Choose which action to test: Deny or Approve.
    • To test the Deny action, under /deny resource, choose the GET method.
    • To test the Approve action, under /approve resource, choose the GET method.
  8. Choose Test.
  9. Under Query Strings, enter taskToken= and paste the taskToken you copied earlier from the state machine execution. Figure 8 shows how to pass the taskToken to API Gateway.
    Figure 8: Provide taskToken to API Gateway Method

    Figure 8: Provide taskToken to API Gateway Method

  10. Choose Test. After you test, the state machine resumes the workflow and finishes the automation. You won’t be able to change the action.
  11. Navigate to the AWS Step Functions console. Choose the state machine and go to the state machine execution.
    1. If you choose to deny the role deletion, the execution immediately stops as Fail.
    2. If you choose to approve the role deletion, the execution moves to the Wait task. This task removes IAM policies associated to the role and waits for a period of time before moving to the next task. By default, the wait time is 30 days. To change this number, go to the Lambda function [CloudFormation stackname]ApproveFunction, and update the variable wait_time_stamp.
    3. After the waiting period expires, the state machine triggers the Validate task to do a final validation on the role before deleting it. If the Validate task decides that the role is being used, it leaves the role intact. Otherwise, it deletes the role permanently.

Conclusion

In this blog post, you learned how serverless services such as Lambda, Step Functions, and API Gateway can work together to build security automation. We recommend testing this solution as a starting point. Then, you can build more features on top of the sample code and templates to customize it to perform checks, following guidance from your IT security team.

Here are a few suggestions that you can take to extend this solution.

  • This solution uses a private API Gateway to handle the approval response from the IAM role owner. You need to establish private connectivity between your internal network and AWS to invoke a private API Gateway. For instructions, see How to invoke a private API.
  • Add a mechanism to control access to API Gateway by using endpoint policies for interface VPC endpoints.
  • Archive the Security Hub finding after the IAM role is deleted using the AWS CLI or AWS Console.
  • Use a Step Functions state machine for other automation that needs human approval.
  • Add the capability to report on IAM roles that were skipped due to the absence of RoleLastUsed information.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Hong Pham

Hong Pham

Hong is a Senior Solutions Architect at AWS. For more than five years, she has helped many customers from start-ups to enterprises in different industries to adopt Cloud Computing. She was born in Vietnam and currently lives in Seattle, Washington.