AWS Cloud Operations Blog

Implementing automated and centralized tagging controls with AWS Config and AWS Organizations

Introduction

This blog post is for customers who want to implement automated tagging controls and a tagging strategy for cost allocation. Customers often want to centralize tags and keep them consistent across AWS Organizations so that they are available outside the AWS environment (for example, in build scripts), or to enforce centralized, conditional tagging on new and existing AWS resources and apply controls across the organization. This post shows how to implement centralized, automated tagging controls using AWS Organizations, AWS Config, Amazon DynamoDB, Amazon EventBridge, and AWS Systems Manager, so that tags are in place for new and existing resources across your organization. Tagging is one of the foundational steps in establishing a meaningful cost allocation model.

Cost allocation is the process of identifying, aggregating, and assigning cloud spend to your organization’s teams, business units, products, and so on. When done effectively, it can have a lasting positive impact on how your business manages cloud costs by assigning ownership to those responsible for the resources across the organization. We recommend following the design principles of a multi-account strategy, with one account per workload and software development life cycle stage, and with these accounts grouped into cost categories. For example, the cost of compliance may be reported based on AWS managed services like Amazon GuardDuty or Amazon Detective. You can also aggregate costs across all accounts owned by a specific department or team.

When developing a multi-account strategy, note that AWS Organizations helps with tag policy enforcement, but tag policy enforcement has no effect on resources that are created without tags. You can implement an AWS resource tagging strategy using AWS tag policies and service control policies (SCPs); however, this approach requires additional considerations for some resources. For example, restoring an RDS snapshot will fail when an SCP requires tags at creation time. For the RDS requirement, we recommend creating a CloudFormation template that takes the required tags as input parameters. When you need a new RDS instance or cluster, you can use this template to create it so that the tags are enforced, while the SCP denies the removal of the tags from RDS. You can use the solution outlined in this documentation that provides notifications for RDS creation and tag enforcement. For further details, please review the AWS whitepaper on Tagging policies and Implementing and enforcing tagging.
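To make the SCP approach more concrete, here is a minimal sketch that builds a deny statement for launching EC2 instances without a given tag. The tag key `CostCenter`, the `Sid`, and the helper name `build_require_tag_scp` are illustrative assumptions and not part of the solution described in this post; you would need to adapt the action and resource ARNs to the services you want to cover.

```python
import json


def build_require_tag_scp(tag_key: str) -> dict:
    """Build an SCP document that denies ec2:RunInstances when `tag_key` is absent.

    Hypothetical helper for illustration only.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyRunInstancesWithoutTag",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": ["arn:aws:ec2:*:*:instance/*"],
                # "Null": "true" matches requests that do not carry the tag at all
                "Condition": {"Null": {f"aws:RequestTag/{tag_key}": "true"}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_require_tag_scp("CostCenter"), indent=2))
```

A policy like this only checks that the tag is present in the creation request. It does not validate the tag value, which is one reason to combine it with the validation approach described later in this post.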

Defining needs and use cases for a tagging schema

Tags can be used for a variety of purposes, and a tagging strategy can be split into areas of responsibility within an organization, for example, tags for resource organization, cost allocation, automation, and access control. A good tagging strategy starts with a cross-functional team (Finance, IT, Engineering/Product, Security, Operations, Business) that defines needs and use cases. It is critical to standardize these requirements by defining and publishing a tagging schema made of (as a minimum):

  • A tag key (for example, CostCenter, Environment, or Project). Tag keys are case sensitive.
  • A tag value (for example, 111122223333 or Production). Like tag keys, tag values are case sensitive.

Please refer to tag naming limits and requirements for limitations on how tags can be used and Building and implementing your tagging strategy to find out more about our best practices.
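Because tag keys are case sensitive, `CostCenter` and `costcenter` are two different tags, which is a common source of inconsistent tagging. The following minimal sketch flags required keys that are present only with different casing. The function name `find_case_mismatches` is hypothetical and is not part of the solution in this post.

```python
from typing import Dict, Iterable


def find_case_mismatches(required_keys: Iterable[str], tags: Dict[str, str]) -> Dict[str, str]:
    """Map each required key to a present key that differs from it only by case."""
    by_lower = {key.lower(): key for key in tags}
    mismatches = {}
    for required in required_keys:
        if required in tags:
            # Exact match exists, so there is nothing to report for this key
            continue
        actual = by_lower.get(required.lower())
        if actual is not None:
            mismatches[required] = actual
    return mismatches


if __name__ == "__main__":
    resource_tags = {"costcenter": "111122223333", "Environment": "Production"}
    print(find_case_mismatches(["CostCenter", "Environment"], resource_tags))
```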

Publishing a tagging schema

It is essential to document and centralize your tagging definitions. You can use Amazon DynamoDB or any database of your choice to store your tagging schema. For our example we will be using DynamoDB. Please review Defining and publishing a tagging schema for best practices.
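As a rough illustration of publishing a schema entry, the sketch below builds an item with the same attributes as the example entry shown later in this post and writes it with the DynamoDB `put_item` call. The helper names and the table name in the comment are assumptions. The validation code in this post queries by `ResourceType`, so that attribute is assumed to be the partition key; the rest of the key schema is not specified here.

```python
from typing import Any, Dict, List, Optional


def build_tag_rule_item(
    resource_type: str,
    tag: str,
    required: bool = True,
    value_pattern: str = "",
    values: Optional[List[str]] = None,
    account_ids: Optional[List[str]] = None,
    enabled: bool = True,
) -> Dict[str, Any]:
    """Build one tagging schema entry (hypothetical helper)."""
    if not resource_type or not tag:
        raise ValueError("resource_type and tag must be non-empty")
    return {
        "ResourceType": resource_type,
        "Tag": tag,
        "Enabled": enabled,
        "Required": required,
        "ValuePattern": value_pattern,
        "Values": list(values or []),
        "AccountIds": list(account_ids or []),
    }


def publish_tag_rule(table: Any, item: Dict[str, Any]) -> None:
    """Write the entry to a DynamoDB table resource."""
    table.put_item(Item=item)


# Usage with boto3 (the table name "TaggingSchema" is an assumption):
#   import boto3
#   table = boto3.resource("dynamodb").Table("TaggingSchema")
#   publish_tag_rule(table, build_tag_rule_item("AWS::EC2::*", "CostCenter"))
```

Passing the table in as an argument keeps the item-building logic testable without AWS access.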

Use mechanisms for validation

Common pitfalls and challenges of unenforced tagging are (1) tagging requirements are not known by all teams, so they don’t adhere to a defined tagging taxonomy, and (2) inconsistent tagging caused by the use of different infrastructure provisioning operations. You can learn more about Common pitfalls and challenges in this blog post.

Architecture overview

For the rest of this blog post, we will explain the steps you should take to successfully implement cost allocation tagging across multiple accounts in an organization in AWS Organizations. In part two of this blog post, we will include code examples for each of these steps.

Figure 1: Multi-account tagging solution diagram

  1. A user creates an untagged AWS resource.
  2. AWS Config captures changes in a customer’s cloud environments across their organization (for details, see Managing AWS Config Rules Across All Accounts in Your Organization) and validates those changes against a set of defined rules. It includes a broad selection of pre-built rules (AWS Config Managed Rules) and also allows customers to create AWS Config Custom Lambda Rules or AWS Config Custom Policy Rules.
  3. We will use an AWS Config Custom Lambda Rule, backed by an AWS Lambda function, for validation and reporting on findings. The AWS Config managed rule required-tags checks up to 6 tags at a time and does not currently support all AWS resource types. Hence, in some cases you may have to write a custom rule to include the resources you wish to tag. Please review AWS supported resources for AWS Config managed rules.
  4. Admins define their tagging requirements first and store them in a DynamoDB table. An admin in this sense can be anybody who has the permissions to establish the organization’s tagging policy and make entries in the respective table. In parallel, you use Amazon EventBridge to schedule collecting and mapping AWS Organizations metadata (account IDs, account contacts, and OU IDs) into another DynamoDB table. When a new account is created, it is added automatically to the DynamoDB table.
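The scheduled metadata collection in step 4 could be sketched as a Lambda function invoked by an EventBridge schedule. This is a minimal sketch under several assumptions: the environment variable `ACCOUNTS_TABLE_NAME`, the record attribute names, and the use of the account email as the contact are all illustrative. The AWS Organizations calls also need to run in the management account or a delegated administrator account.

```python
import os
from typing import Any, Dict, List


def collect_accounts(org_client: Any) -> List[Dict[str, Any]]:
    """Return one record per account with its ID, email contact, status, and parent ID."""
    records = []
    paginator = org_client.get_paginator("list_accounts")
    for page in paginator.paginate():
        for account in page.get("Accounts", []):
            # list_parents returns the root or OU that directly contains the account
            parents = org_client.list_parents(ChildId=account["Id"]).get("Parents", [])
            records.append(
                {
                    "AccountId": account["Id"],
                    "Email": account.get("Email", ""),
                    "Status": account.get("Status", ""),
                    "ParentId": parents[0]["Id"] if parents else "",
                }
            )
    return records


def lambda_handler(event, context):
    """Entry point for the scheduled EventBridge invocation."""
    import boto3  # imported lazily so collect_accounts can be exercised without AWS access

    table = boto3.resource("dynamodb").Table(os.environ["ACCOUNTS_TABLE_NAME"])
    records = collect_accounts(boto3.client("organizations"))
    with table.batch_writer() as batch:
        for record in records:
            batch.put_item(Item=record)
    return {"accounts": len(records)}
```

Because every run rewrites the full account list, newly created accounts appear in the table after the next scheduled invocation.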

Your tagging entries could look like this:

{
  "ResourceType": "AWS::EC2::*",
  "Tag": "MyTag1",
  "Enabled": true,
  "Required": true,
  "ValuePattern": "[A-Z][a-z]",
  "Values": [],
  "AccountIds": []
}

This schema defines requirements per resource type (wildcards extend a requirement to a group of resources), the tag name, whether the rule is enabled, whether the tag is required, and either a regular expression pattern or a list of allowed values to validate against. Because the setup should work in a multi-account environment, it’s also a good idea to be able to limit the scope of certain tags to specific accounts or account groups.
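To illustrate how `ValuePattern` and `Values` could be applied to a tag value, here is a minimal sketch. It uses `re.fullmatch`, as the validation code later in this post does, so the whole value has to match the pattern. The function name `value_is_allowed`, the precedence of the pattern over the list, and treating empty settings as "no restriction" are assumptions.

```python
import re
from typing import Optional, Sequence


def value_is_allowed(
    value: str,
    value_pattern: str = "",
    values: Optional[Sequence[str]] = None,
) -> bool:
    """Check a tag value against a regex pattern or, failing that, a list of allowed values."""
    if value_pattern:
        # fullmatch requires the entire value to match, not just a prefix
        return re.fullmatch(value_pattern, value) is not None
    if values:
        return value in values
    # Neither a pattern nor a list is configured: any value is accepted
    return True


if __name__ == "__main__":
    print(value_is_allowed("Ab", "[A-Z][a-z]"))
    print(value_is_allowed("Production", "[A-Z][a-z]"))
    print(value_is_allowed("Production", "[A-Z][a-z]+"))
```

Note that with full matching, the pattern `[A-Z][a-z]` from the example entry accepts only two-character values such as `Ab`. A pattern such as `[A-Z][a-z]+` would be needed to accept a value like `Production`.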

  5. When a user in a member account creates a resource, for example an ECS cluster (1), which is not supported by managed rules, AWS Config records resource configuration items such as metadata and attributes (2) and invokes a Lambda function based on the custom rule (3) to collect the tagging requirements from the centralized tagging schema in the DynamoDB table (4).
  6. The custom Lambda function compares the DynamoDB tagging schema requirements with the resource’s tags, retrieved through the Resource Groups Tagging API (5). Lambda then reports a status to the AWS Config rule (6): the resource is either COMPLIANT, NON_COMPLIANT, or NOT_APPLICABLE.
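The reporting step could be sketched as follows: the Lambda function parses the invoking event, runs a check against the configuration item, and maps the outcome to one of the three compliance types through the AWS Config `put_evaluations` call. This is a sketch under assumptions: the two exception classes are stand-ins for the ones the validation code imports from its own `exceptions` module, `handle_event` and `build_evaluation` are hypothetical names, and oversized configuration item notifications are not handled.

```python
import json
from typing import Any, Callable, Dict


class NonCompliantException(Exception):
    """Stand-in: raised by the check when a tagging rule is violated."""


class NotApplicableException(Exception):
    """Stand-in: raised by the check when no tagging rule applies."""


def build_evaluation(configuration_item: Dict[str, Any], check: Callable[[Dict], None]) -> Dict[str, Any]:
    """Run `check` and translate its outcome into an AWS Config evaluation."""
    compliance, annotation = "COMPLIANT", "All tagging rules passed"
    try:
        check(configuration_item)
    except NonCompliantException as error:
        compliance, annotation = "NON_COMPLIANT", str(error) or "Tagging rule violated"
    except NotApplicableException as error:
        compliance, annotation = "NOT_APPLICABLE", str(error) or "No tagging rule applies"
    return {
        "ComplianceResourceType": configuration_item["resourceType"],
        "ComplianceResourceId": configuration_item["resourceId"],
        "ComplianceType": compliance,
        "Annotation": annotation[:256],  # annotations are limited to 256 characters
        "OrderingTimestamp": configuration_item["configurationItemCaptureTime"],
    }


def handle_event(event: Dict[str, Any], check: Callable[[Dict], None], config_client: Any) -> Dict[str, Any]:
    """Evaluate the changed resource and report the result back to AWS Config."""
    invoking_event = json.loads(event["invokingEvent"])
    evaluation = build_evaluation(invoking_event["configurationItem"], check)
    config_client.put_evaluations(Evaluations=[evaluation], ResultToken=event["resultToken"])
    return evaluation
```

In a deployed function, `config_client` would be `boto3.client("config")` and `check` would be a function such as the `check_tags` entry point shown later in this post.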

You can customize these validation methods to suit your needs. For example, you could add more details to the feedback provided, like the ID of the AWS account that owns the resource. It’s not uncommon for large environments to have hundreds of tag keys, which makes it crucial to define how these tags are deployed and what rules are used to enforce them. For example, all AWS resources in a production account need to be tagged with Env = Production and CostCenter = apps. Use a mechanism as outlined to validate and take action on any untagged resources; for example, you could shut down the resource or invoke a CI/CD pipeline to update your infrastructure as code.

The validation engine could be implemented like this:

import os
import re
from typing import Dict, Iterable, List

import boto3
from boto3.dynamodb.conditions import Key

from exceptions import NonCompliantException, NotApplicableException
from models import Resource, TagRule

# Name of the DynamoDB table holding the tagging schema (set on the Lambda function)
TABLE_NAME = os.environ.get("TABLE_NAME")
# For local testing, select credentials with the AWS_PROFILE environment variable
# instead of hard-coding a profile name in the code.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(TABLE_NAME)

tagging_api = boto3.client("resourcegroupstaggingapi")


def check_tags(configuration_item: Dict):
    """Entrypoint for checking tags"""
    resource_type = configuration_item.get("resourceType")
    aws_account_id = configuration_item.get('awsAccountId')
    tag_requirements = get_tag_requirements(resource_type, aws_account_id)

    if not tag_requirements:
        return

    resource_arn = configuration_item.get("ARN")
    resource_id = configuration_item.get("resourceId")
    resource = get_resource_tags(resource_arn, resource_id, resource_type)
    validate_resource_tags(tag_requirements, resource)


def get_tag_requirements(resource_type: str, aws_account_id: str) -> List:
    """Get tagging requirements"""
    resource_type_split = resource_type.split("::")
    resource_types = []

    for i in range(len(resource_type_split) - 1):
        resource_types.append("::".join(resource_type_split[0 : i + 1]) + "::*")

    resource_types.append(resource_type)
    print(resource_types)

    db_items = []
    for key in resource_types:
        response = table.query(KeyConditionExpression=Key("ResourceType").eq(key))
        rules = map(lambda x: TagRule(**x), response.get("Items", []))
        rules = filter_rules_by_account(rules=rules, aws_account_id=aws_account_id)
        db_items.extend([x for x in rules if x.Enabled])

    return db_items


def filter_rules_by_account(rules: Iterable[TagRule], aws_account_id: str) -> List[TagRule]:
    """Filter rules that are relevant for the given account"""
    result = [x for x in rules if len(x.AccountIds) == 0 or aws_account_id in x.AccountIds]
    return result


def get_resource_tags(resource_arn: str, resource_id: str, resource_type: str) -> Resource:
    """Get the tags of the resource"""
    result = {}
    response = tagging_api.get_resources(ResourceARNList=[resource_arn])
    for resource in response.get("ResourceTagMappingList", []):
        for tag in resource.get("Tags", []):
            result.update({tag.get("Key"): tag.get("Value")})
    return Resource(resource_type, resource_id, resource_arn, result)


def validate_resource_tags(tag_requirements: List[TagRule], resource: Resource):
    """Validation engine for resource tags"""
    for tag_rule in tag_requirements:
        print(f"Validating {tag_rule}...")
        if tag_rule.Required:
            validate_tag_existence(tag_rule, resource)
        elif tag_rule.Tag not in resource.Tags:
            # Optional tag that is absent: nothing to validate
            continue
        # The tag is present here, so check its value against the pattern or the list
        if tag_rule.ValuePattern:
            validate_tag_regex(tag_rule, resource)
        else:
            validate_tag_values(tag_rule, resource)


def validate_tag_regex(tag_rule: TagRule, resource: Resource):
    """Validate tags based on the ValuePattern regular expression"""
    if not tag_rule.ValuePattern:
        return

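    tag_value = resource.Tags.get(tag_rule.Tag)
    # A missing tag is reported by validate_tag_existence; passing None to re.fullmatch would raise TypeError
    if tag_value is None:
        return
    value_match = re.fullmatch(tag_rule.ValuePattern, tag_value)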
    if not value_match:
        raise NonCompliantException(
            f'{resource.ResourceId} ({resource.ResourceType}) tag value "{tag_value}" violates dictionary regex pattern for "{tag_rule.Tag}": {tag_rule.ValuePattern}'
        )


def validate_tag_values(tag_rule: TagRule, resource: Resource):
    """Validate tags based on the Values list"""
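    tag_value = resource.Tags.get(tag_rule.Tag)
    # An empty Values list is treated as "no restriction" (assumption), so only non-empty lists are checked
    if tag_value and tag_rule.Values and tag_value not in tag_rule.Values: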
        raise NonCompliantException(
            f'{resource.ResourceId} ({resource.ResourceType}) tag value "{tag_value}" violates dictionary for "{tag_rule.Tag}": {tag_rule.Values}'
        )


def validate_tag_existence(tag_rule: TagRule, resource: Resource):
    """Validate if the tag exists"""
    if tag_rule.Tag not in resource.Tags:
        raise NonCompliantException(
            f"{resource.ResourceId} ({resource.ResourceType}) missing required tag {tag_rule.Tag}"
        )

  7. If the resource is noncompliant, you can trigger automation using remediation actions such as Systems Manager Automation runbooks. Our recommendation is to define a remediation plan based on your use case, preferably using Systems Manager Automation to correct the issue automatically. In our example (Figure 1), AWS Config receives the feedback and invokes AWS Systems Manager Automation to take the right action (7). You can invoke your CI/CD pipeline to update CloudFormation stacks with the needed tags to avoid stack drift. Alternatively, you could notify users or account owners via an Amazon SNS topic to take action (Run an automation with approvers). Here is an example of how to use AWS Systems Manager Automation runbooks to resolve operational tasks. You can find additional runbook examples here.
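The notification alternative could look like the following minimal sketch, which publishes a message to an Amazon SNS topic for a noncompliant resource. The `evaluation` dictionary is assumed to carry the fields of an AWS Config evaluation (`ComplianceType`, `ComplianceResourceType`, `ComplianceResourceId`, `Annotation`). The function names and the topic ARN in the comment are hypothetical.

```python
from typing import Any, Dict


def build_notification(evaluation: Dict[str, Any], account_id: str) -> Dict[str, str]:
    """Compose the SNS subject and message for a noncompliant resource."""
    subject = f"Tagging violation: {evaluation['ComplianceResourceId']}"
    message = (
        f"Account: {account_id}\n"
        f"Resource type: {evaluation['ComplianceResourceType']}\n"
        f"Resource ID: {evaluation['ComplianceResourceId']}\n"
        f"Finding: {evaluation.get('Annotation', 'n/a')}"
    )
    # SNS subjects are limited to 100 characters
    return {"Subject": subject[:100], "Message": message}


def notify_owner(sns_client: Any, topic_arn: str, evaluation: Dict[str, Any], account_id: str) -> bool:
    """Publish a notification only when the resource is noncompliant."""
    if evaluation["ComplianceType"] != "NON_COMPLIANT":
        return False
    sns_client.publish(TopicArn=topic_arn, **build_notification(evaluation, account_id))
    return True


# Usage with boto3 (the topic ARN is an assumption):
#   import boto3
#   notify_owner(boto3.client("sns"), "arn:aws:sns:eu-west-1:111122223333:tagging-alerts",
#                evaluation, "111122223333")
```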

Benefits

AWS Config offers a built-in rule called required-tags. However, this rule does not support all possible resources. Using the solution outlined in this blog post, you can validate a broader set of resources. For more details on supported resources, please review Resource types you can use with AWS Resource Groups and Tag Editor.

As an additional benefit, you’ll get an easy-to-use AWS Config dashboard. Admins can use it to quickly visualize any kind of rule violation and take the required action.

Conclusion

You have seen how easy it is to get started with building a tagging dictionary and cost allocation strategy that provides management and operations teams with important data about cost utilization to support their decisions. As a next step, consider using AWS Config custom rules to find additional noncompliant resources across your organization, and invoking automated remediation actions with AWS Systems Manager based on the results of the rule evaluation.

We recommend that you start simple with your tagging strategy. You can start with a small number of areas that you want to monitor and can get more granular as you grow. Finally, we recommend reading Measuring tagging effectiveness and driving improvements to help with your tagging strategy.

Mohamed Othman

Mo joined AWS in 2020 as a Technical Account Manager, bringing with him 7 years of hands-on AWS DevOps experience and 6 years as a systems operations administrator. He is a member of two Technical Field Communities in AWS (Cloud Operation and Builder Experience), focusing on supporting customers with centralized operations management, CI/CD pipelines, and AI for DevSecOps.

Gabor Schulz

Gabor is a Senior Technical Account Manager at Amazon Web Services. He has more than 14 years of experience as a software engineer and with developing solutions at scale. He’s passionate about cost-efficient infrastructure, operations, and developing in Python.

Cedric Vogel

Cedric is a Senior Business Development Manager in the AWS Cloud Economics – Cloud Financial Management (CFM) practice, focusing on enabling customers to accelerate and maximize the value of the cloud by implementing CFM/FinOps capabilities. Prior to joining AWS, Cedric was a chartered accountant and spent most of his career in consulting, helping customers drive value creation across their Business and Finance functions.