AWS Cloud Operations Blog
Maximizing resource tagging at scale and across teams for your migration to AWS
Many customers are migrating to AWS to reduce costs, boost staff productivity, improve operational resilience, and increase business agility. When your business decides to migrate to AWS, many areas need careful attention and planning. It’s important to consider these areas across technical, business, and delivery domains.
A key area that is often overlooked when planning a migration is resource tagging. A good tagging strategy is key to operating effectively at scale and over the long term in the cloud. Tagging resources at scale is required to perform important infrastructure management and financial activities such as resource organization, cost allocation, automation, and access control.
In an ideal world, you’ll have a comprehensive tagging strategy in place before starting your migration. This would typically include clear governance, a standardized tag taxonomy, and automated enforcement mechanisms through AWS Organizations Service Control Policies (SCPs), tag policies, or AWS Config rules. But in reality, some customers have not had all of this in place when first migrating to AWS.
This post covers common pitfalls and best practices for maximizing your resource tagging on AWS at scale. We cover tagging across different teams and infrastructure provisioning operations and tools. This guidance can be used even if you don’t currently have a comprehensive tagging strategy in place.
Common pitfalls and challenges
In working with thousands of customers migrating to AWS, we’ve identified the following common pitfalls regarding tagging. These can lead to tagging gaps (i.e. untagged resources) or incorrect tagging of resources.
- Tagging requirements are not known by all teams. Migrations are a team effort, often with multiple distributed teams (nationally or internationally). Mandatory tag key-values aren’t always shared with all of the teams involved for use across all environments (sandbox environments, development, test, staging, production, etc.).
- Not adhering to the defined tagging taxonomy. AWS tag keys and values are case-sensitive. It’s important to watch out for spelling and capitalization of tags to avoid typos and inconsistencies. This becomes increasingly important when multiple teams are involved and proactive/reactive governance mechanisms aren’t in place yet.
- Inconsistent tagging caused by the use of different infrastructure provisioning operations. Different teams typically use different operations and tools to launch AWS resources, with different levels of tagging coverage: AWS Management Console, AWS Command Line Interface (AWS CLI), AWS CloudFormation, HashiCorp Terraform, AWS Cloud Development Kit (AWS CDK), AWS Software Development Kits (AWS SDKs), AWS Serverless Application Model (AWS SAM), CI/CD pipelines, etc.
- Tagging of ‘core’ infrastructure but not ‘complementary’ resources. A base level of tagging coverage is typical on resources like Amazon Elastic Compute Cloud (Amazon EC2) instances or Amazon Relational Database Service (Amazon RDS) DB instances. However, tagging of ‘complementary’ infrastructure such as EC2 AMIs, Amazon Elastic Block Store (Amazon EBS) volumes, EC2 NAT Gateways, or RDS DB snapshots tends to be missed more frequently.
- Not using AWS native tag propagation features. Many AWS services have an option to propagate (copy) tags from a ‘parent’ resource to a ‘child’ resource (e.g., Amazon EC2 Auto Scaling group tags copied to launched EC2 instances). This is a recommended feature to use, but it’s sometimes missed. See the best practices section of this post for more information on how to leverage this feature.
- Losing tags due to infrastructure drift. Using Infrastructure as Code (IaC) is a best practice when deploying and managing infrastructure on AWS at scale. State drift can occur when a resource is launched using IaC and later tagged using a different tool (e.g., the AWS Management Console, AWS CLI, AWS Tag Editor, etc.). Conversely, tags applied outside of IaC may be removed inadvertently when the drift is remediated.
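Because tag keys and values are case-sensitive, a small check run before deployment can catch misspelled or missing tags early. The following is a minimal sketch in Python and not an AWS feature; the mandatory keys (`CostCenter`, `Environment`, `Owner`) and the function name `find_tag_violations` are hypothetical and used for illustration only.

```python
# Hypothetical mandatory tag keys; replace with your own taxonomy.
REQUIRED_TAG_KEYS = ("CostCenter", "Environment", "Owner")


def find_tag_violations(tags, required_keys=REQUIRED_TAG_KEYS):
    """Return a list of human-readable problems found in a tag dictionary.

    A key that matches a required key only when case is ignored
    (for example "costcenter" vs. "CostCenter") is reported as a
    capitalization mismatch rather than as a missing tag.
    """
    problems = []
    # Map lowercase form -> actual key, to spot capitalization mismatches.
    lowered = {key.lower(): key for key in tags}
    for required in required_keys:
        if required in tags:
            if not tags[required].strip():
                problems.append(f"empty value for '{required}'")
            continue
        actual = lowered.get(required.lower())
        if actual is not None:
            problems.append(f"'{actual}' should be spelled '{required}'")
        else:
            problems.append(f"missing required tag '{required}'")
    return problems


if __name__ == "__main__":
    candidate = {"costcenter": "1234", "Environment": "prod"}
    for problem in find_tag_violations(candidate):
        print(problem)
```

A check like this could run in a CI/CD pipeline or a pre-commit hook, so that every team validates against the same list regardless of the provisioning tool it uses.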
How to maximize resource tagging coverage: Best practices
The following section highlights recommendations (technical and organizational) to maximize AWS resource tagging for your migration. These can be used even if you don’t have a comprehensive tagging strategy in place at this time.
Technical best practice: Tag consistently across different infrastructure provisioning operations
Resource creation operation (tool) | Infrastructure as Code (IaC)? | Tagging mechanism |
CloudFormation | Yes | |
Terraform | Yes | |
AWS CDK | Yes* | |
AWS SAM | Yes* | |
Manual (AWS Management Console) | No | |
AWS CLI | No | |
AWS SDKs | No | |
AWS Cloud Control API | No | |
* Enabled by CloudFormation.
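One way to keep tags consistent across these tools is to define the tag set once and convert it into the shape each tool expects. The sketch below is illustrative only: the tag keys and values are placeholders and the helper names are hypothetical. It covers three shapes: the `Key`/`Value` list used by CloudFormation `Tags` properties and many AWS APIs, the `TagSpecifications` parameter of the EC2 `RunInstances` operation, and the shorthand accepted by `aws ec2 create-tags --tags`. Some APIs, such as the Resource Groups Tagging API, take a plain key-to-value map instead, in which case the dictionary can be passed as-is.

```python
# Single source of truth for the tag set; keys and values are placeholders.
COMMON_TAGS = {"CostCenter": "1234", "Environment": "prod"}


def as_key_value_list(tags):
    """Key/Value list, as used by CloudFormation `Tags` and many AWS APIs."""
    return [{"Key": key, "Value": value} for key, value in sorted(tags.items())]


def as_ec2_tag_specifications(tags, resource_types=("instance", "volume")):
    """Shape of the `TagSpecifications` parameter of EC2 `RunInstances`.

    Listing "volume" next to "instance" also tags the EBS volumes that
    are created together with the instance.
    """
    tag_list = as_key_value_list(tags)
    return [{"ResourceType": rt, "Tags": tag_list} for rt in resource_types]


def as_cli_shorthand(tags):
    """Shorthand for `aws ec2 create-tags --tags ...`.

    Values that contain spaces or commas would need quoting, which this
    sketch does not handle.
    """
    return " ".join(f"Key={k},Value={v}" for k, v in sorted(tags.items()))


if __name__ == "__main__":
    print(as_key_value_list(COMMON_TAGS))
    print(as_ec2_tag_specifications(COMMON_TAGS))
    print(as_cli_shorthand(COMMON_TAGS))
```

Generating every shape from one dictionary avoids the spelling and capitalization differences that tend to appear when each team types tags by hand.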
Technical best practice: Leverage AWS native tag propagation features
Many popular AWS services include tag propagation features, where a ‘parent’ resource propagates or copies tags to ‘child’ resources. This feature is typically easy to enable as a one-time operation, yet it’s a powerful way to increase tagging coverage automatically. Some examples of this feature include:
- Amazon EC2 Auto Scaling group tags copied to new instances launched.
- Amazon RDS DB instance tags copied to instance snapshots.
- Amazon Elastic Container Service (Amazon ECS) task definition or service tags copied to new tasks launched.
- Amazon Elastic MapReduce (Amazon EMR) cluster tags propagated to new instances launched, which is enabled by default.
- Amazon ElastiCache replication group tags propagated to all clusters in the replication group.
Make sure that you’re enabling this mechanism whenever possible to tag your resources at scale. For those services where this feature isn’t yet supported, you can consider using the resource creation events recorded by AWS CloudTrail to drive automation that propagates tags across related AWS resources.
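As an illustration of how little configuration propagation usually needs, the following sketch builds the tag payload for the Amazon EC2 Auto Scaling `CreateOrUpdateTags` operation with `PropagateAtLaunch` set, and turns on `CopyTagsToSnapshot` for an Amazon RDS DB instance. It assumes boto3 is installed and credentials are configured; the function names and the resource identifiers in the usage line are hypothetical. Note that Auto Scaling copies the tags to instances launched after the setting is applied, so instances that are already running may need to be tagged separately.

```python
def asg_propagating_tags(group_name, tags):
    """Build the `Tags` argument for Auto Scaling `CreateOrUpdateTags`.

    `PropagateAtLaunch=True` asks the group to copy each tag to the
    instances it launches.
    """
    return [
        {
            "ResourceId": group_name,
            "ResourceType": "auto-scaling-group",
            "Key": key,
            "Value": value,
            "PropagateAtLaunch": True,
        }
        for key, value in sorted(tags.items())
    ]


def enable_propagation(group_name, db_instance_id, tags):
    """Apply the settings with boto3 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above has no dependencies

    boto3.client("autoscaling").create_or_update_tags(
        Tags=asg_propagating_tags(group_name, tags)
    )
    # Copy the DB instance's tags to snapshots of that instance.
    boto3.client("rds").modify_db_instance(
        DBInstanceIdentifier=db_instance_id,
        CopyTagsToSnapshot=True,
    )


if __name__ == "__main__":
    # Only builds and prints the payload; no AWS call is made here.
    print(asg_propagating_tags("web-asg", {"Environment": "prod"}))
```

Calling `enable_propagation("web-asg", "orders-db", {"Environment": "prod"})` would apply both settings; in an IaC workflow the same options would normally be set in the template instead, to avoid introducing drift.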
Organizational best practice: Coordinate tagging activities at scale across teams
To address the common pitfalls and challenges listed previously, you can consider the following areas:
- Ownership: Identify one or more owners for tagging within your migration program. They will be responsible for communicating mandatory tag key-values, sharing technical guidance, and measuring tagging levels throughout the migration.
- Communication: Cascade mandatory tag requirements to all of the individuals involved in the migration, across different geographies and functions (development, test, operations, etc.).
- Enablement: Share technical guidance with all of the teams around tagging, based on their chosen infrastructure deployment methods (you can use the guidance in this post).
- Tracking: Monitor tagging coverage levels regularly across all of the AWS accounts within your organization in AWS Organizations. You can use the tag filter within AWS Cost Explorer to identify AWS accounts and services with untagged usage, and communicate these to the respective owners for remediation. This process should be performed regularly (at least once a month).
- Proactive tagging and governance guardrails: Tagging at resource creation, ideally via IaC, helps reduce potential future efforts and rework in remediating missed tags. Utilizing tools such as AWS Organizations SCPs and tag policies can also help make sure that the correct tags are being applied. These enforcement mechanisms are a great starting point for building robust tagging governance.
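To complement the Cost Explorer view, tagging coverage can also be measured programmatically. The sketch below is one possible approach, not a prescribed method: it computes per-key coverage over a list of resources shaped like the `ResourceTagMappingList` entries returned by the Resource Groups Tagging API `GetResources` operation. The function names, ARNs, and tag keys are placeholders. Keep in mind that `GetResources` returns resources that are or were previously tagged, so resources that have never been tagged may need to be discovered another way.

```python
def has_tag(resource, key):
    """True if the resource carries `key` with a non-empty value."""
    return any(
        tag.get("Key") == key and tag.get("Value")
        for tag in resource.get("Tags", [])
    )


def tagging_coverage(resources, required_keys):
    """Return, per required key, the fraction of resources carrying it."""
    total = len(resources)
    return {
        key: (sum(has_tag(r, key) for r in resources) / total if total else 0.0)
        for key in required_keys
    }


def untagged_arns(resources, required_keys):
    """List the ARNs of resources missing at least one required key."""
    return [
        r["ResourceARN"]
        for r in resources
        if not all(has_tag(r, key) for key in required_keys)
    ]


if __name__ == "__main__":
    # Placeholder data in the shape of `ResourceTagMappingList` entries.
    sample = [
        {
            "ResourceARN": "arn:aws:ec2:eu-west-1:111122223333:instance/i-0example",
            "Tags": [{"Key": "Owner", "Value": "team-a"}],
        },
        {
            "ResourceARN": "arn:aws:ec2:eu-west-1:111122223333:volume/vol-0example",
            "Tags": [],
        },
    ]
    print(tagging_coverage(sample, ["Owner"]))
    print(untagged_arns(sample, ["Owner"]))
```

The list of ARNs missing a mandatory tag can then be shared with the owning team as part of the regular tracking review.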
Conclusion
A good tagging strategy includes strong proactive and reactive governance. This is key for operating efficiently at scale on the cloud, including resource organization, cost allocation, automation, and access control. However, in reality, not all customers have this in place at the time of migrating to AWS. This post covers some common pitfalls and best practices to make sure that effective tagging of resources is completed at scale when migrating to AWS. This can be done before a robust tagging strategy is in place. The post offers technical guidance around tagging using the most common methods and tools of infrastructure provisioning. It also describes how to leverage AWS native tagging propagation features, and organizational best practices to address tagging at scale for your migration.