AWS Cloud Operations Blog

Continuously optimize your operational excellence posture through AWS Trusted Advisor

AWS Trusted Advisor continuously evaluates your AWS environment using best practice checks in the categories of cost optimization, performance, resilience, security, service limits, and operational excellence, and recommends actions to remediate any deviations from the best practices in the AWS Well-Architected Framework. The AWS Well-Architected Framework is a collection of architectural best practices and guidance that helps customers design and build cloud workloads effectively.

On October 26, 2023, Trusted Advisor added a new operational excellence check category and integrated with AWS Config to deliver 64 new best practice checks across all categories. This launch helps improve the operational readiness of your AWS environment by increasing the coverage of Trusted Advisor checks and their alignment with AWS Well-Architected Framework best practices.

In this blog post, we will explore the new Operational Excellence category and walk you through a sample scenario to demonstrate how Trusted Advisor can help you identify operational risks and optimization opportunities.

AWS Trusted Advisor in alignment with AWS Well-Architected Best Practices

As your business and AWS environment evolve, having the right operational capabilities is key to scaling and keeping up with the ongoing changes required to stay ahead in a competitive landscape. This is why the AWS Well-Architected Framework places a specific emphasis on the operations-related best practices documented in the Operational Excellence pillar.

One of the focus areas in the Operational Excellence pillar covers the best practices needed to ensure you are well prepared to operate the workload environment. For example, OPS05-BP05 Perform patch management advises you to prepare your cloud environment with the right set of capabilities to perform patch management at scale, reducing errors and operational overhead. For EC2 instances, the best practice is to integrate with automated service capabilities such as AWS Systems Manager Patch Manager and Change Manager.

As a prerequisite to using the automated capabilities of Systems Manager, you need to ensure that your EC2 instances have the AWS Systems Manager Agent (SSM Agent) package installed and registered correctly with the AWS Systems Manager service.
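As a rough illustration of this prerequisite, the sketch below compares the EC2 instances in a Region with the instances that have registered with Systems Manager. It is a minimal example and not part of the original walkthrough: the function names are hypothetical, and it assumes boto3 is installed and that your credentials allow `ec2:DescribeInstances` and `ssm:DescribeInstanceInformation`.

```python
from typing import Iterable, List


def find_unmanaged_instances(ec2_instance_ids: Iterable[str],
                             ssm_instance_information: Iterable[dict]) -> List[str]:
    """Return the EC2 instance IDs that have no matching Systems Manager entry.

    `ssm_instance_information` is expected to look like the items in the
    `InstanceInformationList` field returned by `describe_instance_information`.
    """
    managed = {info["InstanceId"] for info in ssm_instance_information}
    return sorted(i for i in set(ec2_instance_ids) if i not in managed)


def fetch_unmanaged_instances(region_name: str) -> List[str]:
    """Query EC2 and Systems Manager, then compare (hypothetical helper)."""
    import boto3  # imported lazily so the pure helper above runs without boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    ssm = boto3.client("ssm", region_name=region_name)

    # Collect every instance ID that EC2 reports in this Region.
    instance_ids = []
    for page in ec2.get_paginator("describe_instances").paginate():
        for reservation in page["Reservations"]:
            instance_ids.extend(inst["InstanceId"] for inst in reservation["Instances"])

    # Collect every instance that has registered with Systems Manager.
    managed = []
    for page in ssm.get_paginator("describe_instance_information").paginate():
        managed.extend(page["InstanceInformationList"])

    return find_unmanaged_instances(instance_ids, managed)
```

Calling `fetch_unmanaged_instances("us-east-1")` (the Region is only an example) would return the instance IDs that still need SSM Agent installed or registered. Recently terminated instances may still appear in the EC2 listing for a short time.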

In the following section, we will show you how to use Trusted Advisor checks that use AWS Config as a data source to detect whether your EC2 instances are managed by Systems Manager.

Operational Excellence with AWS Trusted Advisor

In this example, you will learn how the recently introduced operational excellence checks use AWS Config rules to examine your AWS environment and provide recommendations when there are opportunities to operate your workload more efficiently.

The following procedure shows how to identify Amazon EC2 instances that are not managed by AWS Systems Manager through the integration of Trusted Advisor and AWS Config.

To extract the AWS Config rule name (console)

  1. In the Operational Excellence tab in Trusted Advisor, expand the check Amazon EC2 Instance Not Managed by AWS Systems Manager.
  2. In the Source section, copy the AWS Config Managed Rule name “ec2-instance-managed-by-systems-manager”.

Figure 1 – Example of an operational excellence check in AWS Trusted Advisor
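If you prefer to look up the check programmatically, the following sketch lists the Operational Excellence checks that are backed by AWS Config and filters them by name. It is an illustration under stated assumptions, not the authors' method: it assumes the `trustedadvisor` client in boto3 and its `list_checks` operation with the `pillar` and `source` filters, assumes your support plan grants access to the Trusted Advisor API, and uses hypothetical function names.

```python
from typing import Iterable, List


def select_checks_by_name(check_summaries: Iterable[dict], name_fragment: str) -> List[dict]:
    """Return the check summaries whose name contains the fragment (case-insensitive)."""
    fragment = name_fragment.lower()
    return [c for c in check_summaries if fragment in c.get("name", "").lower()]


def list_config_backed_checks() -> List[dict]:
    """List Operational Excellence checks sourced from AWS Config (hypothetical helper)."""
    import boto3  # imported lazily so the pure helper above runs without boto3

    client = boto3.client("trustedadvisor")
    # Assumption: these filter values match the enums accepted by list_checks.
    kwargs = {"pillar": "operational_excellence", "source": "aws_config"}
    summaries = []
    while True:
        response = client.list_checks(**kwargs)
        summaries.extend(response["checkSummaries"])
        token = response.get("nextToken")
        if not token:
            return summaries
        kwargs["nextToken"] = token
```

For example, `select_checks_by_name(list_config_backed_checks(), "Not Managed by AWS Systems Manager")` would narrow the list to the check used in this walkthrough. The console remains the reference for the exact AWS Config managed rule name.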

To activate the corresponding AWS Config managed rule (console)

  1. Navigate to AWS Config in the AWS console. If you have not enabled AWS Config yet, refer to the Getting started documentation to enable it. Note that you are charged based on your usage of AWS Config. For more details, refer to AWS Config Pricing.
  2. On the Rules page, choose Add rule.

    Figure 2 – Example of adding a rule in AWS Config

  3. Choose Add AWS managed rule.
  4. Search for the AWS managed rule named “ec2-instance-managed-by-systems-manager” in the search bar, which will show the rule with its description.
  5. Select the radio button to the left of the “ec2-instance-managed-by-systems-manager” rule name.
  6. Choose Next.

    Figure 3 – Example of adding an AWS Config Managed Rule named ec2-instance-managed-by-systems-manager

  7. On the Configure rule page, leave the default settings as is, and choose Next.

    Figure 4 – Example of AWS Config Managed Rule details on the Configure rule page

  8. On the Review and create page, confirm that the AWS Config managed rule is the one you need, and then save the rule. The rule then appears on the Rules overview page.

    Figure 5 – ec2-instance-managed-by-systems-manager rule has been successfully added to AWS Config

Now you have set up the prerequisites to generate Trusted Advisor Operational Excellence recommendations for this specific check. After the AWS Config rule generates evaluation results, you will see the results in Trusted Advisor in near real-time.
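The console steps above can also be approximated with the AWS SDK. The sketch below is a minimal example and not part of the original procedure: it assumes boto3 is installed, that AWS Config is already recording in the target Region, and that `EC2_INSTANCE_MANAGED_BY_SSM` is the source identifier for this managed rule. The resource types in the scope are an assumption intended to mirror the console defaults, and the function names are hypothetical.

```python
from typing import Optional

RULE_NAME = "ec2-instance-managed-by-systems-manager"
# Assumption: source identifier documented by AWS Config for this managed rule.
SOURCE_IDENTIFIER = "EC2_INSTANCE_MANAGED_BY_SSM"


def build_managed_rule(rule_name: str = RULE_NAME,
                       source_identifier: str = SOURCE_IDENTIFIER) -> dict:
    """Build the ConfigRule structure for an AWS managed rule."""
    return {
        "ConfigRuleName": rule_name,
        "Source": {"Owner": "AWS", "SourceIdentifier": source_identifier},
        # Assumption: resource types chosen to mirror the console defaults.
        "Scope": {
            "ComplianceResourceTypes": [
                "AWS::EC2::Instance",
                "AWS::SSM::ManagedInstanceInventory",
            ]
        },
    }


def activate_managed_rule(region_name: Optional[str] = None) -> None:
    """Create or update the managed rule in AWS Config (hypothetical helper)."""
    import boto3  # imported lazily so build_managed_rule runs without boto3

    config = boto3.client("config", region_name=region_name)
    config.put_config_rule(ConfigRule=build_managed_rule())
```

Keeping the rule definition in a separate function makes it easy to review or reuse in infrastructure code before anything is sent to AWS Config.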

To identify EC2 instances not managed by Systems Manager (console)

  1. In the Operational Excellence tab in Trusted Advisor, observe the “Investigation recommended” item.

    Figure 6 – Example of an “Investigation recommended” item under Operational Excellence in Trusted Advisor

  2. Expand the Amazon EC2 Instance Not Managed by AWS Systems Manager check to observe the findings. In this example, three EC2 instances are flagged as not managed by Systems Manager.
  3. Observe the Resource, AWS Config Rule, and Input Parameters of these EC2 instances.

    Figure 7 – Example of a source, alert criteria, recommended action and additional resource details discovered in Trusted Advisor
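The same findings can be read directly from AWS Config, which is the data source behind this check. The sketch below is a minimal example, assuming boto3 is installed and that the managed rule from the previous procedure has produced evaluation results. The function names are hypothetical.

```python
from typing import Iterable, List, Optional


def extract_noncompliant_ids(evaluation_results: Iterable[dict]) -> List[str]:
    """Return the resource IDs of NON_COMPLIANT results, de-duplicated and sorted.

    Each item is expected to look like an entry in the `EvaluationResults`
    field returned by `get_compliance_details_by_config_rule`.
    """
    ids = set()
    for result in evaluation_results:
        if result.get("ComplianceType") != "NON_COMPLIANT":
            continue
        qualifier = result["EvaluationResultIdentifier"]["EvaluationResultQualifier"]
        ids.add(qualifier["ResourceId"])
    return sorted(ids)


def fetch_noncompliant_ids(rule_name: str = "ec2-instance-managed-by-systems-manager",
                           region_name: Optional[str] = None) -> List[str]:
    """Fetch noncompliant resources for the rule (hypothetical helper)."""
    import boto3  # imported lazily so the pure helper above runs without boto3

    config = boto3.client("config", region_name=region_name)
    paginator = config.get_paginator("get_compliance_details_by_config_rule")
    results = []
    for page in paginator.paginate(ConfigRuleName=rule_name,
                                   ComplianceTypes=["NON_COMPLIANT"]):
        results.extend(page["EvaluationResults"])
    return extract_noncompliant_ids(results)
```

The IDs returned by `fetch_noncompliant_ids()` should correspond to the resources listed under the check in Trusted Advisor once both services have caught up.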

The Amazon EC2 Instance Not Managed by AWS Systems Manager check in Operational Excellence explains how Systems Manager helps you manage Amazon EC2 instances efficiently by providing centralized management, automation, inventory, patch management, change management, and consistency in OS configurations.

Trusted Advisor also provides a Recommended Action to guide you in implementing operational best practices that help organizations manage a fleet of Amazon EC2 instances with less operational overhead. In this example, Trusted Advisor guides you through setting up Systems Manager for EC2 instances and troubleshooting when an EC2 instance does not appear in Systems Manager. This check aligns with the best practices OPS05-BP03 Use configuration management systems and OPS05-BP05 Perform patch management from the AWS Well-Architected Framework, which help your operations team make repeatable and auditable configuration changes and reduce the level of effort. Once your EC2 instances are managed by AWS Systems Manager, your operations team can improve operational efficiency by using automated patch management and establishing a consistent change control process with Systems Manager Patch Manager and Change Manager.

Similar to the above scenario, you can enable other Trusted Advisor Operational Excellence checks by deploying the AWS Config managed rule associated with each check. Once the AWS Config managed rule is activated, Trusted Advisor will continuously evaluate your AWS resources and flag any resource configurations that deviate from the operational best practices. Based on the Recommended Action in each check, you can implement remediation actions to help you achieve Operational Excellence for your AWS environments.

Conclusion

Operational Excellence is essential to ensure that you are able to scale and keep up with the speed of ongoing business changes. This blog post introduced the new Operational Excellence category in Trusted Advisor and the new checks that use AWS Config as a data source. These checks are designed to help you improve the operational posture of your environment in alignment with the AWS Well-Architected Operational Excellence best practices. To learn more about the new Operational Excellence checks, visit our Trusted Advisor check reference.

About the authors:

Jang Whan Han

Jang Whan is an AWS Well-Architected GEO Solutions Architect who builds out example scenarios and hands-on labs to demonstrate AWS best practices for deploying workloads in the AWS cloud. He has dedicated his time to driving AWS best practices with AWS Partner Network (APN) partners and AWS customers.

Jerry Chen

Jerry Chen is currently a Senior AWS Well-Architected GEO Solutions Architect at Amazon Web Services (AWS). He’s been focusing on cloud security and operational architecture design for AWS customers and partners.

You can follow Jerry on LinkedIn.