This Guidance demonstrates how to build observability into applications to gain deeper insights from application stacks and infrastructure metrics. To improve resiliency across two AWS Regions, you must monitor application and infrastructure components across the entire stack.

Architecture Diagram

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • Deep application observability (DAO) carries observability through every layer of your workload: infrastructure, application, and business metrics. What you monitor depends on your organizational KPIs and SLAs. DAO helps customers prepare for service degradations and Region-level failures, and operate efficiently, with automation where applicable. As customers become more familiar with their application's key metrics, they can integrate automated systems with existing SOPs to handle a full Regional failure when needed (often an audit or compliance requirement).

    Read the Operational Excellence whitepaper 
  • All logs are encrypted at rest using AWS Key Management Service (AWS KMS). Access to the dashboards, and any automated tasks triggered by alarms, follow the principle of least privilege, with only the required policies attached to their roles. Changes to alarm thresholds, automated tasks, and other actions should be made only by the appropriate personnel and should go through a change review process, so that business SLAs are always respected and infrastructure metrics continue to support business goals.

    Read the Security whitepaper 
  • DAO guidance aligns with the Reliability pillar by advocating automatic recovery from failure through proactive observability. If a Regional failover is required, it can be initiated manually or automatically. DAO also emphasizes monitoring business SLAs so that infrastructure capacity is optimized and, if those SLAs are not met, the appropriate alarms are triggered. The guidance further encourages testing Regional failover regularly so that all failure pathways are discovered, reducing business risk.

    Read the Reliability whitepaper 
  • DAO encourages mechanical sympathy by recommending that customers monitor application workloads with the right tool, such as AWS X-Ray for AWS Lambda. DAO also provides guidance on using advanced technologies, such as Amazon CloudWatch Synthetics and canary testing, so that workload performance is measured across multiple dimensions.

    Read the Performance Efficiency whitepaper 
  • DAO guidance uses CloudWatch metrics, alarms, and logs, coupled with application-level tracing through X-Ray. Most of the guidance implementation stays within the AWS Free Tier limits of CloudWatch and X-Ray, although costs must be evaluated as customer requirements vary. For example, older CloudWatch logs can be exported to Amazon Simple Storage Service (Amazon S3) to reduce costs further.

    Read the Cost Optimization whitepaper 
  • The DAO guidance recommends that you monitor all layers of your workload so that business SLAs are continuously met, and that you conduct a Regional failover when degradation or failure occurs. DAO can also help you use resources efficiently and reduce overprovisioning of infrastructure, supporting a sustainable long-term working environment. Moreover, because the secondary environment is in a passive state, we recommend scaling its resources down until they are needed for a Regional failover.

    Read the Sustainability whitepaper 
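
The alarm-driven monitoring described in the pillars above can be illustrated with a short Python sketch. It is not part of this Guidance's sample code and is only one possible approach. The function names (build_sla_alarm, should_fail_over), the layer names, the 60-second period, and the "two layers in alarm" rule are all assumptions made for illustration. The dictionary keys follow the parameters of the CloudWatch PutMetricAlarm API, but the sketch makes no AWS calls.

```python
from typing import Dict

# Layers named in the text; the identifiers themselves are illustrative.
LAYERS = ("infrastructure", "application", "business")


def build_sla_alarm(name: str, namespace: str, metric: str,
                    threshold: float, topic_arn: str) -> Dict:
    """Build keyword arguments for a CloudWatch PutMetricAlarm request.

    The period, evaluation window, and statistic are assumed values;
    tune them to your own KPIs and SLAs.
    """
    return {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": metric,
        "Statistic": "Average",
        "Period": 60,                    # seconds per datapoint (assumption)
        "EvaluationPeriods": 3,          # consecutive breaching periods (assumption)
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "breaching",  # treat missing data as a breach
        "AlarmActions": [topic_arn],     # e.g. an SNS topic that notifies operators
    }


def should_fail_over(alarm_states: Dict[str, str],
                     min_layers_in_alarm: int = 2) -> bool:
    """Suggest a Regional failover when enough layers report ALARM.

    alarm_states maps a layer name to a CloudWatch alarm state
    ("OK", "ALARM", or "INSUFFICIENT_DATA"). The default of two
    layers is a hypothetical policy, not a rule from the Guidance.
    """
    in_alarm = [layer for layer in LAYERS if alarm_states.get(layer) == "ALARM"]
    return len(in_alarm) >= min_layers_in_alarm
```

With boto3, the dictionary returned by build_sla_alarm could be passed as keyword arguments to the CloudWatch client's put_metric_alarm method. Whether the result of should_fail_over triggers an automated failover or only pages an operator is a design choice that depends on your existing SOPs.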

Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin. 

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.
