Guidance for Self-Healing Code on AWS

Overview

This Guidance helps software companies set up a system that detects error logs, generates bug fixes, and creates pull requests. Any company that creates software inevitably has to balance fixing bugs against the pressure to deliver new products and features. Bugs can distract developers, degrade the user experience, and produce misleading metrics. By automating the detection and fixing of bugs, this Guidance helps enhance application reliability and improve the overall customer experience.

How it works

The architecture diagram illustrates how to use this solution effectively. It shows the key components and their interactions, and walks through the architecture's structure and functionality step by step.

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Operational Excellence

A CloudWatch Logs subscription filter pre-filters application error logs and automatically invokes a Lambda function, which processes each error for remediation. Together, CloudWatch and Lambda help automate the detection and triage of application errors.
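
The hand-off from the log subscription to Lambda can be sketched as follows. CloudWatch Logs delivers subscription data to Lambda as a base64-encoded, gzip-compressed JSON document under `awslogs.data`. The function names and the shape of the returned list are assumptions made for illustration; they are not taken from the Guidance's sample code.

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Decode the payload CloudWatch Logs delivers to a subscribed Lambda function."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handle_log_subscription(event, context):
    payload = decode_subscription_event(event)
    # Control messages are sent by CloudWatch Logs itself and carry no application logs.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    # Hypothetical next step: pass each pre-filtered error on for remediation.
    return [
        {"log_group": payload["logGroup"], "message": log_event["message"]}
        for log_event in payload["logEvents"]
    ]
```

Because the subscription filter has already selected the error lines, a handler like this only needs to unpack them and does not need its own pattern matching.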

Read the Operational Excellence whitepaper 

Security

Secrets are stored in Parameter Store with encryption enabled and can be read only by users or roles with explicit permissions. Storing secrets in Parameter Store lets you define fine-grained permissions per parameter, and encrypting the parameters also helps obscure their values in the AWS console. All IAM policies are scoped down to the minimum permissions required for the services and integrations, which keeps the blast radius of each IAM role to a minimum.
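
As a minimal sketch of what per-parameter scoping can look like, the helper below builds an IAM policy document that allows reading exactly one parameter. The helper name, the account ID, and the parameter name are hypothetical placeholders, not values from the Guidance.

```python
import json


def parameter_read_policy(region, account_id, parameter_name):
    """Build an IAM policy document that allows reading a single parameter."""
    # Parameter ARNs omit the leading slash of a hierarchical parameter name.
    resource = (
        f"arn:aws:ssm:{region}:{account_id}:parameter/{parameter_name.lstrip('/')}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ssm:GetParameter"],
                "Resource": resource,
            }
        ],
    }


# Placeholder values for illustration only.
policy = parameter_read_policy("us-east-1", "123456789012", "/self-healing/token")
print(json.dumps(policy, indent=2))
```

If a parameter is encrypted with a customer managed KMS key, the role also needs `kms:Decrypt` on that key, which can be scoped in the same way.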

Read the Security whitepaper 

Reliability

Lambda, DynamoDB, and Amazon SQS are managed services that offer automatic scalability and reliability across multiple Availability Zones (AZs). Lambda provides a stateless, serverless compute layer so that you can focus on code. DynamoDB is a fully managed NoSQL database with replication and native backup and restore capabilities. Amazon SQS helps ensure message delivery in distributed systems by loosely coupling services, which reduces the chance of system failure.
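
One way a queue can contain failures is sketched below, assuming a Lambda function consumes the Amazon SQS queue and that its event source mapping has `ReportBatchItemFailures` enabled. The handler reports only the messages it could not process, so Amazon SQS redelivers those and not the whole batch. `process_message` and the `error_log` field are hypothetical stand-ins for the real work.

```python
import json


def process_message(body):
    """Hypothetical unit of work; raises on a message it cannot handle."""
    if "error_log" not in body:
        raise ValueError("message has no error_log field")


def handle_queue_batch(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process_message(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed so the rest of the batch is not retried.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

An empty `batchItemFailures` list tells Lambda that the whole batch succeeded.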

Read the Reliability whitepaper 

Performance Efficiency

Lambda is highly scalable and can process items in parallel. DynamoDB has a stateless API layer and a shared storage layer, which together allow virtually limitless storage and throughput. DynamoDB Streams also allows efficient event-driven processing of item updates for downstream constructs. Amazon Bedrock handles all infrastructure management and scaling of models.

The combination of DynamoDB, DynamoDB Streams, Amazon Bedrock, and Lambda allows you to efficiently scale your database, react to data changes in near real time, and process events on-demand without the overhead of server management. This is particularly important for this system, where the rate of invocations can be inconsistent or erratic.
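
Reacting to data changes through DynamoDB Streams might look like the sketch below, which picks newly inserted items out of a stream batch delivered to Lambda. It assumes the stream's view type includes new images; the function name and the string-only attribute handling are simplifications for illustration.

```python
def newly_inserted_items(event):
    """Return the string attributes of every item inserted in this stream batch."""
    items = []
    for record in event["Records"]:
        # eventName is INSERT, MODIFY, or REMOVE; only new items matter here.
        if record["eventName"] != "INSERT":
            continue
        image = record["dynamodb"].get("NewImage", {})
        # Stream images use DynamoDB's typed JSON, for example {"S": "some text"}.
        items.append(
            {name: value["S"] for name, value in image.items() if "S" in value}
        )
    return items
```

Because Lambda is invoked only when the stream has records, a consumer like this uses no compute while the table is idle.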

Read the Performance Efficiency whitepaper 

Cost Optimization

Lambda, DynamoDB, Amazon SQS, and Amazon Bedrock are all serverless services that are billed based on usage rather than incurring static costs. This system is likely to have an inconsistent rate of invocations, with frequent periods of no activity. Serverless services allow efficient use of on-demand resources and generate costs only when the system is invoked.

Read the Cost Optimization whitepaper 

Sustainability

CloudWatch, Lambda, DynamoDB, Amazon SQS, and Amazon Bedrock are all serverless services that do not require statically provisioned servers and do not consume resources during periods of inactivity. Serverless services allow efficient use of on-demand resources, which are de-provisioned automatically when no longer in use, reducing your overall resource footprint.

Read the Sustainability whitepaper 

Implementation resources

A detailed guide is provided so that you can experiment with and use this Guidance within your AWS account. It examines each stage of building the Guidance, including deployment, usage, and cleanup, so that you are prepared to deploy it.

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and offers a peek under the hood to help you begin.

Open implementation guide   Open sample code on GitHub

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.