AWS Health Aware – Customize AWS Health Alerts for Organizational and Personal AWS Accounts
AWS strives for high availability and has a 99.9% uptime for most services. However, in the rare event that incidents do occur, customers should be prepared to respond. AWS Health is the primary channel for communicating service degradation, scheduled changes, and resource-impacting issues. For customers running critical applications, access to proactive, real-time alerts is key to improving incident remediation and maintaining operational excellence. Speed and agility are crucial when monitoring health events and maintaining the reliability and availability of customers' applications running on AWS.
AWS Health Aware (AHA) is an incident management and communication framework that delivers proactive and real-time alerts from AWS Health to a customer's preferred communication channels. Customers using AWS Organizations can get aggregated, active account-level alerts from impacted accounts across their organization. Alerts can be sent to endpoints such as Slack, Microsoft Teams, Amazon Chime, and email. AHA can also be integrated with a broad range of other endpoints during configuration. These alerts give customers event visibility and guidance to help quickly diagnose and resolve issues that are impacting their applications or workloads.
Benefits of AHA
Customers can take advantage of the following features:
- Integration with communication platforms such as Slack, Amazon Chime, Microsoft Teams, and email for automated, real-time alerts on AWS incidents.
- Integration with Amazon EventBridge, with the capability to ingest both organizational and non-organizational alerts into an event bus. This helps customers integrate with more than 35 SaaS partners such as New Relic, Datadog, and PagerDuty.
- Aggregated AWS Personal Health Dashboard (PHD) alerts with prescriptive guidance provided by AWS Health.
- Visibility into the AWS accounts and resources that are impacted for PHD alerts.
- Ability to filter out unwanted alerts by selecting specific region(s).
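The region filter in the last bullet can be pictured with a short Python sketch. This is an illustration only, not AHA's actual code: the function name `filter_events_by_region` and the sample ARNs are hypothetical, and the choice to always keep events whose region is `global` is an assumption of the sketch. The `region` field mirrors the field of the same name on AWS Health event records.

```python
from typing import Iterable, Optional


def filter_events_by_region(events: Iterable[dict], regions: Optional[set]) -> list:
    """Keep only events whose region is in `regions`.

    An empty or missing `regions` set means "no filtering". Events marked
    "global" are always kept, which is an assumption of this sketch.
    """
    if not regions:
        return list(events)
    return [
        event for event in events
        if event.get("region") in regions or event.get("region") == "global"
    ]


# Hypothetical sample events; real records carry many more fields.
sample_events = [
    {"arn": "arn:aws:health:us-east-1::event/EC2/EXAMPLE_1", "region": "us-east-1"},
    {"arn": "arn:aws:health:eu-west-1::event/RDS/EXAMPLE_2", "region": "eu-west-1"},
    {"arn": "arn:aws:health:global::event/IAM/EXAMPLE_3", "region": "global"},
]

kept = filter_events_by_region(sample_events, {"us-east-1"})
print([event["region"] for event in kept])  # ['us-east-1', 'global']
```

Filtering on the client side like this is the simplest form to read; a deployment could just as well pass the region list to the AWS Health API so that unwanted events are never returned.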
Before we get into the deployment, let’s look at the features and architecture of AHA to better understand how it all works together. In Figure 1, the available AWS Health API events are outlined based on whether the customer is utilizing AWS Organizations or not.
The diagrams below present the architecture of AHA, which models a serverless single-region or multi-region setup.
A user uploads an AWS CloudFormation template (CFT) to an AWS account. The CFT retrieves the solution from an Amazon S3 bucket, then deploys AWS IAM roles, an Amazon EventBridge schedule, an AWS Lambda function, webhook URLs in AWS Secrets Manager, and an Amazon DynamoDB table.
This architecture enables users to deploy AHA in a single AWS Region.
With this architecture, users can deploy AHA in an active-active multi-region model. Running AHA in multiple regions keeps it responsive and prevents disruptions to alerting during AWS events.
In this table, we list the resources from the AHA architecture and their purpose.
|Resource|Purpose|
|---|---|
|DynamoDBTable|DynamoDB table used to store event ARNs, updates, and TTL|
|ChimeChannelSecret|Webhook URL for Amazon Chime stored in AWS Secrets Manager|
|EventBusNameSecret|EventBus ARN for Amazon EventBridge stored in AWS Secrets Manager|
|LambdaExecutionRole|IAM role used for LambdaFunction|
|LambdaFunction|Main Lambda function that reads from the AWS Health API, sends to the configured webhook URLs, and writes to DynamoDB|
|LambdaSchedule|Amazon EventBridge rule that runs every minute to invoke LambdaFunction|
|LambdaSchedulePermission|IAM role used for LambdaSchedule|
|MicrosoftChannelSecret|Webhook URL for Microsoft Teams stored in AWS Secrets Manager|
|SlackChannelSecret|Webhook URL for Slack stored in AWS Secrets Manager|
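To make the role of the DynamoDB table more concrete, here is a minimal Python sketch of how a function that runs every minute might avoid re-sending the same alert. It is an illustration under stated assumptions, not the AHA implementation: the helper names (`needs_alert`, `build_item`, `process`), the 14-day retention value, and the use of plain integers for `lastUpdatedTime` are all hypothetical, and an in-memory dict stands in for the table. The one detail taken from DynamoDB itself is that a TTL attribute holds an expiry time in epoch seconds.

```python
import time
from typing import Optional

TTL_DAYS = 14  # hypothetical retention period for stored events


def needs_alert(stored: Optional[dict], event: dict) -> bool:
    """Return True when the event is new or was updated after it was stored."""
    if stored is None:
        return True
    return event["lastUpdatedTime"] > stored["lastUpdatedTime"]


def build_item(event: dict, now: Optional[float] = None) -> dict:
    """Build the record to store: event ARN, last update, and expiry (epoch seconds)."""
    now = time.time() if now is None else now
    return {
        "arn": event["arn"],
        "lastUpdatedTime": event["lastUpdatedTime"],
        "ttl": int(now) + TTL_DAYS * 24 * 60 * 60,
    }


def process(table: dict, event: dict, now: Optional[float] = None) -> bool:
    """Store the event and report whether an alert should be sent for it."""
    stored = table.get(event["arn"])
    if not needs_alert(stored, event):
        return False
    table[event["arn"]] = build_item(event, now)
    return True


table = {}  # in-memory stand-in for the DynamoDB table
event = {"arn": "arn:aws:health:us-east-1::event/EC2/EXAMPLE", "lastUpdatedTime": 1700000000}

print(process(table, event, now=1700000100))  # True: first sighting
print(process(table, event, now=1700000160))  # False: nothing changed
event["lastUpdatedTime"] = 1700000300
print(process(table, event, now=1700000360))  # True: the event was updated
```

Keying the record on the event ARN and comparing the last update time means each new event and each later update produces exactly one alert, while the TTL lets old records expire without a cleanup job.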
Amazon EventBridge makes it easy to connect applications together by delivering a stream of real-time data from custom sources, Amazon Web Services (AWS), and software-as-a-service (SaaS) applications.
The data can then be sent to a variety of targets like AWS Lambda, AWS Step Functions, Amazon Kinesis, and many more.
Amazon EventBridge also enables you to connect your applications with a range of SaaS partners without having to worry about building and maintaining custom infrastructure. Customers are using these capabilities to improve the scalability and reliability of their applications by building event-driven architectures rather than tight service coupling.
As more customers have started building end-to-end integrations with EventBridge, AHA would like to provide guidance for customers regarding the best way to get started with the array of different possibilities for event sources and event types.
These use-cases include:
- Auditing in near real-time, and archival of historical business operations events.
- Visualization and analysis of events for business intelligence and operational purposes.
- Automated alerting and remediation of application, service, and infrastructure systems.
- Connecting custom workflows with downstream consumers and legacy or on-premises applications.
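As a rough illustration of how a Health event could be handed to an event bus for use cases like these, the Python sketch below builds entries in the shape that the EventBridge `PutEvents` API expects (`Source`, `DetailType`, `Detail`, `EventBusName`) and sends them in batches of at most 10, which is the per-request limit. The source name `aha`, the detail type `AHA Event`, the bus name `aha-bus`, and the `FakeEventsClient` class are assumptions made for this example; with real credentials, a boto3 `events` client could be passed in place of the fake one.

```python
import json
from typing import Iterable

MAX_ENTRIES_PER_CALL = 10  # PutEvents accepts at most 10 entries per request


def build_entry(health_event: dict, event_bus_name: str) -> dict:
    """Wrap one Health event in a PutEvents entry."""
    return {
        "Source": "aha",            # hypothetical source name
        "DetailType": "AHA Event",  # hypothetical detail type
        "Detail": json.dumps(health_event, default=str),  # default=str handles datetimes
        "EventBusName": event_bus_name,
    }


def send_entries(client, entries: Iterable[dict]) -> int:
    """Send entries in batches and return the number of failed entries."""
    entries = list(entries)
    failed = 0
    for start in range(0, len(entries), MAX_ENTRIES_PER_CALL):
        batch = entries[start:start + MAX_ENTRIES_PER_CALL]
        response = client.put_events(Entries=batch)
        failed += response.get("FailedEntryCount", 0)
    return failed


class FakeEventsClient:
    """Stand-in client so the sketch runs without AWS credentials."""

    def __init__(self):
        self.calls = []

    def put_events(self, Entries):
        self.calls.append(Entries)
        return {"FailedEntryCount": 0, "Entries": [{"EventId": str(i)} for i in range(len(Entries))]}


client = FakeEventsClient()
events = [{"arn": f"arn:aws:health:us-east-1::event/EC2/EXAMPLE_{i}"} for i in range(12)]
failed = send_entries(client, (build_entry(e, "aha-bus") for e in events))
print([len(call) for call in client.calls], failed)  # [10, 2] 0
```

A rule on the same bus can then match on the chosen source value and route events to any supported target or SaaS partner.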
The visualization in Figure 3 shows the extensibility options of AHA through Amazon EventBridge integrations:
Configuring an endpoint
AHA can send alerts to multiple endpoints (webhook URLs, email, or EventBridge). Each endpoint must be set up beforehand, and some of them are configured on third-party websites. We'll go over the common ones here.
- Creating an Amazon Chime webhook URL (permissions required: Amazon Chime room access and the ability to manage webhooks)
  1. Create a new chat room for events (e.g., aws_events).
  2. In the chat room created in step 1, click the gear icon and then click Manage webhooks and bots.
  3. Click Add webhook.
  4. Type a name for the bot (e.g., AHA) and click Create.
  5. Click Copy URL. You will need this URL for the deployment.
- Creating a Slack webhook URL (permissions required: ability to add a new channel and app in Slack)
  1. Create a new channel for events (e.g., aws_events).
  2. In your browser, go to workspace-name.slack.com/apps, where workspace-name is the name of your Slack workspace.
  3. In the search bar, search for Incoming Webhooks and click on it.
  4. Click Add to Slack.
  5. From the drop-down, select the channel you created in step 1 and click Add Incoming Webhooks integration.
  6. On this page you can change the name of the webhook (e.g., AWS Bot), the icon or emoji to use, and so on.
  7. Copy the Webhook URL. You will need it for the deployment.
- Creating a Microsoft Teams webhook URL (permissions required: ability to add a new channel and app in Microsoft Teams)
  1. Create a new channel for events (e.g., aws_events).
  2. Within your Microsoft Team, go to Apps.
  3. In the search bar, search for Incoming Webhook and click on it.
  4. Click Add to team.
  5. Type in the name of the channel you created in step 1 and click Set up a connector.
  6. On this page you can change the name of the webhook (e.g., AWS Bot), the icon or emoji to use, and so on. Click Create when done.
  7. Copy the webhook URL that is presented. You will need it for the deployment.
- Configuring email
  - AHA uses Amazon Simple Email Service (SES), so all you need to enter is a To: address and a From: address.
  - You can send email alerts to one or many addresses. However, you must first verify the addresses in the SES console.
  - You may have to add an allow rule in your environment so that your email client does not label the alerts as spam.
- Creating an Amazon EventBridge event bus
  1. Open the Amazon EventBridge console at https://console.aws.amazon.com/events/
  2. In the navigation pane, choose Event buses.
  3. Choose Create event bus.
  4. Enter a name for the new event bus.
  5. Choose Create.
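Once a webhook URL exists, it can be checked with a few lines of Python before deploying. The sketch below is a simplified illustration rather than AHA's own message formatting: it assumes the plain-text body shapes accepted by these webhook types (`Content` for Amazon Chime, `text` for Slack and for the classic Microsoft Teams Incoming Webhook connector), and the helper names and the placeholder URL are hypothetical. Richer card or block layouts are possible but not shown.

```python
import json
import urllib.request


def build_payload(channel: str, message: str) -> dict:
    """Return a minimal plain-text JSON body for the given channel type."""
    if channel == "chime":
        return {"Content": message}
    if channel in ("slack", "teams"):
        return {"text": message}
    raise ValueError(f"unsupported channel: {channel}")


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Prepare an HTTP POST carrying the payload as JSON."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(url: str, payload: dict, timeout: float = 10.0) -> int:
    """Post the payload to the webhook and return the HTTP status code."""
    with urllib.request.urlopen(build_request(url, payload), timeout=timeout) as response:
        return response.status


# Placeholder URL: replace it with the webhook URL copied in the steps above,
# then call send(...) to post a real test message.
request = build_request(
    "https://hooks.example.com/services/PLACEHOLDER",
    build_payload("slack", "AWS Health event: EC2 operational issue in us-east-1"),
)
print(request.get_method(), request.data.decode("utf-8"))
```

Treat the webhook URL like a credential, since anyone who holds it can post to the channel. That is why the deployment stores these URLs in AWS Secrets Manager rather than in code.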
For more information, please refer to the documentation.
Detailed deployment steps are available in the AHA GitHub repository.
We are happy to announce the launch of new enhancements to AHA. Please try them out and keep sending us your feedback!
- Terraform Deployment option – Beta (Available for Single Region, Multi Region)
- Multi-region deployment – Available for all types of deployments (With Organization, Without Organization)
- Ability to filter accounts (Refer to AccountIDs CFN parameter for more info on how to exclude accounts from AHA notifications)
- Ability to view Account Names for a given Account ID in the PHD alerts
- If you are running AHA in Non-Org mode, AHA will send the account number and the impacted resource(s), if applicable, for a given alert
- Ability to deploy AHA with the Org mode on a member account
- Support for a new Health Event Type – “Investigation”
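The account filtering and account name features can be sketched in a few lines of Python. This is a hedged illustration, not AHA's code: it assumes the excluded account IDs arrive as comma-separated text and that a mapping from account ID to account name has already been looked up (for example through AWS Organizations). The function names and the sample account IDs are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def parse_excluded_accounts(raw: str) -> Set[str]:
    """Parse comma-separated account IDs, ignoring blanks and whitespace."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def label_accounts(
    affected: Iterable[str], excluded: Set[str], names: Dict[str, str]
) -> List[str]:
    """Drop excluded accounts and show the account name where one is known."""
    labels = []
    for account_id in affected:
        if account_id in excluded:
            continue
        name = names.get(account_id)
        labels.append(f"{name} ({account_id})" if name else account_id)
    return labels


excluded = parse_excluded_accounts("222222222222, ")
names = {"111111111111": "prod"}  # hypothetical ID-to-name lookup result
affected = ["111111111111", "222222222222", "333333333333"]

print(label_accounts(affected, excluded, names))
# ['prod (111111111111)', '333333333333']
```

Falling back to the bare account ID when no name is known keeps an alert readable even when the name lookup is unavailable, as it would be outside an organization.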
Support & Contributions
The AWS Health Aware solution is available in the GitHub AWS Samples repository. The builders of this solution are able to help with AHA questions or feature requests on a BEST EFFORTS basis ONLY.
Enterprise Support customers can reach out to their TAMs with any further questions or feature requests. Contributions to this project are welcome and can be submitted as GitHub pull requests!
In this post you learned how the AWS Health API can be used to alert customers with up-to-date information about AWS Health events affecting them. You deployed a serverless infrastructure via AWS CloudFormation that sends those alerts to your preferred communication channel(s). You should now be able to proactively monitor and react to AWS Health events for your personal and/or AWS Organizations account(s). To get started, visit the aws-samples GitHub repository and download AWS Health Aware (AHA).
About the Authors
Mridula Grandhi is a Principal Technical Account Manager for AWS providing customers guidance on business-technology alignment and supporting re-invention of their cloud operation models and processes. Mridula is also a Containers specialist and works with AWS customers to design, deploy, and manage their AWS workloads/architectures. You can reach her on Twitter via @gmridula1 (DMs are open).
Jordan Roth is a Senior Solution Architect specializing in VMC and Hybrid-Edge for AWS. Jordan assists AWS customers and partners with their cloud migration strategies. In his spare time, he enjoys traveling the globe with his wife, cooking, completing escape rooms, and running around with his two dogs.