AWS Cloud Operations & Migrations Blog
How to aggregate and visualize AWS Health events using AWS Organizations and Amazon Elasticsearch Service
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details.
In this post, I show you how to aggregate AWS Health events centrally from all accounts in your organization using AWS Organizations, AWS Lambda, and AWS Health API, and then build automation to ingest and visualize the operations data using Amazon Elasticsearch Service.
As your organization grows, the number of operational emails and AWS Personal Health Dashboard notifications you must review also increases. You might be wondering how you can review these notifications across hundreds of accounts, filter the relevant ones and send them to the right teams, and visualize and report on what is happening in your day-to-day operations.
The new AWS Health Organizational View provides centralized and real-time access to all AWS Health events, including operational issues, scheduled maintenance, and account notifications, posted to individual accounts in your organization. Using the AWS Health API for Organizational View, you can aggregate all your AWS Health events from multiple accounts. You can then ingest these events into Amazon Elasticsearch Service to visualize the data by account, business unit, service, AWS Region, and so on.
The following architecture diagram outlines an overview of the solution.
Figure 1: Architecture overview
Here I used Lambda to aggregate the AWS Health events across Organizations and ingest the response JSON to Amazon Elasticsearch Service. After the data is ingested, you can access it securely using an NGINX proxy and Amazon Cognito authentication for visualization using Kibana. In this secure architecture, the proxy is in the public subnet and Lambda and Amazon Elasticsearch Service with Kibana are deployed in the private subnet. The right side of the architecture diagram shows the existing AWS Organizations accounts. Amazon EventBridge is used to trigger the Lambda function at a fixed interval.
Before you can use the AWS Health API operations for Organizational View, you must:
- Satisfy the prerequisites in the AWS Health User Guide.
- Enable AWS Health to work with AWS Organizations in the US East (N. Virginia) Region. To do this, call the EnableHealthServiceAccessForOrganization operation from the management account or from an account that can assume the role with the required permissions.
- Have permissions to create a service-linked role for Amazon Elasticsearch Service.
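As a minimal sketch of the second prerequisite, the following Python function checks whether organizational view is already enabled and enables it if not. It assumes you pass in a boto3 Health client created in the management account (for example, boto3.client("health", region_name="us-east-1")); the function name ensure_health_org_access is hypothetical and is not part of the deployed stack.

```python
def ensure_health_org_access(health_client):
    """Enable AWS Health organizational view unless it is already enabled.

    `health_client` is assumed to be a boto3 "health" client created in the
    management account. Returns True if this call enabled the feature and
    False if it was already enabled.
    """
    status = health_client.describe_health_service_status_for_organization()
    if status.get("healthServiceAccessStatusForOrganization") == "ENABLED":
        return False
    # Maps to the EnableHealthServiceAccessForOrganization operation.
    health_client.enable_health_service_access_for_organization()
    return True
```

Passing the client in as a parameter keeps the sketch free of hard-coded credentials and Region settings, and makes it easy to exercise with a stub.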
Deploying the stack
I used AWS CloudFormation to deploy the architecture. The EventBridge rule triggers the Lambda function every 15 minutes to ingest data to Amazon Elasticsearch Service. Depending on your requirements, you can omit or disable the rule later. Although the Lambda function is triggered every 15 minutes, it cannot ingest data right after the deployment, because the index mapping and backend roles must first be set up in Kibana. I explain the Kibana setup later in the post.
The following diagram shows the Amazon Elasticsearch Service architecture that is deployed with the CloudFormation stack. This gives you a secure public access point backed by Amazon Cognito and NGINX.
Figure 2: Amazon Elasticsearch Service secure deployment architecture with a public proxy and private elastic network interfaces
This solution deploys public and private subnets across three Availability Zones, but keeps most of the internet traffic (private and public access) to a minimum. Access to the internet is limited from instances in the VPC by using a NAT gateway. The right side of the diagram refers to service VPCs. For security and service stability, most of the AWS services are deployed in a service VPC, which is managed by the service team, for additional layers of isolation.
You can deploy the stack in any AWS Region where the services EventBridge, Amazon Cognito, Amazon Elasticsearch Service, and Lambda are available.
To launch the CloudFormation stack, choose Launch Stack below.
1. On Specify stack details, enter a name for the stack (for example, Health-agg), and then choose Next.
Figure 3: Enter a stack name for the CloudFormation template to be deployed
2. In Stack creation options, for Rollback on failure, choose Disable. You want to preserve errors so you can debug any issues. Choose Next.
Figure 4: Disable rollback on failure under Stack creation options
3. Keep the other fields at their defaults, scroll to the bottom of the page, and in the Capabilities section, select the two check boxes. Choose Create stack.
Figure 5: Acknowledge IAM resource creation and capabilities
It takes approximately 30 to 35 minutes to create the stack.
4. On the Stacks page, choose the root stack (Health-agg), copy the values for KibanaProxyURL and CognitoPassword from the Outputs section, and paste them in a text file for future use.
Figure 6: Deployed root stack with nested stacks
Set up Kibana
1. Copy the KibanaProxyURL from the text file and paste it in your web browser to access Kibana. You should see a screen similar to the one shown here. I am using Firefox.
Figure 7: Navigate to Kibana dashboard using the web browser
Because you are using self-signed certificates, the browser cannot verify them against a public certificate authority. That is expected. Choose Advanced, and the following screen appears. Then choose Accept the Risk and Continue.
Figure 7.1: Navigate to Kibana dashboard using the Advanced settings
2. On the sign-in page, use the credentials (CognitoPassword) from the text file to sign in.
Figure 8: Sign in with Kibana credentials
Note: If you cannot reach the sign-in page or if you are getting 502 or 504 errors, the NGINX server (proxy) might have stopped. Open the AWS Systems Manager console and start a new session with the proxy instance. Run the following commands and then retry.
sh-4.2 # sudo su - root
[root@ip-10-1-0-xxx ~]# cd /etc/nginx/conf.d/
[root@ip-10-1-0-xxx conf.d]# service nginx restart
3. Change your password.
Figure 9: Choose a new password
4. On the Welcome to Elastic Kibana page, choose Explore on my own.
Figure 10: Welcome to Elastic Kibana
5. From the Kibana dashboard, choose Dev Tools.
Figure 11: Choose Dev Tools to create index mapping
6. A mapping must be defined to ensure that the aggregated Health events data maps properly on ingestion. Before you create the mapping, look at this massaged JSON obtained by combining the outputs of the following Organizational View APIs:
- DescribeEventsForOrganization: Summary information about events across the organization.
- DescribeAffectedAccountsForOrganization: List of accounts in your organization impacted by an event.
- DescribeEventDetailsForOrganization: Detailed information about events in your organization.
- DescribeAffectedEntitiesForOrganization: Information about AWS resources in your organization that are affected by events.
7. Paste this index mapping in the Dev Tools console, and then choose the green button in the upper right corner.
Figure 12: Create index mapping
You should receive a confirmation that says
"acknowledged":true. This confirms the mapping.
Because fine-grained access is enabled on the cluster, the Lambda function that ingests data to the cluster needs access. Add the IAM role for the Lambda function to the backend roles in Kibana.
8. In the CloudFormation console, choose the Lambda stack, and from Outputs, copy the ESLambdaExecutionRole ARN.
Figure 13: Copy the ESLambdaExecutionRole ARN from the CloudFormation console
9. Choose the Amazon Cognito stack, and copy the AuthRoleARN value.
Figure 14: Copy AuthRoleARN from CloudFormation console
10. Go back to the Kibana console, and choose Dev Tools.
Figure 15: Navigate back to Dev Tools
11. To update the backend roles, scroll down and execute the following API call. Ideally, the Amazon Cognito role (
cog_auth_role) would not be needed here because it’s already been added, but the current Amazon Elasticsearch Service API calls do not have the option to only add a new backend role. For that reason, both roles are required here. For more information, see REST API differences in the Amazon Elasticsearch Service Developer Guide and Role mappings in the Open Distro for Elasticsearch documentation.
Figure 16: Update Kibana backend roles
Now data is ingested into Amazon Elasticsearch Service every time the Lambda function is triggered (every 15 minutes). The ingest Lambda function also handles Health event updates for long-running events based on lastUpdatedTime.
Note: If you don’t have data within the past 15 minutes or want to change the frequency, open the EventBridge console, choose the rule (TriggerPHDLambda), and edit the schedule as required. You must update the Lambda function too, because EventBridge doesn’t pass the interval to it. Open the Lambda console, choose the getAggregateHealthEvents function, and edit the TRIGGERMINUTES environment variable. You can also use intervals of hours or days, as long as you convert the value to minutes. Also, if you choose to pull many events, make sure to increase the Lambda function timeout, because the AWS Health API for Organizational View allows only one transaction per second. The Lambda code already handles this backoff behavior using a sleep interval.
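To illustrate how the schedule and the TRIGGERMINUTES variable relate, here is a minimal sketch. It is not the code deployed by the stack; it shows one plausible way a function could read the variable and turn it into a lookback filter for DescribeEventsForOrganization. The helper names (trigger_minutes, schedule_expression, lookback_filter) are hypothetical, and filtering on lastUpdatedTime is an assumption based on the update handling described earlier.

```python
import os
from datetime import datetime, timedelta, timezone


def trigger_minutes(default=15):
    """Read the polling interval, in minutes, from TRIGGERMINUTES."""
    return int(os.environ.get("TRIGGERMINUTES", default))


def schedule_expression(minutes):
    """Build the matching EventBridge rate expression.

    EventBridge expects a singular unit for a value of 1 and a plural
    unit otherwise.
    """
    unit = "minute" if minutes == 1 else "minutes"
    return f"rate({minutes} {unit})"


def lookback_filter(minutes, now=None):
    """Filter for events updated within the last `minutes` minutes.

    The returned dict is shaped like the `filter` argument of
    DescribeEventsForOrganization (assumption: lastUpdatedTime is used).
    """
    now = now or datetime.now(timezone.utc)
    return {"lastUpdatedTime": {"from": now - timedelta(minutes=minutes)}}
```

For example, 12 hours becomes TRIGGERMINUTES=720, and schedule_expression(720) gives rate(720 minutes). Keeping the two values in step is what prevents gaps or unnecessary overlap between runs.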
If there are no AWS Health events for the specified interval, you may not see the data in Amazon Elasticsearch Service. The best way to verify the flow is by making sure the Amazon CloudWatch Logs for the Lambda functions (
ingestToESFunction) are created.
This diagram shows how I’m calling the AWS Health API for Organizational View from the Lambda function.
Figure 17: Lambda calls to AWS Health API for Organizational View
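The call flow can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Lambda code from the stack: health_client is assumed to be a boto3 Health client, the helper names paginate and collect_org_events are hypothetical, and the calls to DescribeEventDetailsForOrganization and DescribeAffectedEntitiesForOrganization are left out to keep the example short. The one-second pause after each request reflects the rate limit mentioned earlier.

```python
import time


def paginate(call, result_key, min_interval=1.0, sleep=time.sleep, **kwargs):
    """Yield items from a nextToken-paginated Health API call.

    Pauses `min_interval` seconds after every request so that at most
    one request per second is sent.
    """
    token = None
    while True:
        params = dict(kwargs)
        if token:
            params["nextToken"] = token
        page = call(**params)
        sleep(min_interval)
        yield from page.get(result_key, [])
        token = page.get("nextToken")
        if not token:
            break


def collect_org_events(health_client, event_filter, sleep=time.sleep):
    """Return one record per (event, affected account) pair."""
    records = []
    events = paginate(health_client.describe_events_for_organization,
                      "events", sleep=sleep, filter=event_filter)
    for event in events:
        accounts = paginate(
            health_client.describe_affected_accounts_for_organization,
            "affectedAccounts", sleep=sleep, eventArn=event["arn"])
        for account_id in accounts:
            # Copy the event summary and tag it with the affected account.
            records.append({**event, "awsAccountId": account_id})
    return records
```

Flattening to one record per event and account is a design choice made here so that each record can be indexed as its own document and filtered by account in Kibana.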
12. Go back to the Kibana console and choose Discover.
Figure 18: Choose Discover to create an index pattern
The Create index pattern page is displayed.
Figure 19: Create index pattern page
13. I am using the same event-phd index pattern. You can change it according to your mapping. Choose Next step.
Figure 20: Provide index pattern name
14. You can set the time filter on one of the time fields. In Time Filter field name, I entered
startTime. You can choose other time field names (for example,
lastUpdatedTime). Choose Create index pattern.
Figure 21: Select startTime as time filter field name
After your index pattern has been created, you should see the created index with all the fields.
Figure 22: Index created successfully
15. Choose Discover to look at the ingested data.
Figure 23: Choose Discover to view ingested data
16. From the calendar dropdown list, set the timeframe for looking at the ingested data. I chose 90 days from the calendar dropdown.
Figure 24: Select timeframe to view ingested data
You should now see your ingested data along with the histogram timeline view.
Figure 25: View ingested data with the histogram view
Visualize the data
Now it’s time to analyze this data and find insights by using visualization. I am going to create a few visualizations.
To create a pie chart visualization:
1. On the Kibana home page, choose Visualize.
2. Choose the plus sign to create a new visualization.
3. Under Basic Charts, choose the pie chart visualization.
4. Select the source as event-phd (index).
5. In the upper right corner, change the time interval to reflect the interval for which you want to visualize the data.
6. Under Metrics, for Slice size aggregation use Unique count.
7. Under Field, use
8. Under Buckets, click Add and choose Split slices. For Aggregation, choose Terms. For Field, choose service.keyword. Leave Order by, Order, and Size at their defaults.
9. Select Group other values in separate bucket.
Figure 26: Pie chart visualization settings
10. For Label for other bucket, use Other.
11. Repeat steps 8 through 10, this time with Field set to
Figure 27: Pie chart split slices by AccountID
12. On the Options tab, select Show labels, and then choose Update.
Figure 28: Pie chart view
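If you prefer to query the index directly, the pie chart settings above correspond roughly to a terms aggregation with a cardinality (unique count) sub-aggregation. The following sketch builds such a request body in Python. The function name pie_chart_query is hypothetical, the default size of 5 is an assumption, and the unique-count field is left as a required argument because it depends on your mapping; arn.keyword in the usage note is only a placeholder.

```python
import json


def pie_chart_query(unique_field, split_field="service.keyword", size=5):
    """Build a search body that buckets documents by `split_field` and
    counts distinct values of `unique_field` in each bucket."""
    return {
        "size": 0,  # return only aggregation results, no documents
        "aggs": {
            "slices": {
                "terms": {"field": split_field, "size": size},
                "aggs": {
                    "slice_size": {"cardinality": {"field": unique_field}}
                },
            }
        },
    }


if __name__ == "__main__":
    # "arn.keyword" is a hypothetical field name used for illustration.
    print(json.dumps(pie_chart_query("arn.keyword"), indent=2))
```

You could paste the printed JSON into the Dev Tools console as the body of a search request against the event-phd index. Unlike the Kibana chart, this query does not add the Other bucket.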
In the upper left corner, choose Save and enter a title and description for this visualization. This visualization provides insights on service notifications by account. Next, I create another visualization to show the timeline and count of the events.
To create a Timelion visualization:
- On the Kibana home page, choose Visualize.
- Choose the plus sign to create a visualization.
- Under Basic Charts, choose the Timelion type visualization.
- In the upper right corner, change the time interval to the interval for which you want to visualize the data.
- From the dropdown list, choose an interval that works for you. I chose 1 day for the interval.
- Under Timelion expression, enter the following query expression to get the data for the top 10 services, and then choose Update.
Figure 29: Timelion visualization query and settings
The chart should look like the following.
Figure 30: Timelion visualization
In the upper left corner, choose Save and enter a title and description for this visualization. You can create visualizations, share them across teams and business units, and use them for metrics and reporting purposes. You can also create a dashboard by adding all of the preceding visualizations.
I added the preceding visualizations and a few more to a dashboard as shown here.
Figure 31: Dashboard view with multiple visualizations
You can create the dashboard shown in Figure 31 by downloading this file and following these steps:
- From the left pane, choose Management.
- Choose Saved Objects, and then choose Import.
- Choose the downloaded file, clear the Automatically overwrite all saved objects check box, and then choose Import.
- If you already have an index (event-phd), visualizations, or dashboards with the same names as those in the imported file, a pop-up message asks whether you want to overwrite them. Choose Overwrite or Cancel based on your preference.
When you choose the dashboard name, the dashboard is displayed along with all the visualizations in the imported file.
Note: If EventBridge, Lambda, Amazon Cognito, Amazon Elasticsearch Service, or any related service is down, this solution does not ingest events for the duration of the outage. To compensate, you can increase the interval that the Lambda function polls to around 10 to 12 hours, because most large-scale events (LSEs) do not last longer than that. This way, events posted while the services were down are still captured afterward. The ingest Lambda function updates a document if it already exists, so the larger window does not create duplicate data.
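One common way to get this update-instead-of-duplicate behavior is to give every document a deterministic ID, so that indexing the same event again replaces the earlier copy. The sketch below shows the idea with a bulk request body. It is an illustration, not the ingest function from the stack: the helper names document_id and bulk_body are hypothetical, the awsAccountId field is an assumption about the record shape, and the action format assumes an Elasticsearch 7.x domain where no document type is required.

```python
import hashlib
import json


def document_id(event):
    """Derive a stable ID from the event ARN and the affected account."""
    key = f'{event["arn"]}|{event.get("awsAccountId", "")}'
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def bulk_body(events, index="event-phd"):
    """Build a newline-delimited _bulk payload.

    Each "index" action carries an explicit _id, so a document with the
    same ID is replaced rather than duplicated.
    """
    lines = []
    for event in events:
        action = {"index": {"_index": index, "_id": document_id(event)}}
        lines.append(json.dumps(action))
        # default=str serializes datetime values returned by the Health API.
        lines.append(json.dumps(event, default=str))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

Because the ID depends only on the event ARN and account, a long-running event that is polled many times always maps to the same document, whatever its latest status is.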
When you have finished visualizing data, delete the CloudFormation stacks to clean up all the AWS resources that you created. Make sure to empty the LambdaZipsBucket (created in the Lambda stack) and then delete the root CloudFormation stack (Health-agg), which, in turn, deletes all the child stacks. Also, delete the CloudWatch log groups manually for the Lambda functions as they won’t be deleted automatically when you delete the CloudFormation stacks.
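The two manual cleanup steps can also be scripted. The following is a minimal sketch that assumes you pass in boto3 S3 and CloudWatch Logs clients (boto3.client("s3") and boto3.client("logs")). The helper names are hypothetical, the bucket name must be looked up in your Lambda stack, and the function names you pass must match the deployed functions. The sketch deletes only current objects, so a versioned bucket would need its object versions removed as well.

```python
def empty_bucket(s3_client, bucket):
    """Delete every current object in `bucket`; return the number deleted."""
    deleted = 0
    kwargs = {"Bucket": bucket}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # One page holds at most 1,000 keys, the delete_objects limit.
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
        if not page.get("IsTruncated"):
            return deleted
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def delete_lambda_log_groups(logs_client, function_names):
    """Delete the /aws/lambda/<name> log group of each function."""
    names = [f"/aws/lambda/{name}" for name in function_names]
    for log_group in names:
        logs_client.delete_log_group(logGroupName=log_group)
    return names
```

Run empty_bucket before you delete the root stack, because CloudFormation cannot delete a bucket that still contains objects.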
In this post, I showed you how you can use AWS Organizations to aggregate all AWS Health events centrally across your accounts. You deployed a serverless infrastructure through CloudFormation that periodically ingests data to Amazon Elasticsearch Service and then created visualizations. You can now view and query data in one place and generate and share operational insights for your entire organization across teams and business units.
You can create different visualizations and dashboards for different teams/business units based on the services that matter most to them, and you can use the Amazon Elasticsearch Service anomaly detection feature to detect and notify teams of anomalies such as large-scale events in progress or increased latencies/errors for a service. You can also summarize scheduled maintenances, operational issues, and upcoming changes and send a report to your teams weekly or biweekly, keeping them informed and operationally compliant.
If you’d like to send organizational AWS Health events to Amazon Chime or Slack, check out the Send Organizational AWS Health Events to Amazon Chime or Slack blog post, which leverages the AWS Health Organizational View Alerts (AHOVA) notification tool for sending alerts.
About the Author
Srinivasa Atta is a Sr. Technical Account Manager at Amazon Web Services (AWS). At AWS, Srini works with enterprise customers to design, deploy, and manage their cloud architectures and strategies. Srini has over 12 years of experience in information technology, including roles in software development, infrastructure, leadership, and architecture.