AWS Cloud Operations & Migrations Blog

How to aggregate and visualize AWS Health events using AWS Organizations and Amazon Elasticsearch Service

September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service.


In this post, I show you how to aggregate AWS Health events centrally from all accounts in your organization using AWS Organizations, AWS Lambda, and the AWS Health API, and then build automation to ingest and visualize the operations data using Amazon Elasticsearch Service.

As your organization grows, the number of operational emails and AWS Personal Health Dashboard notifications you must review also increases. You might be wondering how you can review these notifications across hundreds of accounts, filter the relevant ones and send them to the right teams, and visualize and report on what is happening in your day-to-day operations.

The new AWS Health Organizational View provides centralized and real-time access to all AWS Health events, including operational issues, scheduled maintenance, and account notifications, posted to individual accounts in your organization. Using the AWS Health API for Organizational View, you can aggregate all your AWS Health events from multiple accounts. These events can then be ingested into Amazon Elasticsearch Service to visualize the data by account, business unit, service, AWS Region, and so on.

Solution overview

The following architecture diagram outlines an overview of the solution.

This secure architecture has an NGINX proxy in the public subnet, with Lambda and Amazon ES with Kibana deployed in the private subnet. The Amazon EventBridge rule triggers the Lambda function at a fixed interval (15 minutes) to aggregate the AWS Health events across AWS Organizations and ingest the response JSON to Amazon ES. After the data is ingested to Amazon ES, it can be accessed securely using the NGINX proxy and Amazon Cognito authentication for visualization using Kibana. The existing AWS Organizations accounts are also shown.

Figure 1: Architecture overview

Here I used Lambda to aggregate the AWS Health events across the organization and ingest the response JSON to Amazon Elasticsearch Service. After the data is ingested, you can access it securely using an NGINX proxy and Amazon Cognito authentication for visualization using Kibana. In this secure architecture, the proxy is in the public subnet, and Lambda and Amazon Elasticsearch Service with Kibana are deployed in the private subnet. The right side of the architecture diagram shows the existing AWS Organizations accounts. Amazon EventBridge triggers the Lambda function at a fixed interval.

Prerequisites

Before you can use the AWS Health API operations for Organizational View, you must:

  1. Satisfy the prerequisites in the AWS Health User Guide.
  2. Enable AWS Health to work with AWS Organizations in the US East (N. Virginia) Region. To do this, call the EnableHealthServiceAccessForOrganization operation from the management account or from an account that can assume a role with the required permissions (see the CLI sketch after this list).
  3. Have permissions to create a service-linked role for Amazon Elasticsearch Service.
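
A minimal sketch of step 2 using the AWS CLI, assuming your credentials belong to the management account (or to a role with the required permissions):

# Enable AWS Health organizational view; the call must go to US East (N. Virginia).
aws health enable-health-service-access-for-organization --region us-east-1

# Verify that organizational view is now enabled.
aws health describe-health-service-status-for-organization --region us-east-1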

Deploying the stack

I used AWS CloudFormation to deploy the architecture. The EventBridge rule triggers the Lambda function every 15 minutes to ingest data to Amazon Elasticsearch Service. Depending on your requirements, you can omit or disable the rule later. Although the Lambda function is triggered every 15 minutes, it cannot ingest data immediately after deployment; the Kibana setup, which I explain later in this post, must be completed first.

The following diagram shows the Amazon Elasticsearch Service architecture that is deployed with the CloudFormation stack. This gives you a secure public access point backed by Amazon Cognito and NGINX.

Figure 2: Amazon Elasticsearch Service secure deployment architecture with a public proxy and private elastic network interfaces

This solution deploys public and private subnets across three Availability Zones while keeping internet traffic to a minimum. Instances in the VPC reach the internet only through a NAT gateway. The right side of the diagram shows service VPCs: for security and service stability, most of the AWS services are deployed in a service VPC, which is managed by the service team, for an additional layer of isolation.

You can deploy the stack in any AWS Region where the services EventBridge, Amazon Cognito, Amazon Elasticsearch Service, and Lambda are available.

To launch the CloudFormation stack, choose Launch Stack below.

CloudFormation launch stack button

1. On Specify stack details, enter a name for the stack (for example, Health-agg), and then choose Next.

The Specify stack details page provides a field to enter a name for the stack and a Parameters section.

Figure 3: Enter a stack name for the CloudFormation template to be deployed

2. In Stack creation options, for Rollback on failure, choose Disable. You want to preserve errors so you can debug any issues. Choose Next.

The Stack creation options section includes entries for Rollback on failure, Timeout, and Termination protection.

Figure 4: Disable rollback on failure under Stack creation options

3. Keep the other fields at their defaults, scroll to the bottom of the page, and in the Capabilities section, select the two check boxes. Choose Create stack.

The Capabilities section displays a check box that says “I acknowledge that AWS CloudFormation might create IAM resources with custom names” and another check box that says “I acknowledge that AWS CloudFormation might require the following capability: CAPABILITY_AUTO_EXPAND.”

Figure 5: Acknowledge IAM resource creation and capabilities

It takes approximately 30 to 35 minutes to create the stack.

4. On the Stacks page, choose the root stack (Health-agg), copy the values for KibanaProxyURL, CognitoUser, and CognitoPassword from the Outputs section, and paste them into a text file for later use.

The Stacks page shows the selected Health-agg root stack with Stack info, Events, Resources, Outputs, Parameters, Template, and Change sets tabs, with the Outputs tab highlighted.

Figure 6: Deployed root stack with nested stacks

Set up Kibana

1. Copy the KibanaProxyURL from the text file and paste it into your web browser to access Kibana. You will see a screen similar to the one shown here. I am using Firefox.

The browser displays Warning: Potential Security Risk Ahead with Go Back (Recommended) and Advanced buttons.

Figure 7: Navigate to Kibana dashboard using the web browser

Because the proxy uses a self-signed certificate, your browser cannot verify it against a public certificate authority. That is OK. Choose the Advanced button, and the following screen appears. Choose the Accept the Risk and Continue button.

The webpage says “Websites prove their identity via certificates. Firefox does not trust this site because it uses a certificate that is not valid for the IP address” with an Error code, a View Certificate link, and Go Back and Accept the Risk and Continue buttons.

Figure 7.1: Navigate to Kibana dashboard using the Advanced settings

2. On the sign-in page, use the credentials (CognitoUser, CognitoPassword) from the text file to sign in.

The sign-in page has boxes for user name and password.

Figure 8: Sign in with Kibana credentials

Note: If you cannot reach the sign-in page or if you are getting 502 or 504 errors, the NGINX server (proxy) might have stopped. Open the AWS Systems Manager console and start a new session with the proxy instance. Run the following commands and then retry.

sh-4.2 # sudo su - root
[root@ip-10-1-0-xxx ~]# cd /etc/nginx/conf.d/
[root@ip-10-1-0-xxx conf.d]# service nginx restart

3. Change your password.

The Change Password page provides fields to enter and re-enter the password and lists password requirements.

Figure 9: Choose a new password

4. On the Welcome to Elastic Kibana page, choose Explore on my own.

Welcome page provides a Try our sample data button and an Explore on my own button.

Figure 10: Welcome to Elastic Kibana

5. From the Kibana dashboard, choose Dev Tools.

Kibana dashboard provides Observability, Metrics, and Security sections.

Figure 11: Choose Dev Tools to create index mapping

6. A mapping must be defined to ensure that the aggregated Health events data maps properly on ingestion. Before you create the mapping, look at the massaged JSON obtained by combining the outputs of the following Organizational View APIs: DescribeEventsForOrganization, DescribeAffectedAccountsForOrganization, DescribeEventDetailsForOrganization, and DescribeAffectedEntitiesForOrganization.
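
As a minimal, hypothetical sketch of such a mapping, assuming that only the date fields used later for time filtering are mapped explicitly (the full mapping in the post covers every field in the combined JSON), a request in the Dev Tools console could look like this:

# Hypothetical subset; the actual mapping defines all fields of the combined JSON.
PUT event-phd
{
  "mappings": {
    "properties": {
      "startTime":       { "type": "date" },
      "endTime":         { "type": "date" },
      "lastUpdatedTime": { "type": "date" }
    }
  }
}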

7. Paste this index mapping in the Dev Tools console, and then choose the green button in the upper right corner.

The Dev Tools console provides History, Settings, and Help tabs.

Figure 12: Create index mapping

You should receive a confirmation that says "acknowledged":true. This confirms the mapping.

Because fine-grained access is enabled on the cluster, the Lambda function that ingests data to the cluster needs access. Add the IAM role for the Lambda function to the backend roles in Kibana.

8. In the CloudFormation console, choose the Lambda stack, and from Outputs, copy the ESLambdaExecutionRoleARN value.

Figure 13: Copy the ESLambdaExecutionRole ARN from the CloudFormation console

9. Choose the Amazon Cognito stack, and copy the AuthRoleARN value.

 Figure 14: Copy AuthRoleARN from CloudFormation console

10. Go back to the Kibana console, and choose Dev Tools.

Dev Tools console provides History, Settings, and Help tabs with a text area on the left side to run API calls and another text area on the right to view API call results.

Figure 15: Navigate back to Dev Tools

11. To update the backend roles, scroll down and execute the following API call. Ideally, the Amazon Cognito role (cog_auth_role) would not be needed here because it’s already been added, but the current Amazon Elasticsearch Service API calls do not have the option to only add a new backend role. For that reason, both roles are required here. For more information, see REST API differences in the Amazon Elasticsearch Service Developer Guide and Role mappings in the Open Distro for Elasticsearch documentation.

Figure 16: Update Kibana backend roles
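
The exact request appears in Figure 16. As a sketch of what such a call could look like, assuming the roles are mapped to the all_access role and using placeholders for the two ARNs you copied, the Open Distro security REST API request would be:

# Placeholder ARNs; this PUT replaces the whole mapping, so include both roles.
PUT _opendistro/_security/api/rolesmapping/all_access
{
  "backend_roles": [
    "<ESLambdaExecutionRoleARN>",
    "<AuthRoleARN>"
  ]
}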

Now data is ingested into Amazon Elasticsearch Service every time the Lambda function is triggered (every 15 minutes). The ingest Lambda function also handles Health event updates for long-running events based on lastUpdatedTime.

Note: If you don’t have data for the past 15 minutes or want to change the frequency, open the EventBridge console, choose the rule (TriggerPHDLambda), and edit the schedule as required. You must update the Lambda code too, because EventBridge doesn’t pass the interval: open the Lambda console, choose the getAggregateHealthEvents function, and edit the TRIGGERMINUTES environment variable. You can even set hours or days (converted to minutes). Also, if you choose to pull many events, make sure to increase the Lambda function timeout, because the AWS Health API for Organizational View allows only one transaction per second. The Lambda code already handles this backoff behavior using a sleep interval.
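
For example, to change the interval to one hour, a sketch using the AWS CLI (assuming the default rule and function names from the stack):

# Update the EventBridge rule schedule.
aws events put-rule --name TriggerPHDLambda --schedule-expression "rate(1 hour)"

# Keep the Lambda function's lookback window in sync (value in minutes).
# Note: --environment replaces all variables, so include any others the function uses.
aws lambda update-function-configuration \
    --function-name getAggregateHealthEvents \
    --environment "Variables={TRIGGERMINUTES=60}"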

If there are no AWS Health events for the specified interval, you may not see data in Amazon Elasticsearch Service. The best way to verify the flow is to make sure that the Amazon CloudWatch Logs for the Lambda functions (getAggregateHealthEvents and ingestToESFunction) are created.
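
One way to check, assuming the default function names, is with the AWS CLI:

# Confirm that the log group was created.
aws logs describe-log-groups --log-group-name-prefix "/aws/lambda/getAggregateHealthEvents"

# Tail recent invocations (AWS CLI v2).
aws logs tail "/aws/lambda/getAggregateHealthEvents" --since 1h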

This diagram shows how I’m calling the AWS Health API for Organizational View from the Lambda function.

The DescribeEventsForOrganization API is called first to gather summary information about events across the organization. The output event ARNs are passed to the DescribeAffectedAccountsForOrganization API to list the accounts in your organization impacted by events. The DescribeEventDetailsForOrganization and DescribeAffectedEntitiesForOrganization APIs are called with the event ARNs and impacted account IDs to get detailed information about events and affected AWS resources in your organization.

Figure 17: Lambda calls to AWS Health API for Organizational View
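
The Lambda function makes the equivalent SDK calls; the same sequence can be sketched with the AWS CLI, using placeholders for the event ARN and account ID:

# 1. Summary information about events across the organization.
aws health describe-events-for-organization --region us-east-1

# 2. Accounts in the organization affected by a given event.
aws health describe-affected-accounts-for-organization \
    --event-arn <event-arn> --region us-east-1

# 3. Detailed information for an event and account pair.
aws health describe-event-details-for-organization \
    --organization-event-detail-filters eventArn=<event-arn>,awsAccountId=<account-id> \
    --region us-east-1

# 4. Affected AWS resources for the same pair.
aws health describe-affected-entities-for-organization \
    --organization-entity-filters eventArn=<event-arn>,awsAccountId=<account-id> \
    --region us-east-1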

12. Go back to the Kibana console and choose Discover.

Discover is displayed in the left navigation of the Kibana console.

 Figure 18: Choose Discover to create an index pattern

The Create index pattern page is displayed.

Create index pattern page is displayed with Index pattern text box, matching system indices (event-phd), disabled Next step button, and pagination.

 Figure 19: Create index pattern page 

13. I am using the same event-phd index pattern. You can change it according to your mapping. Choose Next step.

On the Create index pattern page, event-phd appears in the Index pattern text box with a success message below saying “Success! Your index pattern matches 1 index.” and the Next step button enabled.

 Figure 20: Provide index pattern name

14. You can set the time filter on one of the time fields. In Time Filter field name, I entered startTime. You can choose other time field names (for example, endTime or lastUpdatedTime). Choose Create index pattern.

In Time Filter field name drop down, startTime is selected.

Figure 21: Select startTime as time filter field name

After your index pattern has been created, you should see the created index with all of its fields.

The event-phd filter pattern is displayed along with every field in the index and each field's associated core type as recorded by Amazon ES.

Figure 22: Index created successfully

15. Choose Discover to look at the ingested data.

Page says, “Expand your time range. One or more of the indices you’re looking at contains a date field. Your query may not match anything in the current time range, or there may not be any data at all in the currently selected time range.”

 Figure 23: Choose Discover to view ingested data

16. From the calendar dropdown list, set the timeframe for looking at the ingested data. I chose 90 days.

The calendar dropdown displays Quick select fields, with commonly used options such as today, this week, last 15 minutes, and recently used date ranges.

 Figure 24: Select timeframe to view ingested data

You should now see your ingested data along with the histogram timeline view.

The discover page shows the number of matching hits with selected time frame, histogram with startTime on x-axis and count on Y-axis, multiple rows of ingested data under Time (startTime) and _source (ingested JSON) columns.

Figure 25: View ingested data with the histogram view

Visualize the data

Now it’s time to analyze this data and find insights by using visualization. I am going to create a few visualizations.

To create a pie chart visualization:

  1. On the Kibana home page, choose Visualize.
  2. Choose the plus sign to create a new visualization.
  3. Under Basic Charts, choose the pie chart visualization.
  4. Select the source as the event-phd index.
  5. In the upper right corner, change the time interval to reflect the interval for which you want to visualize the data.
  6. Under Metrics, for Slice size aggregation use Unique count.
  7. Under Field, use service.keyword.
  8. Under Buckets, click Add and choose Split slices. For Aggregation, choose Terms. For Field, choose service.keyword. Leave Order by, Order, and Size at their defaults.
  9. Select Group other values in separate bucket.

A window is displayed with Data (highlighted) and Options tabs showing Metrics and Buckets sections. Metrics section has Slice size field, Buckets section shows Split slices as sub section with Aggregation, Field, Order by, Order, Size, and Group other values in separate bucket fields with the selected values.

Figure 26: Pie chart visualization settings

  10. For Label for other bucket, use Other.
  11. Repeat steps 8 through 10, this time with Field set to awsAccountID.keyword instead of service.keyword.

In the top half of the window, fields Label for other bucket, show missing values, Custom label, and Advanced are shown. On the bottom half of the window, a Split slices section is shown with fields for Sub aggregation, Field, and Order by filled with the selected values.

 Figure 27: Pie chart split slices by AccountID

  12. On the Options tab, select Show labels, and then choose Update.

Figure 28: Pie chart view

In the upper left corner, choose Save and enter a title and description for this visualization. This visualization provides insight into service notifications by account. Next, I create another visualization to show the timeline and count of the events.

To create a Timelion visualization:

  1. On the Kibana home page, choose Visualize.
  2. Choose the plus sign to create a visualization.
  3. Under Basic Charts, choose the Timelion type visualization.
  4. In the upper right corner, change the time interval to the interval for which you want to visualize the data.
  5. From the dropdown list, choose an interval that works for you. I chose 1 day for the interval.
  6. Under Timelion expression, enter the following query expression to get the data for the top 10 services, and then choose Update.
.es(index=event-phd,timefield=startTime,split=service.keyword:10).title(title="PHD events by start date and time")

The window shows the Interval and the Timelion expression to get the data for the top 10 services along with a Discard and Update button at the bottom.

Figure 29: Timelion visualization query and settings

The chart should look like the following.

The Visualize page shows the Timelion chart with events by start date and time. Service names are listed on the top-left side of the chart. When you pause on the chart, the count and date of service names updates.

Figure 30: Timelion visualization

In the upper left corner, choose Save and enter a title and description for this visualization. You can create visualizations, share them across teams and business units, and use them for metrics and reporting purposes. You can also create a dashboard by adding all of the preceding visualizations.
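
The same expression pattern works for other keyword fields in the mapping; for example, a hypothetical variant that splits by account instead of service:

.es(index=event-phd,timefield=startTime,split=awsAccountID.keyword:10).title(title="PHD events by account")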

I added the preceding visualizations and a few more to a dashboard, as shown here.

The dashboard view shows four visualizations (Pie chart, Timelion chart, Tag cloud, and Heat map) in a two-by-two grid.

Figure 31: Dashboard view with multiple visualizations.

You can create the dashboard shown in Figure 31 by downloading this file and following these steps:

  1. From the left pane, choose Management.
  2. Choose Saved Objects, and then choose Import.
  3. Choose the downloaded file, clear the Automatically overwrite all saved objects check box, and then choose Import.
  4. If you already have the same index (event-phd) or the visualization and dashboard names mentioned in the imported file, a pop-up message will ask if you want to overwrite the index, visualizations, and dashboards. Choose Overwrite or Cancel based on your preference.

When you choose the dashboard name, the dashboard is displayed along with all the visualizations in the imported file.

Note: If EventBridge, Lambda, Amazon Cognito, Amazon Elasticsearch Service, or any related service is down, this solution won’t work during that timeframe. You can increase the Lambda function’s polling window to around 10-12 hours, because most large-scale events (LSEs) do not last longer than that. This way, all the events can be captured even when the services are down. The ingest Lambda function can update a document if it already exists, so data duplication doesn’t occur.

Cleanup

When you have finished visualizing data, delete the CloudFormation stacks to clean up all the AWS resources that you created. Make sure to empty the LambdaZipsBucket (created in the Lambda stack) and then delete the root CloudFormation stack (Health-agg), which, in turn, deletes all the child stacks. Also, manually delete the CloudWatch log groups for the Lambda functions, because they won’t be deleted automatically when you delete the CloudFormation stacks.
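
A sketch of this cleanup with the AWS CLI, using a placeholder for the bucket name and assuming the function names shown earlier:

# Empty the LambdaZipsBucket (substitute the actual bucket name created by the Lambda stack).
aws s3 rm s3://<lambda-zips-bucket-name> --recursive

# Delete the root stack; the nested stacks are deleted with it.
aws cloudformation delete-stack --stack-name Health-agg

# Remove the Lambda log groups, which the stack deletion leaves behind.
aws logs delete-log-group --log-group-name "/aws/lambda/getAggregateHealthEvents"
aws logs delete-log-group --log-group-name "/aws/lambda/ingestToESFunction"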

Conclusion

In this post, I showed you how to use AWS Organizations to aggregate all AWS Health events centrally across your accounts. You deployed a serverless infrastructure through CloudFormation that periodically ingests data to Amazon Elasticsearch Service and then created visualizations. You can now view and query data in one place and generate and share operational insights for your entire organization across teams and business units.

You can create different visualizations and dashboards for different teams or business units based on the services that matter most to them, and you can use the Amazon Elasticsearch Service anomaly detection feature to detect and notify teams of anomalies such as large-scale events in progress or increased latencies or errors for a service. You can also summarize scheduled maintenance, operational issues, and upcoming changes and send a report to your teams weekly or biweekly, keeping them informed and operationally compliant.

If you’d like to send organizational AWS Health events to Amazon Chime or Slack, check out the Send Organizational AWS Health Events to Amazon Chime or Slack blog post, which leverages the AWS Health Organizational View Alerts (AHOVA) notification tool for sending alerts.

About the Author

Srinivasa Atta is a Sr. Technical Account Manager at Amazon Web Services (AWS). At AWS, Srini works with enterprise customers to design, deploy, and manage their cloud architectures and strategies. Srini has over 12 years of experience in information technology, including roles in software development, infrastructure, leadership, and architecture.