AWS Storage Blog
Monitoring CloudEndure Disaster Recovery with AWS Lambda
Many organizations must monitor and track their disaster recovery (DR) initiatives to meet business and audit objectives. CloudEndure Disaster Recovery provides fast and reliable recovery of physical, virtual, and cloud-based servers into AWS. It provides email alerts and a high-level dashboard for monitoring DR jobs. However, many organizations require more robust reporting, such as graphical dashboards and up-to-the-minute status reports, to satisfy audits and meet business service level agreements (SLAs).
CloudEndure’s API enables integration with security information and event management (SIEM) and monitoring tools such as Datadog, which allows additional customization. Presidio, an AWS Premier Partner, harnesses technology innovation and simplifies IT complexity to digitally transform businesses and drive return on IT investment. Presidio created a CloudEndure monitoring dashboard using AWS Lambda and Datadog that presents a single pane of glass for monitoring CloudEndure Disaster Recovery workloads and the AWS environment.
This post explains how to use the CloudEndure Disaster Recovery API with AWS Lambda functions to pull logs from CloudEndure and forward them to Datadog. The AWS Lambda functions, created by Presidio, authenticate with the CloudEndure API and issue GET requests for active CloudEndure Disaster Recovery jobs. A CloudWatch Events schedule triggers the functions at a regular interval. Together, the Datadog and CloudEndure APIs provide detailed reports and email alerts on the status of protected servers, along with a single view into your CloudEndure Disaster Recovery workloads.
How the CloudEndure API works with AWS Lambda functions
This section explains how to generate a CloudEndure API token and describes the code that runs as AWS Lambda functions. It assumes that you have set up CloudEndure Disaster Recovery and that replication has begun. You must also have a Datadog account with an API key set up and configured.
The following diagram shows how CloudEndure logs are posted to Datadog, with AWS Lambda functions handling the automation.
CloudEndure API
The following steps show how to generate a CloudEndure API token. In the CloudEndure console, open Setup & Info and select the Other Settings tab.
Choose GENERATE NEW TOKEN to generate a new API token. When making API calls, you can then sign in with the API token rather than your username and password.
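As a minimal sketch of token-based sign-in, the Python below builds a login request using only the standard library. It assumes that the CloudEndure API accepts POST /api/latest/login with a JSON body of {"userApiToken": ...}, sets a session cookie, and returns an XSRF-TOKEN cookie that later calls echo back in an X-XSRF-TOKEN header. Verify these details against the CloudEndure API documentation for your account. The names CE_BASE_URL, build_login_request, and login are hypothetical.

```python
import json
import urllib.request
from http.cookiejar import CookieJar
from typing import Optional, Tuple

# Assumed CloudEndure API base URL; confirm it against the CloudEndure API documentation.
CE_BASE_URL = "https://console.cloudendure.com/api/latest"


def build_login_request(api_token: str) -> urllib.request.Request:
    """Build (without sending) a login request that authenticates with the API token."""
    body = json.dumps({"userApiToken": api_token}).encode("utf-8")
    return urllib.request.Request(
        f"{CE_BASE_URL}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(api_token: str) -> Tuple[urllib.request.OpenerDirector, Optional[str]]:
    """Sign in and return a cookie-aware opener plus the XSRF token for later calls."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(build_login_request(api_token)):
        pass  # the session cookie is captured in the jar as a side effect
    # Assumption: the API sets an XSRF-TOKEN cookie that must be echoed back
    # in an X-XSRF-TOKEN header on subsequent requests.
    xsrf = next((c.value for c in jar if c.name == "XSRF-TOKEN"), None)
    return opener, xsrf
```

Keep the token out of source code. One option is to store it in SSM Parameter Store, as the solution does for the Datadog API key.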
AWS Lambda functions
Sign in to the AWS Management Console, open the Lambda console, and choose Create function. The following is an overview of the Lambda functions.
The handler.py Lambda function is the main module of the project. A cron schedule set up in Amazon CloudWatch Events triggers it, and on each run it calls the function that gathers new CloudEndure logs and posts them to Datadog.
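The schedule that triggers handler.py could be created with boto3, as in the following sketch. The rule name, the function ARN, and the every-5-minutes cron expression are placeholder assumptions; the post does not state the interval Presidio used. The parameter builder is kept separate from the AWS calls so the values can be inspected without credentials.

```python
DEFAULT_SCHEDULE = "cron(0/5 * * * ? *)"  # assumption: run every 5 minutes


def build_schedule(rule_name: str, function_arn: str, expression: str = DEFAULT_SCHEDULE) -> dict:
    """Return the parameters for a scheduled CloudWatch Events rule and its Lambda target."""
    return {
        "rule": {"Name": rule_name, "ScheduleExpression": expression, "State": "ENABLED"},
        "targets": {
            "Rule": rule_name,
            "Targets": [{"Id": "cloudendure-monitor", "Arn": function_arn}],
        },
    }


def apply_schedule(rule_name: str, function_arn: str, expression: str = DEFAULT_SCHEDULE) -> str:
    """Create the rule, point it at the function, and allow the rule to invoke it."""
    import boto3  # imported here so build_schedule works without boto3 installed

    params = build_schedule(rule_name, function_arn, expression)
    events = boto3.client("events")
    rule_arn = events.put_rule(**params["rule"])["RuleArn"]
    events.put_targets(**params["targets"])
    # add_permission raises ResourceConflictException if this statement already exists.
    boto3.client("lambda").add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

The same rule can also be created in the console. The permission step matters either way, because without it the schedule fires but cannot invoke the function.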
The ce_logs.py Lambda function parses CloudEndure Disaster Recovery job status information for new events. It authenticates to your CloudEndure project with the CloudEndure API token and retrieves the CloudEndure event logs. It returns these logs to handler.py, which filters them and sends them to the Datadog API.
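The following is a minimal sketch of what ce_logs.py might do. It expects an authenticated urllib opener and XSRF token obtained at login. It also assumes that GET /projects/{projectId}/jobs returns an items list whose jobs carry id, type, status, and a log list of entries with logDateTime and message fields. Verify these field names against a real response. The names ce_get, flatten_job_logs, and get_job_events are hypothetical.

```python
import json
import urllib.request
from typing import List

# Assumed CloudEndure API base URL; confirm it against the CloudEndure API documentation.
CE_BASE_URL = "https://console.cloudendure.com/api/latest"


def ce_get(opener: urllib.request.OpenerDirector, xsrf_token: str, path: str) -> dict:
    """GET a CloudEndure API path with an authenticated, cookie-aware opener."""
    request = urllib.request.Request(f"{CE_BASE_URL}{path}", headers={"X-XSRF-TOKEN": xsrf_token})
    with opener.open(request) as response:
        return json.load(response)


def flatten_job_logs(jobs: dict) -> List[dict]:
    """Turn a jobs payload into a flat list of events, oldest first."""
    events = []
    for job in jobs.get("items", []):
        for entry in job.get("log", []):
            events.append({
                "job_id": job.get("id"),
                "job_type": job.get("type"),
                "status": job.get("status"),
                "timestamp": entry.get("logDateTime"),
                "message": entry.get("message"),
            })
    # Assumes timestamps share one ISO 8601 format and offset, so string order is time order.
    events.sort(key=lambda e: e["timestamp"] or "")
    return events


def get_job_events(opener: urllib.request.OpenerDirector, xsrf_token: str, project_id: str) -> List[dict]:
    """Fetch one project's jobs and return their log entries as events."""
    return flatten_job_logs(ce_get(opener, xsrf_token, f"/projects/{project_id}/jobs"))
```

Flattening each job's log into one event per entry gives the later filtering step a uniform list with a single timestamp field to compare.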
The post_data_to_datadog.py Lambda function retrieves the encrypted Datadog API key from AWS Systems Manager (SSM) Parameter Store and uses the key to authenticate when it posts data to Datadog. After the data is sent, the Lambda automation’s job is complete.
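The following sketch shows one way post_data_to_datadog.py could work. It reads a SecureString parameter with boto3 (get_parameter with WithDecryption=True) and sends events to the Datadog HTTP logs intake (v2 API) with the DD-API-KEY header. The parameter name, the US1 intake host, and the source, service, and tag values are assumptions. The post does not say which Datadog endpoint Presidio used, and accounts on other Datadog sites use a different intake host.

```python
import json
import urllib.request
from typing import List

# Assumption: the US1 Datadog site. Other sites, such as EU, use a different intake host.
DD_LOGS_URL = "https://http-intake.logs.datadoghq.com/api/v2/logs"
# Hypothetical name of the SecureString parameter that holds the Datadog API key.
DD_KEY_PARAMETER = "/cloudendure-monitor/datadog-api-key"


def get_datadog_api_key(parameter_name: str = DD_KEY_PARAMETER) -> str:
    """Read and decrypt the Datadog API key from SSM Parameter Store."""
    import boto3  # imported here so the request builder works without boto3 installed

    ssm = boto3.client("ssm")
    return ssm.get_parameter(Name=parameter_name, WithDecryption=True)["Parameter"]["Value"]


def build_datadog_request(api_key: str, events: List[dict]) -> urllib.request.Request:
    """Build (without sending) a logs-intake request with one log entry per event."""
    payload = [
        {
            "ddsource": "cloudendure",
            "service": "cloudendure-dr",
            "ddtags": f"job_type:{e.get('job_type')},status:{e.get('status')}",
            "message": json.dumps(e),
        }
        for e in events
    ]
    return urllib.request.Request(
        DD_LOGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )


def post_data_to_datadog(events: List[dict], parameter_name: str = DD_KEY_PARAMETER) -> int:
    """Post events and return the HTTP status, or 0 if there was nothing to send."""
    if not events:
        return 0
    request = build_datadog_request(get_datadog_api_key(parameter_name), events)
    with urllib.request.urlopen(request) as response:
        return response.status
```

An empty event list returns early, so a run with no new events never calls SSM or Datadog. Tagging by job type and status makes it straightforward to filter failed jobs on a dashboard.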
The remaining two Lambda functions, send_latest_dr_events.py and send_latest_repl_events.py, send only the latest CloudEndure events. Their logic ensures that no duplicate data is sent to Datadog.
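The post does not show the duplicate-suppression logic in send_latest_dr_events.py and send_latest_repl_events.py. One common approach, sketched here as an assumption, is a high-water mark: remember the timestamp of the newest event already sent, and forward only events that are later. Where the mark is stored between invocations (for example, an SSM parameter or a DynamoDB item) is also an assumption, and select_new_events is a hypothetical name.

```python
from typing import List, Optional, Tuple


def select_new_events(events: List[dict], last_sent: Optional[str]) -> Tuple[List[dict], Optional[str]]:
    """Return events newer than last_sent (oldest first) and the updated high-water mark."""
    new_events = [
        e for e in events
        if e.get("timestamp") and (last_sent is None or e["timestamp"] > last_sent)
    ]
    new_events.sort(key=lambda e: e["timestamp"])
    # If nothing is new, keep the previous mark unchanged.
    watermark = new_events[-1]["timestamp"] if new_events else last_sent
    return new_events, watermark
```

Because the comparison is strict, an event that shares the mark's exact timestamp but arrives in a later poll would be skipped. If that matters, comparing a (timestamp, job ID) pair is one way to tighten it.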
These Lambda functions run at a regular interval set by the CloudWatch Events schedule. Each run retrieves the CloudEndure logs, filters out events that were already sent, and posts the new events to Datadog in a predefined format.
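The per-run flow of retrieving, filtering, and posting can be sketched as a small orchestration function. The steps are passed in as callables so that the sketch stays self-contained. In a deployed handler.py they would be the real CloudEndure, filtering, and Datadog functions. The name run_once and its parameters are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Event = dict


def run_once(
    fetch_events: Callable[[], List[Event]],
    select_new: Callable[[List[Event], Optional[str]], Tuple[List[Event], Optional[str]]],
    post_events: Callable[[List[Event]], None],
    load_watermark: Callable[[], Optional[str]],
    save_watermark: Callable[[Optional[str]], None],
) -> int:
    """One scheduled run: retrieve logs, keep new events, post them, then advance the mark."""
    last_sent = load_watermark()
    new_events, watermark = select_new(fetch_events(), last_sent)
    if not new_events:
        return 0
    post_events(new_events)
    # Reached only if post_events did not raise, so a failed post is retried next run.
    save_watermark(watermark)
    return len(new_events)
```

A lambda_handler(event, context) entry point would call run_once with concrete implementations. Saving the mark only after the post succeeds means a Datadog outage delays events rather than dropping them.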
The solution consists of the following source files: handler.py, ce_logs.py, post_data_to_datadog.py, send_latest_repl_events.py, and send_latest_dr_events.py.
How the dashboard works in Datadog
This section uses screenshots to show the solution in action. In this example, a DR server is launched from the CloudEndure console and the launch job fails. The job progress in the CloudEndure console shows the failure.
The Datadog dashboard shows the parsed failure message, which the handler.py Lambda function pulled from CloudEndure.
With Datadog alerting configured, you can also deliver these notifications as email alerts. For help with this feature, work with your Datadog administrator or contact Presidio.
The CloudEndure Test Migration Dashboard provides near real-time monitoring of all servers protected by CloudEndure Disaster Recovery. The Lambda functions post_data_to_datadog.py, send_latest_repl_events.py, and send_latest_dr_events.py gather this information on a regular CloudWatch Events schedule. The dashboard can show useful information such as job completion, replication completion, timestamps of completed jobs, replication lag on any server, creation of resources such as subnets, and detailed server status reports.
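The replication lag shown on the dashboard could be derived from each protected machine's last consistency point. The following sketch assumes that GET /projects/{projectId}/machines returns machines with an ISO 8601 replicationInfo.lastConsistencyDateTime timestamp. That field name, the one-hour default threshold, and the function names are assumptions, not details from the post.

```python
from datetime import datetime, timezone
from typing import List, Optional


def replication_lag_seconds(machine: dict, now: Optional[datetime] = None) -> Optional[float]:
    """Seconds since the machine's last consistent point, or None if it is unknown."""
    ts = (machine.get("replicationInfo") or {}).get("lastConsistencyDateTime")
    if not ts:
        return None
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
    last = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if last.tzinfo is None:
        last = last.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - last).total_seconds()


def lagging_machines(
    machines: List[dict], threshold_seconds: float = 3600, now: Optional[datetime] = None
) -> List[str]:
    """Return IDs of machines whose lag exceeds the threshold (unknown lag is not flagged)."""
    flagged = []
    for m in machines:
        lag = replication_lag_seconds(m, now)
        if lag is not None and lag > threshold_seconds:
            flagged.append(m.get("id"))
    return flagged
```

Sending the lag as a number, rather than only as log text, lets a dashboard graph it over time and compare it against an SLA threshold.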
Presidio can create fully customizable dashboards for any data analytics platform. While this post shows example dashboards in Datadog, the same concepts apply to any data analytics platform.
Conclusion
In this blog post, we demonstrated how to use AWS Lambda to capture, parse, and send CloudEndure Disaster Recovery event logs to Datadog for a single pane of glass dashboard view. This information is crucial for monitoring business SLAs and can be used to satisfy audit checks.
If you are interested in other use cases or functions, reach out to our AWS Partner Presidio, who would be happy to help you create them.
Thanks for reading this blog on monitoring CloudEndure Disaster Recovery with AWS Lambda. If you have any comments or questions, please leave them in the comments section.