AWS Database Blog

Implement an automated approach for handling AWS DMS operational events

AWS Database Migration Service (AWS DMS) helps you migrate data between homogeneous and heterogeneous database engines. As businesses evolve, they adopt the database engines best suited to each workload. This often leaves several database systems running side by side, which makes migrating data between them challenging. AWS DMS gives customers an efficient and reliable way to bridge the gap between different database engines.

AWS DMS migrates and replicates databases, but errors can occur during the process. When an error is recoverable, you need to resume and complete the task efficiently to preserve data integrity and minimize downtime. In this post, we show how to automatically resume replication tasks that have relational database targets after a recoverable error.

Overview

AWS DMS replication tasks emit a variety of events, including start, pause, finish, full load completion, change data capture (CDC) initiation, and errors. These events are essential for monitoring and managing database migration tasks. Each event is identified by an event ID that represents a specific occurrence during the migration. This solution relies on two of them, DMS-EVENT-0078 and DMS-EVENT-0079, to decide when to resume a task.

DMS-EVENT-0078 indicates that the replication task has failed.

DMS-EVENT-0079 indicates that the replication task has stopped.

You can find additional information about the AWS DMS generated event categories and event messages in the AWS DMS Developer Guide.

When AWS DMS generates events, it sends them to the Amazon EventBridge default event bus. In this solution, we create an EventBridge rule on the default bus that matches incoming recoverable events and sends them to a target for processing. We use an AWS Lambda function as the target to process these events and restart failed or stopped tasks.

With this architecture, the solution will automatically resume recoverable tasks.
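The matching step can be illustrated with a minimal sketch. It assumes that DMS events arrive with a source of aws.dms and carry the event ID in a detail.eventId field; verify both against the event examples in the AWS DMS documentation before relying on them. The matches helper and the sample event are hypothetical and only approximate exact-value matching, not the full EventBridge pattern language.

```python
import json

# Event IDs named in this post: task failed / task stopped.
RESUMABLE_EVENT_IDS = ["DMS-EVENT-0078", "DMS-EVENT-0079"]


def build_event_pattern(event_ids):
    """Return an EventBridge event pattern (as a dict) for the given DMS event IDs."""
    return {
        "source": ["aws.dms"],                   # assumed source value for DMS events
        "detail": {"eventId": list(event_ids)},  # assumed field name inside "detail"
    }


def matches(pattern, event):
    """Simplified local check: each pattern field must list the event's value.

    Handles only exact-value matching on nested dicts; real EventBridge
    patterns support many more operators.
    """
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


pattern = build_event_pattern(RESUMABLE_EVENT_IDS)
print(json.dumps(pattern, indent=2))

# Hypothetical, trimmed-down event for illustration only.
sample_event = {
    "source": "aws.dms",
    "resources": ["arn:aws:dms:us-east-1:111122223333:task:EXAMPLE"],
    "detail": {"eventId": "DMS-EVENT-0078"},
}
print(matches(pattern, sample_event))
```

The printed JSON is the kind of document you would expect to see as the rule's event pattern in the EventBridge console.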

Prerequisites

For this walkthrough, you must complete the following prerequisites:

  • An active AWS account.
  • A virtual private cloud (VPC), because AWS DMS runs within this isolated network environment to keep data transfers secure.
  • A basic understanding of AWS DMS concepts and functionality.
  • A basic understanding of AWS Lambda, a service that runs code in response to events without requiring you to manage servers. In this solution, Lambda runs the commands that stop or resume tasks.

Deployment

In this section, we walk you through how to deploy this solution. To launch the provided CloudFormation template, complete the following steps:

  1. Sign in to the AWS Management Console in the central account.
  2. Choose Launch Stack:
  3. Choose Next.
  4. For Stack name, enter a name. For example, DMSResumableTaskStack.
  5. For Security group ID and Subnet ID, enter the values for the AWS Lambda function. They must give the function network connectivity to the AWS DMS API.
  6. Choose Next.
  7. Enter any tags you want to assign to the stack and choose Next.
  8. Select the acknowledgement check boxes and choose Submit.
    The stack takes approximately 5 minutes to complete. Wait until the stack is complete before proceeding to the testing and verification.

Note: This solution runs AWS Lambda, so Lambda charges apply based on request volume, execution duration, and the memory allocated to the function. When the Lambda function restarts a task, it tags the task with the timestamp of the automatic restart. If the task fails again within 5 minutes, the function does not restart it. In that case, troubleshoot the task to understand the cause of the failure, and consider opening a case with AWS Support for further assistance.
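The restart guard described in the note can be sketched as follows. This is not the function deployed by the stack, only an illustration of the idea. The tag key AutoRestartedAt, the function names, and the choice to read the task ARN from the first entry in resources are assumptions. The dms argument is expected to be a boto3 DMS client, of which the sketch uses only list_tags_for_resource, start_replication_task, and add_tags_to_resource.

```python
from datetime import datetime, timedelta, timezone

RESTART_TAG_KEY = "AutoRestartedAt"  # hypothetical tag key
COOLDOWN = timedelta(minutes=5)      # window described in the note


def should_restart(tags, now):
    """Return True unless the task was auto-restarted within the cooldown window."""
    for tag in tags:
        if tag["Key"] == RESTART_TAG_KEY:
            last = datetime.fromisoformat(tag["Value"])
            return now - last >= COOLDOWN
    return True  # never auto-restarted before


def handle_event(event, dms, now=None):
    """Resume the task named in the event, or skip it if it failed again too soon.

    `dms` is a boto3 DMS client (boto3.client("dms")) or any object that
    exposes the same three methods.
    """
    now = now or datetime.now(timezone.utc)
    task_arn = event["resources"][0]  # assumes the task ARN is the first resource
    tags = dms.list_tags_for_resource(ResourceArn=task_arn)["TagList"]
    if not should_restart(tags, now):
        return "skipped"
    dms.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType="resume-processing",
    )
    # Record when the automatic restart happened so a quick repeat failure is skipped.
    dms.add_tags_to_resource(
        ResourceArn=task_arn,
        Tags=[{"Key": RESTART_TAG_KEY, "Value": now.isoformat()}],
    )
    return "restarted"
```

Inside a Lambda function, a handler could call handle_event(event, boto3.client("dms")). Passing the client in as a parameter keeps the cooldown logic easy to test without AWS access.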

Additional Rules

When EventBridge receives an event, which indicates a change in your AWS DMS environment, it applies a rule to route the event to a target. A rule matches events based on their structure, using what is called an event pattern. To automatically restart tasks on other events, add their event IDs to the rule's event pattern. To update the EventBridge rule, complete the following steps:

  1. Open the Amazon EventBridge console and select the DMS rule that was created in the previous section.
  2. Edit the rule, add the event IDs you need to the event pattern, and choose Next.
  3. Review the updated rule and choose Update rule.

Note: Adding AWS DMS operational events to the EventBridge rule makes the solution restart tasks automatically when those events occur. Choose these events carefully, and include only those that do not require manual intervention.
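If you prefer to prepare the updated pattern in code rather than editing it by hand, a small helper can merge new event IDs into an existing pattern. The sketch below assumes the rule's pattern stores IDs under detail.eventId; the function name is hypothetical, and DMS-EVENT-XXXX is a placeholder rather than a real event ID.

```python
import json


def add_event_ids(pattern_json, new_ids):
    """Return the pattern JSON with new_ids appended to detail.eventId (no duplicates)."""
    pattern = json.loads(pattern_json)
    ids = pattern.setdefault("detail", {}).setdefault("eventId", [])
    for event_id in new_ids:
        if event_id not in ids:
            ids.append(event_id)
    return json.dumps(pattern)


# Assumed current pattern of the rule created by the stack.
current = json.dumps({
    "source": ["aws.dms"],
    "detail": {"eventId": ["DMS-EVENT-0078", "DMS-EVENT-0079"]},
})

# "DMS-EVENT-XXXX" is a placeholder; substitute an ID from the AWS DMS Developer Guide.
updated = add_event_ids(current, ["DMS-EVENT-XXXX", "DMS-EVENT-0078"])
print(updated)
```

You can paste the printed JSON into the event pattern editor in the console. Because the rule is managed by the CloudFormation stack, changing it outside the template introduces drift from the template.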

Testing and Verifying

  1. On the AWS DMS console, choose Database migration tasks, select a migration task, and choose Stop.
  2. Wait a couple of minutes. The stop event should match the EventBridge rule, which invokes the Lambda function to resume the task.
  3. To troubleshoot, open the AWS Lambda console, select the ‘StackName-<Hash>’ Lambda function, choose the option to view its Amazon CloudWatch Logs, and look for errors.
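The same check can be scripted instead of watching the console. The following sketch polls a task's status and reports whether it left the running state and then returned to it. The function names, polling interval, and attempt count are assumptions. The dms argument is expected to be a boto3 DMS client, of which only describe_replication_tasks is used here.

```python
import time


def task_status(dms, task_arn):
    """Return the current status string of one replication task."""
    resp = dms.describe_replication_tasks(
        Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
    )
    return resp["ReplicationTasks"][0]["Status"]


def wait_for_resume(dms, task_arn, attempts=40, delay=15, sleep=time.sleep):
    """Return True once the task has left 'running' and come back to it.

    Returns False if that does not happen within `attempts` polls.
    """
    left_running = False
    for _ in range(attempts):
        status = task_status(dms, task_arn)
        if status != "running":
            left_running = True   # task is stopping, stopped, starting, and so on
        elif left_running:
            return True           # running again after having been down
        sleep(delay)
    return False


# Example usage (requires boto3 and AWS credentials; task_arn is your task's ARN):
#   import boto3
#   dms = boto3.client("dms")
#   dms.stop_replication_task(ReplicationTaskArn=task_arn)
#   print(wait_for_resume(dms, task_arn))
```

Tracking whether the task ever left the running state avoids a false positive when the first poll happens before the stop request has taken effect.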

Cleaning up

To avoid incurring future charges, open the AWS CloudFormation console, select the stack, and delete it.

Conclusion

In this post, we used AWS Lambda and Amazon EventBridge to automatically resume AWS DMS tasks after recoverable errors. EventBridge rules capture specific event IDs and invoke a Lambda function that handles the recoverable error, which removes the need for manual intervention and makes for a smoother migration experience.

If you have questions or feedback, leave a comment.


About the authors

Felix David is a Sr. Technical Account Manager at AWS. He works with AWS customers to help understand their business and technical needs, align technical solutions, and achieve the greatest value from AWS.

Harish Bannai is a Sr. Technical Account Manager at AWS. He works with enterprise customers, providing technical assistance on the operational performance of Amazon RDS and AWS Database Migration Service and sharing database best practices.

Kanwar Bajwa is an Enterprise Support Lead at AWS who works with customers to optimize their use of AWS services and achieve their business objectives.