Automating database migration and refreshing activities with AWS DMS
Application developers and system administrators replicate data across datastores to migrate, refresh, and mask data. For most organizations, data replication is a complex, multiphase process, including assessment, schema conversion, script conversion, data migration, functional testing, performance tuning, and other tasks. Multiple tools are available to support data replication.
AWS Database Migration Service (DMS) helps you quickly and securely migrate databases to AWS. The source database remains fully operational during the migration, minimizing application downtime. DMS can migrate your data to and from most widely used commercial and open-source databases.
By supporting both homogeneous and heterogeneous migrations, DMS can continuously replicate your data with high availability between your datastores. Tasks defined in DMS can perform a full load, cached changes, or ongoing replication between the datastores.
In this post, we offer a global automation solution for migrating, refreshing, and masking data. This solution involves deploying an AWS CloudFormation stack integrating DMS, AWS Lambda, and Amazon CloudWatch resources. A DMS replication task migrates or refreshes the data from an on-premises datastore to the AWS Cloud datastore.
For this demo, we use an Amazon RDS for Oracle DB instance as the on-premises database and an Amazon Aurora MySQL instance as the cloud database. With the DMS event subscriptions feature enabled, the DMS replication task publishes all task events to an Amazon SNS topic. The SNS topic forwards each notification to a subscribed Lambda function. This Lambda function enables or disables a CloudWatch Events rule, which in turn triggers the DMS replication task.
To simplify networking considerations, we configure all the supported services for this solution in the same VPC. We also provide AWS CloudFormation templates to help you provision and deploy resources used for this solution. These templates are specially designed to demonstrate this solution. Use extra caution and make necessary modifications before using these directly on production workloads.
In this post, we explain how to do the following:
- Provision your baseline networking and database resources. If you already have an existing DMS replication task, you can skip this step and the following one, instead proceeding to deploy the automated solution (step three).
- Validate the provisioned resources.
- Deploy the resources to test the solution with the provided AWS CloudFormation template.
- Clean up after the migration.
- Explore ongoing synchronization and data masking functionalities.
The following diagram outlines the proposed solution architecture.
To follow the steps outlined in this post, you need an AWS account.
Also, you need the dms-vpc-role IAM role, which DMS requires when you launch a replication task programmatically through the API or AWS CLI. In the IAM console, choose Roles and confirm that this role is present. If this role isn’t present, see the instructions in the next section.
1. Provisioning the DMS resources
In this optional step, we show you how to create the following base networking and database resources:
- A VPC and two subnets
- Security groups for Amazon RDS and DMS replication instance
- An RDS for Oracle database from the snapshot
- An Aurora MySQL database
- A DMS replication instance
- DMS endpoints and a replication task
Provision your baseline networking and database resources as follows. For your convenience, we have predefined most details in the template. Edit as needed.
- Download the DMS baseline resources template to set up your resources. To launch these resources directly through the console, choose Launch Stack, and then choose Next.
- On the Specify stack details page, after you specify the template details, choose Next.
- On the Configure Stack Options page, for Tags, enter any desired values and choose Next.
- On the Review page, select the check box for I acknowledge that AWS CloudFormation might create IAM resources with custom names and choose Create stack.
Provisioning takes approximately 20–25 minutes to complete. When the stack status shows Create Complete, choose Outputs, and review the results.
2. Validating the DMS resources
Next, validate your resources.
- In the DMS console, under Resource Management, choose Endpoints.
- Select the DMS endpoint dms-blog-cloud-aurmysql-endpoint and, under Actions, choose Test connection. The Status field should show successful.
- Repeat the same process for the other DMS endpoint, dms-blog-onpremise-oracle-endpoint. Make sure that Status shows successful.
- Under Conversion & migration, choose Database migration tasks.
- Select the task ora2aurmysql-copy and expand the details. The following screenshot shows the provisioned DMS migration task configured with the source and target endpoints to enable the migration or refresh.
3. Deploying automated resources for the solution
In this section, we show you how to deploy the following resources to automate the database migration or refresh:
- The CloudWatch Events rule
- The Lambda function that triggers the DMS task
- An optional Lambda function to perform post data processing and associated Lambda activities
- A DMS event subscription
- An SNS topic
The following table shows the valid combinations of parameters used in the template to automate both data refresh and migration activities.
Provision the resources to automate the solution as follows. We have predefined the values to match the first option in the preceding table, which performs a data refresh using the DMS task from the first step. Edit as needed.
- Download the resources template to set up your resources.
- To launch your stack directly through the console, choose Launch Stack, and then choose Next.
- On the Specify stack details page, edit the predefined parameters as needed and choose Next.
- On the Configure Stack Options page, for Tags, enter any optional values and choose Next.
- On the Review page, select the check box for I acknowledge that AWS CloudFormation might create IAM resources with custom names and choose Create stack. Provisioning should complete in approximately five to ten minutes.
- When the CloudFormation Stacks console shows Create Complete, choose Outputs to review the results.
The TaskStarterLambda function (dms-task-starter-lambda) performs the following tasks:
- Starts the data migration or refresh by triggering the provisioned or existing DMS task.
- Avoids duplicate invocations by disabling the CloudWatch Events rule.
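The logic of this function can be sketched in a few lines of boto3. This is a minimal illustration rather than the deployed function’s source; the EVENT_RULE_NAME and DMS_TASK_ARN environment variables are assumptions made for this sketch:

```python
import os

def lambda_handler(event, context):
    """Disable the scheduling rule, then start the DMS replication task.

    Disabling the rule first prevents a second scheduled invocation from
    starting the same task while it is already running.
    """
    import boto3  # imported inside the handler so the module loads without the SDK

    events = boto3.client("events")
    dms = boto3.client("dms")

    # Stop further scheduled invocations until post-processing re-enables the rule.
    events.disable_rule(Name=os.environ["EVENT_RULE_NAME"])

    # 'reload-target' re-runs the full load on an existing task;
    # use 'start-replication' for a task's very first run.
    response = dms.start_replication_task(
        ReplicationTaskArn=os.environ["DMS_TASK_ARN"],
        StartReplicationTaskType="reload-target",
    )
    return response["ReplicationTask"]["Status"]
```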
To run the migration or refresh activity, follow these steps:
- In the CloudWatch console, choose Rules, and select dmstask-scheduler-eventrule. This rule schedules the Lambda function, which in turn triggers the DMS replication task, on a cron schedule of 00:00 GMT every day, as chosen when you provisioned the resources.
- To invoke the DMS task manually, in the Lambda console, select the Lambda function dms-task-starter-lambda.
- Choose Test. On the configure test event screen, choose Create new test event. For Event Template, choose Hello World. For Event name, enter demo, and choose Create.
- Choose Test again to trigger the Lambda function with the configured test event. This Lambda function disables the CloudWatch Events rule and triggers the DMS task. The following screenshot shows a sample of log output.
- To confirm the DMS task status, in the DMS console, choose Database migration tasks, and select the task ora2aurmysql-copy. The Status field should show starting or running.
If you chose to enter an existing DMS task earlier, then your task should display Status as starting or running.
The DMS task copies a small table from the RDS for Oracle DB instance to the Aurora MySQL database. Find the copy Load state in the Table statistics section, as shown in the following screenshot.
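If you prefer to check progress programmatically rather than in the console, a hedged boto3 sketch along these lines retrieves the task status and each table’s load state (the task identifier matches the one provisioned earlier):

```python
def task_progress(task_id="ora2aurmysql-copy"):
    """Return the DMS task status and each table's load state."""
    import boto3  # deferred import so the module loads without the SDK

    dms = boto3.client("dms")
    # Look up the task by its identifier to obtain its ARN and status.
    task = dms.describe_replication_tasks(
        Filters=[{"Name": "replication-task-id", "Values": [task_id]}]
    )["ReplicationTasks"][0]
    # Per-table statistics mirror the console's Table statistics section.
    stats = dms.describe_table_statistics(
        ReplicationTaskArn=task["ReplicationTaskArn"]
    )["TableStatistics"]
    return task["Status"], [
        (t["SchemaName"], t["TableName"], t["TableState"]) for t in stats
    ]
```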
With the DMS event subscriptions feature enabled, the DMS replication task publishes all task events to an SNS topic. DMS uses SNS to provide notifications when a DMS event occurs, such as the creation or deletion of a replication instance. You can configure SNS notifications for any AWS Region as emails, text messages, or calls to an HTTP endpoint.
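If you are wiring this up outside the provided template, an event subscription like the one the stack creates can be sketched with boto3 as follows. The subscription name and the event categories shown are illustrative assumptions:

```python
def subscribe_task_events(topic_arn, task_id="ora2aurmysql-copy"):
    """Subscribe an SNS topic to events for one DMS replication task."""
    import boto3  # deferred import so the module loads without the SDK

    dms = boto3.client("dms")
    return dms.create_event_subscription(
        SubscriptionName="dms-task-events",  # assumed name for this sketch
        SnsTopicArn=topic_arn,
        SourceType="replication-task",
        SourceIds=[task_id],
        # Categories of interest; "state change" covers start/stop events.
        EventCategories=["state change", "failure"],
        Enabled=True,
    )
```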
In the DMS console, choose Events to see the sequence of events generated.
Choose Event subscriptions to see the subscription for the SNS topic to the DMS events.
The SNS topic is configured to invoke the Lambda function dataprocess-eventenable-lambda for every DMS replication task event. This function performs the following tasks:
- Performs optional post-processing activities on the cloud datastore.
- Re-enables the CloudWatch Events rule.
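A minimal sketch of this event-driven function might look like the following, assuming the stop reason arrives inside the SNS message body; the rule name matches the one provisioned earlier, and the deployed function may differ in detail:

```python
FULL_LOAD_DONE = "FULL_LOAD_ONLY_FINISHED"

def is_full_load_finished(event):
    """Return True if the SNS-wrapped DMS notification reports a finished full load."""
    message = event["Records"][0]["Sns"]["Message"]
    return FULL_LOAD_DONE in message

def lambda_handler(event, context):
    if is_full_load_finished(event):
        import boto3  # deferred so the parser above is testable without the SDK

        # ...optional post-processing (sanitizing, masking) would run here...

        # Re-arm the scheduler for the next refresh cycle.
        boto3.client("events").enable_rule(Name="dmstask-scheduler-eventrule")
    return "ok"
```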
Navigate back to the DMS console and check the status of the task. When the task Status shows Load complete, follow these steps to view the logs:
- In the Lambda console, select the Lambda function dataprocess-eventenable-lambda.
- Choose Monitoring, View logs in CloudWatch.
- Select the most recent Log Streams value from the list, which should open the logs for that stream. The following screenshot shows sample output from the Lambda function for different DMS replication task events.
- As the log screenshot shows, the Lambda function responds to the replication task event "Replication task has stopped. Stop Reason FULL_LOAD_ONLY_FINISHED." Use this Lambda function to perform any post-processing activities, such as sanitizing the data or masking confidential information before the application uses it.
Note that any Lambda function performing post-migration or post-refresh processing must complete before the function timeout, that is, within 15 minutes. To learn more, see AWS Lambda Limits.
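One common pattern for staying within that limit is to check the remaining time on the Lambda context object and stop early, returning any unfinished work for a later invocation. The sketch below illustrates the idea; process_rows and handle_row are illustrative names, not part of the solution’s code:

```python
# Leave a safety margin before the 15-minute Lambda timeout.
SAFETY_MARGIN_MS = 30_000

def should_continue(context):
    """True while enough time remains to safely process another item."""
    return context.get_remaining_time_in_millis() > SAFETY_MARGIN_MS

def process_rows(rows, context, handle_row):
    """Process rows until done or until the timeout margin is reached.

    Returns the rows that still need processing, so a later invocation
    can pick up where this one left off.
    """
    remaining = list(rows)
    while remaining and should_continue(context):
        handle_row(remaining.pop(0))
    return remaining
```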
4. Cleaning up after the migration
After completing and testing this solution, clean up the resources by stopping your tasks and deleting the AWS CloudFormation stacks. For more information, see Deleting a Stack on the AWS CloudFormation Console.
5. Additional functionalities and use cases for this solution
Data refresh across environments
Application development teams deal with multiple environments, including development, quality assurance (QA), staging, and production. Development teams test and verify product quality in a non-production environment before deploying to production environments. These teams need an environment with production-like data to test new features and bug fixes.
Use the outlined solution to configure continuous data refresh from production to non-production environments. Set the post-processing Lambda function to perform actions like data masking, custom data generation, and Personally Identifiable Information (PII) cleanup before you enable the CloudWatch Events rule for subsequent runs.
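As an illustration of the kind of masking such a post-processing function might apply, the helpers below hash email local parts (so masked values stay stable across refreshes and joins still work) and redact most phone digits. These helpers are examples for this sketch, not part of the provided templates:

```python
import hashlib
import re

def mask_email(email):
    """Replace the local part with a stable hash so joins still work."""
    local, _, domain = email.partition("@")
    digest = hashlib.sha256(local.encode()).hexdigest()[:8]
    return f"user-{digest}@{domain}"

def mask_phone(phone):
    """Keep only the last two digits of a phone number."""
    digits = re.sub(r"\D", "", phone)
    return "*" * (len(digits) - 2) + digits[-2:]
```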
Synchronizing data table subsets across environments
Often cloud migration starts with migrating one service at a time in a complex application environment. Over time, however, performance demands require full duplication of on-premises workloads in cloud storage.
One way you can prepare for such a scenario is to start migrating read-only workloads to the cloud. Point writes to on-premises storage until you’re ready for a full cutover of that database and application. In that case, one-way cloud database synchronization can support your migrated applications while also maintaining on-premises database integrity. Use the solution outlined in this post for one-way synchronization through scheduled replication of table subsets from an on-premises database to a cloud database.
Masking data with multi-access strategy
Application teams sometimes provide data access while masking the source database configuration details and shielding that source from changes. To automate a multi-access strategy that appropriately masks data, use the DMS replication solution outlined in this post.
Set up DMS replication without change data capture (CDC) mode. When CDC is enabled, DMS uses engine-specific API actions to read ongoing changes from the source engine’s transaction logs, and each source engine requires specific configuration to expose this change stream to a given user account. With scheduled full-load replication instead, you can synchronize databases on your preferred schedule without exposing the source configuration or risking the integrity of the original data.
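A full-load-only task of this kind can be sketched with boto3 as follows; the task identifier and the include-all table mapping are assumptions made for illustration:

```python
import json

def create_full_load_task(source_arn, target_arn, instance_arn):
    """Create a full-load-only DMS task (no CDC); re-run it on a schedule instead."""
    import boto3  # deferred import so the module loads without the SDK

    # Include every table in every schema; narrow this for table subsets.
    table_mappings = {
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }
    dms = boto3.client("dms")
    return dms.create_replication_task(
        ReplicationTaskIdentifier="masked-copy-task",  # assumed name
        SourceEndpointArn=source_arn,
        TargetEndpointArn=target_arn,
        ReplicationInstanceArn=instance_arn,
        MigrationType="full-load",  # no CDC: transaction logs are never read
        TableMappings=json.dumps(table_mappings),
    )
```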
In this post, we outlined a solution that uses DMS to automate migration, one-time refreshes, ongoing synchronization, and data masking. We hope that you found this post helpful.
As always, AWS welcomes feedback. Please submit comments, questions, or feature requests below.
About the Authors
Udayasimha Theepireddy (Uday) has worked as a Senior Cloud Database Architect at Amazon Web Services since November 2017. He works with internal Amazon customers to move services from on-premises Oracle to Aurora, Amazon RDS for PostgreSQL, Amazon RDS for MySQL, and Amazon Redshift databases.
Satya Vajrapu is a DevOps Consultant with Amazon Web Services. He works with AWS customers to help design and develop modules for their infrastructure, applications, and processes.