AWS Database Blog

Use the AWS Database Migration Service to Stream Change Data to Amazon Kinesis Data Streams

August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more.

In this post, we discuss how you can use AWS Database Migration Service (AWS DMS) to stream change data into Amazon Kinesis Data Streams.

An earlier post, Load CDC Data, discussed a real-time data processing architecture. As part of that, it covered how to capture changes in an Amazon RDS for Microsoft SQL Server database using AWS DMS and send them to Kinesis Data Streams using AWS Lambda. With the launch of Kinesis Data Streams as a target for AWS DMS, we make it easier for you to stream, analyze, and store change data capture (CDC) data. AWS DMS uses best practices to automatically collect changes from a data store and stream them to Kinesis Data Streams.

With the addition of Kinesis Data Streams as a target, we are helping customers build data lakes and perform real-time processing on change data from their data stores. You can use AWS DMS in your data integration pipelines to replicate data in near real time directly into Kinesis Data Streams. Using this approach, you can build a decoupled and eventually consistent view of your database without having to build and maintain expensive custom applications on top of it.

Real-time change data architecture

Many organizations use Kinesis Data Streams to analyze change data for use cases such as website monitoring, fraud detection, advertising, mobile applications, and IoT. In this post, we describe how you can use AWS DMS to load change data from a relational database into Kinesis Data Streams. We also describe how you can evolve your data platform architecture toward a Kappa architecture, as seen in the following diagram.

As the preceding figure shows, AWS DMS supports several sources when Kinesis Data Streams is the target. You can load real-time change data from relational or NoSQL databases into Amazon Kinesis Data Streams using AWS DMS, and further transform and process it using Apache Spark on Amazon EMR. Various applications can then consume this data, such as Amazon QuickSight, Amazon Athena, Amazon CloudSearch, AWS Lambda, Kinesis Client Library (KCL) applications, or Amazon Kinesis Data Analytics. The data can also be stored in databases like Amazon Redshift or Amazon DynamoDB for running batch processing reports.

You can also use Amazon Kinesis Data Firehose to capture the data streams and load them into Amazon S3 buckets for further analytics. For example, you can process all customer call center emails and phone calls to address customer complaints more effectively. Customer care employees can access customer records from different financial departments, such as credit cards, loans, and accounts, to provide a faster and better customer experience with high availability.

Here are some other examples of how customers use Amazon Kinesis Services:

  1. Netflix uses Amazon Kinesis to monitor the communications between all of its applications so it can detect and fix issues quickly, ensuring high service uptime and availability to its customers.
  2. Zillow uses Amazon Kinesis to collect public record data and MLS listings. Zillow then updates home value estimates in near real-time so home buyers and sellers can get the most up-to-date home value estimates.
  3. Sonos uses Amazon Kinesis to monitor 1 billion events per week from its wireless hi-fi audio devices and deliver a better listening experience to its customers.
  4. A multinational bank was also able to query its utilization metrics to detect fraud within minutes. Loan and credit card specialists at the bank reduced loan and credit card processing times and provided quicker turnaround because they could pull up credit scores from their systems faster.

How to set up Kinesis Data Streams as a target in AWS DMS

It is simple to set up Data Streams as a change data target in AWS DMS and start streaming data. To get started, you first create an IAM role that grants AWS DMS minimal access to your stream:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kinesis:PutRecord",
                "kinesis:PutRecords",
                "kinesis:DescribeStream"
            ],
            "Resource": "<streamArn>"
        }
    ]
}
The role also needs the following trust relationship, which allows AWS DMS to assume it:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "dms.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
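If you prefer scripting over the console, the following AWS CLI sketch creates the role from the two policy documents above. The role name, policy name, and file names are placeholders for illustration.

# Create the role with the preceding trust policy (saved as trust-policy.json)
aws iam create-role \
    --role-name dms-kinesis-role \
    --assume-role-policy-document file://trust-policy.json

# Attach the preceding Kinesis permissions policy (saved as kinesis-policy.json)
aws iam put-role-policy \
    --role-name dms-kinesis-role \
    --policy-name dms-kinesis-access \
    --policy-document file://kinesis-policy.json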

After you define your IAM role, you then set up your source endpoint, target endpoint, and replication instance in AWS DMS. Your source is the database that you want to move data from, and the target is the data store that you're moving data to. In our case, the source is an Oracle database on Amazon RDS and the target is the Kinesis data stream. The replication instance processes the migration tasks and requires access to the source and target endpoints inside your VPC.
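As a CLI sketch, you can create the replication instance with a command like the following; the identifier, instance class, and storage size are example values that you should adjust for your workload.

aws dms create-replication-instance \
    --replication-instance-identifier dms-cdc-instance \
    --replication-instance-class dms.t3.medium \
    --allocated-storage 50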

Create a Kinesis stream by logging in to your AWS Management Console, choosing Kinesis, and then choosing Create Kinesis stream.
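Alternatively, the following CLI command creates the stream. The stream name and shard count here are examples; size the shard count to your expected change volume.

aws kinesis create-stream \
    --stream-name dms-cdc-stream \
    --shard-count 1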

Create the source and target endpoints by going to the AWS Database Migration Service console.

Set up the source endpoint as shown following.
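If you script the setup, a source endpoint for an Oracle database on Amazon RDS looks roughly like the following; the identifier, credentials, server name, and database name are placeholders.

aws dms create-endpoint \
    --endpoint-identifier oracle-source \
    --endpoint-type source \
    --engine-name oracle \
    --server-name <rds-endpoint> \
    --port 1521 \
    --database-name ORCL \
    --username <user> \
    --password <password>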

Set up the target endpoint as shown following.
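The equivalent CLI call for the Kinesis target references the stream ARN and the service access role created earlier; both ARNs are placeholders here.

aws dms create-endpoint \
    --endpoint-identifier kinesis-target \
    --endpoint-type target \
    --engine-name kinesis \
    --kinesis-settings StreamArn=<streamArn>,MessageFormat=json,ServiceAccessRoleArn=<roleArn>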

Set up the Amazon Kinesis Data Firehose delivery stream to load data to Amazon S3.
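A minimal CLI sketch for the delivery stream follows. It assumes an existing S3 bucket and an IAM role that Kinesis Data Firehose can assume; the names and ARNs are placeholders.

aws firehose create-delivery-stream \
    --delivery-stream-name dms-cdc-to-s3 \
    --delivery-stream-type KinesisStreamAsSource \
    --kinesis-stream-source-configuration KinesisStreamARN=<streamArn>,RoleARN=<firehoseRoleArn> \
    --extended-s3-destination-configuration RoleARN=<firehoseRoleArn>,BucketARN=arn:aws:s3:::<bucket-name>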

Create a task to start the migration between the source and the Kinesis endpoint.

Here is an example JSON file for object mapping. It selects the PLAYER table and maps its records to the stream, using the schema and table name as the partition key (a CLI sketch for creating the task follows the mapping):

{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "rule-action": "include",
            "object-locator": {
                "schema-name": "DMS_SAMPLE",
                "table-name": "PLAYER"
            }
        },
        {
            "rule-type": "object-mapping",
            "rule-id": "2",
            "rule-name": "2",
            "rule-action": "map-record-to-record",
            "target-table-name": "PLAYER",
            "object-locator": {
                "schema-name": "DMS_SAMPLE",
                "table-name": "PLAYER"
            },
            "mapping-parameters": {
                "partition-key-type": "schema-table"
            }
        }
    ]
}
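With the mapping saved locally (as table-mappings.json here, a placeholder file name), you can create and start the task from the CLI as sketched below; the endpoint, instance, and task ARNs are placeholders for the resources created earlier.

aws dms create-replication-task \
    --replication-task-identifier oracle-to-kinesis \
    --source-endpoint-arn <sourceEndpointArn> \
    --target-endpoint-arn <targetEndpointArn> \
    --replication-instance-arn <replicationInstanceArn> \
    --migration-type full-load-and-cdc \
    --table-mappings file://table-mappings.json

aws dms start-replication-task \
    --replication-task-arn <taskArn> \
    --start-replication-task-type start-replication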

After the task starts running and loading data to your Kinesis data stream, check the Amazon S3 bucket for the data.

Here is sample data from the Oracle source database:

ID      SPORT_TEAM_ID   LAST_NAME   FIRST_NAME   FULL_NAME
----    -------------   ---------   ----------   ----------
6951    81              Koyie       Hill         Koyie Hill

Sample data from the Kinesis data stream:

{
    "data": {
        "ID": 6951,
        "SPORT_TEAM_ID": 111,
        "LAST_NAME": "Koyie",
        "FIRST_NAME": " Hill",
        "FULL_NAME": "Koyie Hill"
    },
    "metadata": {
        "timestamp": "2018-11-06T18:09:06.363956Z",
        "record-type": "data",
        "operation": "load",
        "partition-key-type": "schema-table",
        "schema-name": "DMS_SAMPLE",
        "table-name": "PLAYER"
    }
}

You can also retrieve data directly from the Kinesis data stream using the following commands:

SHARD_ITERATOR=$(aws kinesis get-shard-iterator --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON --stream-name kramya-test-integ --query 'ShardIterator' --output text)

aws kinesis get-records --shard-iterator $SHARD_ITERATOR

Conclusion

With Amazon Kinesis Data Streams as an AWS DMS target, you now have a powerful way to stream change data from a relational database directly into Amazon Kinesis Data Streams. You can use this method to stream change data from any of the sources supported by AWS DMS and perform real-time data processing. Happy streaming!

If you have any questions or suggestions, please leave a comment below.

 


About the authors

Udayasimha Theepireddy is a database cloud architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance about database migrations and big data projects.

Ramya Kaushik is a database engineer with the Database Migration Service (DMS) & Schema Conversion Tool (SCT) team at Amazon Web Services.