AWS Public Sector Blog

Near real-time dashboards from a source database to a cloud data warehouse on AWS

Organizations need real-time or near real-time dashboards that they can provide to their customers without impacting database performance or the service level agreements (SLAs) they offer to end users. In this post, we show how to combine Amazon Web Services (AWS) capabilities into an end-to-end architecture for this data flow.

Solution overview

Consider a fictional county whose offices track visitor check-ins. The county wants to present the current wait time, or the number of people waiting in line, for each office. The county wants a dashboard that can be embedded into its website. In addition, it wants to extract business intelligence (BI) from the streaming data in the way a data warehouse makes possible. Finally, the county doesn’t want dashboard refreshes to impact the local database.

The following diagram shows the solution architecture.

Figure 1. Architecture of the solution described in this post. The major components are AWS Database Migration Service, Amazon Kinesis Data Streams, Amazon Redshift, and Amazon QuickSight.

AWS Database Migration Service (AWS DMS) is a cloud service that makes it easy to migrate relational databases, data warehouses, NoSQL databases, and other types of data stores. You can use AWS DMS to migrate your data into the AWS Cloud or between combinations of cloud and on-premises setups. AWS DMS also helps you replicate ongoing changes to keep sources and targets in sync.

Change data capture (CDC) refers to the process of identifying and capturing changes made to data in a database and then delivering those changes in real time to a downstream system. Capturing every change from transactions in a source database and moving them to the target in real time keeps the systems synchronized, and helps with real-time analytics use cases and zero-downtime database migrations.

Amazon Kinesis Data Streams is a fully managed streaming data service. You can continuously add various types of data such as clickstreams, application logs, and social media to a Kinesis stream from hundreds of thousands of sources. Within seconds, the data will be available for your Kinesis applications to read and process from the stream.

Amazon Redshift is a fully managed cloud data warehouse that’s used by tens of thousands of customers for price-performance, scale, and advanced data analytics. Amazon Redshift enables data warehousing by seamlessly integrating with other data stores and services in the modern data organization through features such as zero-ETL, data sharing, streaming ingestion, data lake integration, and Redshift ML.

Amazon QuickSight is a fully managed, cloud-centered BI service that makes it straightforward to connect to your data, create interactive dashboards, and share these with tens of thousands of users, either within the QuickSight interface or embedded in software as a service (SaaS) applications or web portals.

The solution follows five high-level steps:

  1. Configure the source database
  2. Configure the Kinesis Data Streams application
  3. Configure AWS DMS for CDC
  4. Configure Amazon Redshift Serverless
  5. Configure Amazon QuickSight

The final dashboard, updated in near real time, looks like the following image.

Figure 2. Amazon QuickSight dashboard for office wait times.

Prerequisites

To implement the solution, you need the following prerequisites:

  1. Access to an AWS account with permissions to the relevant services.
  2. Familiarity with the AWS Management Console.

Configure the source database

To configure the source database, follow these steps:

  1. Create a database using Amazon Relational Database Service (Amazon RDS) for MySQL. This database represents the county's on-premises database.
  2. Select the db.t4g.micro instance class with the MySQL engine.

    Figure 3. Amazon RDS Create database screen with options for the database creation method (Easy create selected) and the MySQL engine configuration.

  3. Make sure to choose IAM database authentication for improved security.

    Figure 4. Three choices for database authentication, with “Password and IAM database authentication” selected.

  4. Launch a t3.medium Amazon Elastic Compute Cloud (Amazon EC2) instance based on a Windows AMI.

    Figure 5. Amazon EC2 screen for launching an instance, with Windows AMI selected.

  5. Connect to the instance using Fleet Manager. For more information, refer to Connect to your Windows instance using Fleet Manager.
  6. Update the security groups as follows:
    a. The security group of the EC2 instance should allow outbound traffic of type MYSQL/Aurora (TCP, port 3306) to the destination security group of the RDS instance (no inbound rules are required).

      Figure 6. Outbound rule set.

    b. The security group of the Amazon RDS instance should allow inbound traffic of type MYSQL/Aurora (TCP, port 3306) from the security group of the EC2 instance.

      Figure 7. Inbound rule set.

  7. Within the EC2 instance, download MySQL Workbench from https://dev.mysql.com/downloads/workbench/.
  8. Install MySQL Workbench and connect to the Amazon RDS for MySQL database. For more information, refer to Use IAM authentication to connect with SQL Workbench/J to Amazon Aurora MySQL or Amazon RDS for MySQL.
  9. To allow for CDC, on the Amazon RDS console, select Create parameter group.

    Figure 8. Parameter group details screen includes group name, description, engine type, parameter group family, and type fields and dropdown menus.

  10. From the left menu of the Amazon RDS console, select Parameter groups, open the forcdc parameter group, and edit the parameters, as shown in the following screenshot.
    a. Set binlog_format to ROW.
    b. Set binlog_row_image to FULL.

    Figure 9. Editing parameters for binlog_format and binlog_row_image.

  11. Choose the radio button next to the instance name. Choose Modify and change the DB parameter group to the group you created, as shown in the following screenshot.

    Figure 10. Changing the DB parameter group.

  12. Reboot the instance. For more information, refer to Rebooting a DB instance.
  13. Run the following queries in MySQL Workbench to validate the parameter values. (A scripted equivalent of steps 9 through 13 appears after this list.)
    a. select @@binlog_row_image as binlog_row_image;
    b. select @@binlog_format as binlog_format;

    Figure 11. MySQL Workbench with the queries run.

  14. In MySQL Workbench, choose the new schema button in the top menu and create a new schema. Name it anycounty, as shown in the following screenshot:

    Figure 12. Naming new schema in MySQL Workbench.

  15. Create a table named office_visits in the anycounty schema and add data to it as follows. (A scripted example appears after this list.)

    Figure 13. Adding data to the table in MySQL Workbench.
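
The following is a minimal AWS SDK for Python (Boto3) sketch of steps 9 through 13. The parameter group name (forcdc) comes from the walkthrough; the DB instance identifier (anycounty-db), the MySQL 8.0 parameter group family, and the Region are assumptions for illustration.

    # Sketch of steps 9-13: create a CDC-friendly parameter group, attach it, and reboot.
    # The DB instance identifier, engine family, and Region are assumed values; replace them with your own.
    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    # Step 9: create a custom parameter group for the MySQL engine family.
    rds.create_db_parameter_group(
        DBParameterGroupName="forcdc",
        DBParameterGroupFamily="mysql8.0",
        Description="Enables binary logging settings required for CDC",
    )

    # Step 10: set the binary log parameters that AWS DMS CDC requires.
    rds.modify_db_parameter_group(
        DBParameterGroupName="forcdc",
        Parameters=[
            {"ParameterName": "binlog_format", "ParameterValue": "ROW", "ApplyMethod": "pending-reboot"},
            {"ParameterName": "binlog_row_image", "ParameterValue": "full", "ApplyMethod": "pending-reboot"},
        ],
    )

    # Step 11: attach the parameter group to the RDS instance.
    rds.modify_db_instance(
        DBInstanceIdentifier="anycounty-db",
        DBParameterGroupName="forcdc",
        ApplyImmediately=True,
    )

    # Step 12: reboot the instance so the parameters take effect.
    rds.reboot_db_instance(DBInstanceIdentifier="anycounty-db")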
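
The following is a minimal Python sketch of steps 14 and 15, assuming the PyMySQL client library. Only the schema name (anycounty) and table name (office_visits) come from the walkthrough; the column names and sample rows are illustrative assumptions.

    # Sketch of steps 14-15: create the schema and table, then insert sample rows.
    # The columns shown here are assumptions; model the table on your own data.
    import pymysql

    conn = pymysql.connect(
        host="<rds-endpoint>",  # the RDS endpoint shown on the Amazon RDS console
        user="admin",
        password="<password>",
        autocommit=True,
    )

    with conn.cursor() as cur:
        # Step 14: create the anycounty schema.
        cur.execute("CREATE SCHEMA IF NOT EXISTS anycounty")

        # Step 15: create the office_visits table and add sample data.
        cur.execute(
            """
            CREATE TABLE IF NOT EXISTS anycounty.office_visits (
                id INT AUTO_INCREMENT PRIMARY KEY,
                office_name VARCHAR(100),
                wait_minutes INT,
                people_waiting INT
            )
            """
        )
        cur.execute(
            "INSERT INTO anycounty.office_visits (office_name, wait_minutes, people_waiting) "
            "VALUES (%s, %s, %s), (%s, %s, %s)",
            ("Downtown", 15, 4, "Northside", 25, 9),
        )

    conn.close()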

Configure the Amazon Kinesis Data Streams application

To configure the Kinesis data stream, follow these steps:

  1. Create a new Kinesis data stream. For more information, refer to Create a stream using the AWS Management Console.
  2. On the Create data stream page, under Data stream capacity, choose On-demand. Name it streamforrds. (A scripted equivalent appears after the following figure.)

Figure 14. Creating a data stream in Kinesis Data Streams.
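
The following is a minimal Boto3 sketch of the same step. The stream name (streamforrds) matches the walkthrough; the Region is a placeholder.

    # Sketch: create the on-demand Kinesis data stream used as the AWS DMS target.
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")

    # On-demand capacity mode, so no shard count is specified.
    kinesis.create_stream(
        StreamName="streamforrds",
        StreamModeDetails={"StreamMode": "ON_DEMAND"},
    )

    # Wait until the stream is active before pointing AWS DMS at it.
    kinesis.get_waiter("stream_exists").wait(StreamName="streamforrds")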

Configure AWS DMS for CDC

To configure AWS DMS for CDC, follow these steps:

  1. On the AWS DMS console, create a replication subnet group. For more information, refer to Creating a subnet group for an AWS DMS migration project.
  2. Add the relevant subnets in the default virtual private cloud (VPC).
  3. For the instance class, select dms.t3.micro.
  4. For High Availability, select the single AZ option.
  5. Select the VPC and the subnet group created in step 1.
  6. Wait until the instance is created.

    Figure 15. Creating an instance for CDC.

  7. Create a new secret for the Amazon RDS database credentials. For more information, refer to Password management with Amazon RDS and AWS Secrets Manager.

    Figure 16. Creating a secret for Amazon RDS.

  8. For User name, enter admin, and for Password, use the value from the Amazon RDS secret created in the previous step.
  9. Create the AWS DMS endpoints. For more information, refer to Creating source and target endpoints.
    a. Create the source endpoint for the RDS for MySQL database. For the server name, use the endpoint from the Amazon RDS console.

    Figure 17. Creating DMS endpoint to the RDS MySQL database.

  10. Make sure there is an AWS Identity and Access Management (IAM) role that allows access to AWS Secrets Manager. In the trust policy, make sure to use dms.us-east-1.amazonaws.com, as shown in the following screenshot.

    Figure 18. Trust policy for IAM role.

  11. Make sure the security group of the DMS instance allows all inbound traffic from the same security group.
  12. (Optional) Test the source endpoint connection: select the source endpoint, choose the Connections tab, and then choose Test connection.
  13. Create the target endpoint.

    Figure 19. Creating target endpoint.

  14. Create a new AWS DMS replication task based on CDC. Select the anycounty schema and the office_visits table, and then run the task. (A scripted equivalent of the target endpoint and replication task appears after this list.)
  15. Validate that the data is flowing to the Kinesis data stream.

    Figure 20. Data viewer tab in Amazon Kinesis showing that data is flowing to the stream.
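
The following is a minimal Boto3 sketch of steps 13 and 14: the Kinesis target endpoint and a CDC-only replication task scoped to anycounty.office_visits. The endpoint identifiers, role name, and ARNs are placeholders for illustration.

    # Sketch of steps 13-14: create a Kinesis target endpoint, then a CDC replication task.
    import json
    import boto3

    dms = boto3.client("dms", region_name="us-east-1")

    # Step 13: target endpoint that writes JSON change records to the Kinesis stream.
    target = dms.create_endpoint(
        EndpointIdentifier="kinesis-target",
        EndpointType="target",
        EngineName="kinesis",
        KinesisSettings={
            "StreamArn": "arn:aws:kinesis:us-east-1:<account>:stream/streamforrds",
            "MessageFormat": "json",
            "ServiceAccessRoleArn": "arn:aws:iam::<account>:role/<dms-kinesis-access-role>",
        },
    )

    # Step 14: CDC-only task that replicates only the office_visits table.
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-office-visits",
                "object-locator": {"schema-name": "anycounty", "table-name": "office_visits"},
                "rule-action": "include",
            }
        ]
    }

    task = dms.create_replication_task(
        ReplicationTaskIdentifier="anycounty-cdc",
        SourceEndpointArn="<source-endpoint-arn>",
        TargetEndpointArn=target["Endpoint"]["EndpointArn"],
        ReplicationInstanceArn="<replication-instance-arn>",
        MigrationType="cdc",
        TableMappings=json.dumps(table_mappings),
    )

    # Wait for the task to reach the ready state, then start it.
    task_arn = task["ReplicationTask"]["ReplicationTaskArn"]
    dms.get_waiter("replication_task_ready").wait(
        Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
    )
    dms.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType="start-replication",
    )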

Configure Amazon Redshift Serverless

To configure Amazon Redshift Serverless, follow these steps:

  1. Create an IAM role for Amazon Redshift Serverless. For more information, refer to Create a role to delegate permissions to an AWS service.

    Figure 21. Creating IAM role for Amazon Redshift Serverless.

    Policy:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadStream",
                "Effect": "Allow",
                "Action": [
                    "kinesis:DescribeStreamSummary",
                    "kinesis:GetShardIterator",
                    "kinesis:GetRecords",
                    "kinesis:ListShards",
                    "kinesis:DescribeStream"
                ],
                "Resource": "arn:aws:kinesis:*:<account>:stream/*"
            },
            {
                "Sid": "ListStream",
                "Effect": "Allow",
                "Action": "kinesis:ListStreams",
                "Resource": "*"
            }
        ]
    }

    Trust policy:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": [
                        "events.amazonaws.com",
                        "redshift.amazonaws.com"
                    ]
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {
                        "AWS:SourceAccount": "1234567890"
                    },
                    "ArnEquals": {
                        "AWS:SourceArn": "arn:aws:dms:us-east-1:1234567890:rep:6LO4G7A4SNG4PC732QEDP772OY"
                    }
                }
            }
        ]
    }

  2. Create a new Amazon Redshift Serverless instance using the default settings. For more information, refer to Get started with Amazon Redshift Serverless data warehouses.
  3. Make sure that:
    a. You have a VPC endpoint to Amazon Redshift.
    b. Private connectivity is enabled in the Amazon Redshift Serverless workgroup.
    c. You create a security group that allows inbound traffic from 52.23.63.224/27 (the QuickSight IP range for us-east-1) on port 5439. The range is taken from Supported AWS Regions for Amazon QuickSight.
  4. Make sure to create the default Amazon Redshift IAM role: select Create IAM role, and then associate the IAM role you created previously.

    Figure 22. Creating and associating default Amazon Redshift IAM role.

  5. In the Amazon Redshift Serverless instance, configure the stream as a materialized view with auto refresh:
  6. Open Amazon Redshift query editor v2.
    a. Choose >.

    Figure 23. Choosing > to view the Amazon Redshift Serverless stream.

  7. Connect using a Federated user.

    Figure 24. Choosing “Federated user” under the “Other ways to connect” to the default workgroup.

  8. Consume streaming data from Kinesis Data Streams using a materialized view. Enter the following command inside the editor:
    CREATE EXTERNAL SCHEMA kds
    FROM KINESIS
    IAM_ROLE '<role arn>';

    Figure 25. Entering the command to create external schema in Redshift query editor.

  9. Enter the following command:
    CREATE MATERIALIZED VIEW office_view AUTO REFRESH YES AS
    SELECT *
    FROM kds.streamforrds;

    Figure 26. Entering command for materialized view in Redshift query editor.

    Validate that the office_view view can be accessed and does not yet contain any rows by running the following query in the query editor:

    select count(*) from office_view;

    Figure 27. Verifying created view by running query.

  10. Generate new data in the source office_visits table. Wait a few minutes for the materialized view to refresh, and then run the query again. (A scripted version of this check appears after this list.)

    Figure 28. Running query to generate data in the view.
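
The following is a minimal Boto3 sketch of this validation, using the Amazon Redshift Data API to run the count query against office_view. The workgroup name (default-workgroup) and database name (dev) are assumptions based on the default serverless setup.

    # Sketch: run the validation query against the materialized view through the Data API.
    import time

    import boto3

    rsd = boto3.client("redshift-data", region_name="us-east-1")

    resp = rsd.execute_statement(
        WorkgroupName="default-workgroup",
        Database="dev",
        Sql="SELECT COUNT(*) FROM office_view;",
    )

    # Poll until the statement finishes, then read the single-cell result.
    while True:
        desc = rsd.describe_statement(Id=resp["Id"])
        if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(1)

    if desc["Status"] == "FINISHED":
        result = rsd.get_statement_result(Id=resp["Id"])
        print("Rows in office_view:", result["Records"][0][0]["longValue"])
    else:
        print("Query did not finish:", desc.get("Error"))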

Configure Amazon QuickSight

To configure Amazon QuickSight, follow these steps:

  1. Sign in to Amazon QuickSight. For more information, refer to Signing in to Amazon QuickSight.
  2. On the QuickSight console, choose Manage QuickSight, and then choose Manage VPC connections. You need a VPC connection that can reach Amazon Redshift.
  3. Configure the VPC connection and add the necessary permissions to the aws-quicksight-service-role-v0 IAM role. For more information, refer to Configuring the VPC connection in the QuickSight console.
  4. To get the admin password of the serverless instance, navigate to the Amazon Redshift console, open the namespace of the serverless instance, and select Edit admin credentials.

    Figure 29. Selecting “Edit admin credentials” action in Amazon Redshift namespace to get admin password.

  5. To create a new dataset, on the QuickSight console, choose Datasets, New dataset, and then Redshift (manual connect).
  6. For Database server, copy the endpoint from the Amazon Redshift workgroup configuration, removing the port and database name and entering them in their designated fields. For example, if the string is domain:5439/dev, then for Database server, enter domain; for Port, enter 5439; and for Database name, enter dev.

    Figure 30. Editing the Amazon Redshift data source configuration.

  7. Define a new dataset in QuickSight based on the following query. (A scripted equivalent of the dataset creation appears after this list.)

    Figure 31. Selecting “New dataset” button on top right of Amazon QuickSight Datasets page.

    select
    nullif(json_extract_path_text(from_varbyte(kinesis_data,'utf-8'),'data',true), '')::character(max) as ID
    from office_view

    Figure 32. Entering query to define new dataset.

  8. Add a calculated field named office-name with the formula parseJson(ID, "$.office_name").

    Figure 33. Adding field for office names in Amazon QuickSight.

  9. Create an analysis based on this dataset.
  10. Under Visuals, choose Vertical bar chart. Assign the office-name field to the X-axis.

    Figure 34. Vertical bar chart for office wait times.

  11. Publish the dashboard.
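
For reference, the dataset from steps 5 through 7 can also be created programmatically. The following Boto3 sketch uses the same custom SQL as the console walkthrough; the account ID, dataset ID, and Redshift data source ARN are placeholders, and the office-name calculated field and visuals (steps 8 through 11) are still added in the QuickSight interface.

    # Sketch: create a direct-query QuickSight dataset over the office_view custom SQL.
    import boto3

    qs = boto3.client("quicksight", region_name="us-east-1")

    custom_sql = (
        "select "
        "nullif(json_extract_path_text(from_varbyte(kinesis_data,'utf-8'),'data',true), '')"
        "::character(max) as ID "
        "from office_view"
    )

    qs.create_data_set(
        AwsAccountId="<account-id>",
        DataSetId="office-visits-dataset",
        Name="office-visits",
        ImportMode="DIRECT_QUERY",  # direct query keeps the dashboard near real time
        PhysicalTableMap={
            "office-view-sql": {
                "CustomSql": {
                    "DataSourceArn": "<redshift-data-source-arn>",
                    "Name": "office_view_stream",
                    "SqlQuery": custom_sql,
                    # Redshift returns the unquoted alias in lowercase.
                    "Columns": [{"Name": "id", "Type": "STRING"}],
                }
            }
        },
    )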

As the dashboard is refreshed, the data will be near real time, based on the refresh rate of the view. 

Conclusion

With Amazon Kinesis Data Streams as an AWS DMS target, together with a managed, serverless cloud data warehouse and a cloud business analytics service, you now have a powerful way to stream change data from a database directly into your data warehouse without impacting your source database, while providing near real-time dashboards of your business operations. You can use this method to stream change data from any source supported by AWS DMS to perform real-time data processing. Happy streaming!

If you have any questions or suggestions, please email oeshel@amazon.com.