AWS Public Sector Blog
Near real-time dashboards from a source database to a cloud data warehouse on AWS
Organizations require solutions for real time or near real time dashboards that can be provided to their customers without impacting their database performance or service level agreements (SLAs) to their end users. In this post, we showcase the integration of Amazon Web Services (AWS) capabilities to present an end-to-end architecture for this data flow.
Solution overview
Consider a fictional county that has offices that capture visits. The county wants to be able to present the current wait times or the number of people waiting in line, by office. The county wants a dashboard that can be embedded into their website. In addition, they want to extract business intelligence (BI) from the streaming data in a way that a data warehouse can provide. Finally, the county doesn’t want to impact the local database while the dashboards are refreshed.
The following diagram shows the solution architecture.

Figure 1. Solution architecture of the solution described in this post. The major components are AWS Database Migration Service, Amazon Kinesis Data Streams, Amazon Redshift, and Amazon QuickSight.
AWS Database Migration Service (AWS DMS) is a cloud service that makes it easy to migrate relational databases, data warehouses, NoSQL databases, and other types of data stores. You can use AWS DMS to migrate your data into the AWS Cloud or between combinations of cloud and on-premises setups. AWS DMS also helps you replicate ongoing changes to keep sources and targets in sync.
Change data capture (CDC) refers to the process of identifying and capturing changes made to data in a database and then delivering those changes in real time to a downstream system. Capturing every change from transactions in a source database and moving them to the target in real time keeps the systems synchronized, and helps with real-time analytics use cases and zero-downtime database migrations.
Amazon Kinesis Data Streams is a fully managed streaming data service. You can continuously add various types of data such as clickstreams, application logs, and social media to a Kinesis stream from hundreds of thousands of sources. Within seconds, the data will be available for your Kinesis applications to read and process from the stream.
Amazon Redshift is a fully managed cloud data warehouse that’s used by tens of thousands of customers for price-performance, scale, and advanced data analytics. Amazon Redshift enables data warehousing by seamlessly integrating with other data stores and services in the modern data organization through features such as zero-ETL, data sharing, streaming ingestion, data lake integration, and Redshift ML.
Amazon QuickSight is a fully managed, cloud-centered BI service that makes it straightforward to connect to your data, create interactive dashboards, and share these with tens of thousands of users, either within the QuickSight interface or embedded in software as a service (SaaS) applications or web portals.
The solution follows six high-level steps:
- Configure the source database
- Configure the Kinesis Data Streams application
- Configure AWS DMS for CDC
- Configure Amazon Redshift Serverless
- Configure Amazon QuickSight
The final dashboard, updated in near real time, looks like the following image.
Prerequisites
To perform the solution, you need to have the following prerequisites:
- Access to an AWS account with permissions to the relevant services.
- Familiarity with the AWS console.
Configure the source database
To configure the source database, follow these steps:
- Create a database using Amazon Relational Database (Amazon RDS) for MySQL. This will represent the on-premises database of the county.
- Select a db.t4g.micro RDS instance based on MySQL.
- Make sure to choose the IAM database authentication for improved security.
- Launch an Amazon Elastic Compute Cloud (Amazon EC2) instance t3.medium based on Windows AMI.
- Connect to the instance using fleet manager, refer to Connect to your Windows instance using Fleet Manager.
- Update the security groups as follows:
- The Amazon EC2 security group should allow outbound traffic type MYSQL/Aurora TCP with port = 3306 to the destination = security group of the RDS (no inbound rules)
- The security group of the Amazon RDS instance should allow inbound of the type MYSQL/Aurora TCP with port = 3306 from the security group of the EC2 instance.
- Within the EC2 instance, download the MySQL workbench at https://dev.mysql.com/downloads/workbench/.
- Install MySQL workbench and connect to the Amazon RDS MySQL database, refer to Use IAM authentication to connect with SQL Workbench/J to Amazon Aurora MySQL or Amazon RDS for MySQL
- To allow for CDC, on the Amazon RDS console, select Create parameter group.
- From the RDS left menu, select Parameter groups and open the forcdc parameter group and edit the parameters, as shown in the following screenshot.
a. Set Binlog_format to ROW.
b. Set binlog_row_image to FULL. - Choose the radio button next to the instance name. Choose Modify and change the DB parameter group to the group you created, as shown in the following screenshot.
- Reboot the instance. For more information, refer to Rebooting a DB instance.
- Run the following queries and validate inside MySql Workbench.
a. select @@binlog_row_image as binlog_row_image;
b. select @@binlog_format as binlog_format; - In MySQL Workbench, select the “new schema” in the top menu and create a new schema. Name it
anycounty
, as shown in the following screenshot: - Add data to the table as follows:
Configure the Amazon Kinesis Data Streams application
To configure the Kinesis data stream, follow these steps:
- Create a new Kinesis data stream, for more information refer to Create a stream using the AWS Management Console.
- On the Create data stream page, under Data stream capacity, choose On-demand. Name it
streamforrds
.
Configure AWS DMS for CDC
To configure AWS DMS for CDC, follow these steps:
- On the AWS DMS console, create a subnet group refer Creating a subnet group for an AWS DMS migration project.
- Add the relevant subnets in the default virtual private cloud (VPC).
- Select the instance class as
dms.t3.micro
. - For High Availability, select the single AZ option.
- Select the VPC and the subnet group created in step 2.
- Wait until the instance is created.
- Create a new secret based on RDS, refer to Password management with Amazon RDS and AWS Secrets Manager.
- Provide the User name = admin and the Password from the Amazon RDS secret created in a previous step.
- Create the DMS endpoints, refer to Creating source and target endpoints.
a. Create the source endpoint to the RDS MySQL database. Use the server name from the Amazon RDS service. - Make sure there is an AWS Identity and Access Management (IAM) role that allows access to AWS Secrets Manager. In the trust policy, make sure to use
dms.us-east-1.amazonaws.com
, as shown in the following screenshot. - Make sure the security group of the DMS instance allows all inbound traffic from the same security group.
- (Optional) Test the source endpoint connection by selecting the source endpoint, select the Connections tab and choose Test connections.
- Create the target endpoint.
- Create a new AWS DMS replication task based on CDC. Select the anycounty schema and the office_visits Run the task.
- Validate that the data is flowing to the Kinesis data stream.
Configure Amazon Redshift Serverless
To configure Amazon Redshift Serverless, follow these steps:
- Create an IAM role for Amazon Redshift Serverless, refer to Create a role to delegate permissions to an AWS service.
Policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ReadStream",
"Effect": "Allow",
"Action": [
"kinesis:DescribeStreamSummary",
"kinesis:GetShardIterator",
"kinesis:GetRecords",
"kinesis:ListShards",
"kinesis:DescribeStream"
],
"Resource": "arn:aws:kinesis:*:<account>:stream/*"
},
{
"Sid": "ListStream",
"Effect": "Allow",
"Action": "kinesis:ListStreams",
"Resource": "*"
}
]
}
Trust policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [
"events.amazonaws.com",
"redshift.amazonaws.com"
]
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"AWS:SourceAccount": "1234567890"
},
"ArnEquals": {
"AWS:SourceArn": "arn:aws:dms:us-east-1: 1234567890:rep:6LO4G7A4SNG4PC732QEDP772OY"
}
}
}
]
}
- Create a new Amazon Redshift serverless instance using the default settings, refer to Get started with Amazon Redshift Serverless data warehouses.
- Make sure that
a. You have a VPC endpoint to Amazon Redshift.
b. In the Amazon Redshift serverless workgroup, enable private connectivity.
c. Create a security group that has inbound 52.23.63.224/27 (the us-east-1 QuickSight IP range) with port 5439. The range is taken from Supported AWS Regions for Amazon QuickSight. - Make sure to create the default Amazon Redshift IAM role. You need to select Create IAM role and later to associate the IAM role previously created.
- Create a new serverless Amazon Redshift instance and configure the stream as a view with auto refresh:
- Open Amazon Redshift query editor v2
a. Choose >. - Connect using a Federated user.
- Consume streaming data from Kinesis Data Streams using a materialized view. Enter the following command inside the editor:
CREATE EXTERNAL SCHEMA kds
FROM KINESIS
IAM_ROLE '<role arn>'; - Enter the following command:
CREATE MATERIALIZED VIEW office_view AUTO REFRESH YES AS
SELECT *
FROM kds.streamforrds;Validate that the view office_view can be accessed and does not contain any row, by running the following query in the query editor:
select count(*) from office_view
- Generate new data in the
office_view
table. Wait for few minutes and run the query again.
Configure Amazon QuickSight
To configure Amazon QuickSight, follow these steps:
- Sign in to Amazon QuickSight. For more information, refer to Signing in to Amazon QuickSight.
- On the QuickSight console, choose Manage and Manage VPC connection. You need to add a VPC endpoint to Amazon Redshift.
- Configure VPC connection and add the necessary permissions to the
aws-quicksight-service-role-v0 role
IAM role, refer to Configuring the VPC connection in the QuickSight console. - To get the admin password of the serverless instance, navigate to Redshift open the namespace of the serverless instance and select Edit admin credentials.
- To create a new dataset, on the QuickSight console, choose Dataset and then Redshift (manual connect).
- Under Database server, copy the endpoint from the Amazon Redshift workgroup configuration. Remove the port and database name and copy in the designated fields. For example, if the string is domain:5439/dev, then for Database server, enter domain. For Port, enter 5439. For Database name, enter dev.
- Define a new dataset in QuickSight based on the following query.
select
nullif(json_extract_path_text(from_varbyte(kinesis_data,'utf-8'),'data',true), '')::character(max) as ID
from office_view
- Add a calculated field named
office-name
.parseJson(id, "$.office_name")
- Create an analysis based on this dataset.
- Under Visuals, choose Vertical bar chart. Assign the field to the X-axis.
- Publish the dashboard.
As the dashboard is refreshed, the data will be near real time, based on the refresh rate of the view.
Conclusion
With Amazon Kinesis Data Streams as an AWS DMS target, together with a managed, serverless cloud data warehouse and a cloud business analytics service, you now have a powerful way to stream change data from a database directly into your data warehouse without impacting your source database while providing near real-time dashboards of your business operations. You can use this method to stream change data from any sources supported by AWS DMS to perform real-time data processing. Happy streaming!
If you have any questions or suggestions, please email oeshel@amazon.com.