AWS Partner Network (APN) Blog
Data Warehousing and Business Intelligence for VMware Cloud on AWS
By David Piet, Ph.D., VMware Cloud Global Account Specialist SA Lead – AWS
For several years, VMware Cloud on AWS has been delivering VMware’s software-defined data center (SDDC) running on top of Amazon Elastic Compute Cloud (Amazon EC2) as a managed service. Jointly engineered by VMware and Amazon Web Services (AWS), VMware Cloud on AWS brings vSphere, NSX, vSAN, and more into AWS.
Amazon Redshift is the fastest and most widely used cloud data warehouse. Native integration with the AWS analytics ecosystem makes it easier to handle end-to-end analytics workflows without friction. For example, Amazon QuickSight is the first business intelligence (BI) service with pay-per-session pricing you can use to create reports, visualizations, and dashboards on Amazon Redshift data.
It isn’t uncommon for production workloads, or even non-production workloads, to have a database component to them. As time goes on, databases grow due to more and more data getting ingested. Hidden in that data is a world of information.
How do we elevate our workloads to get more out of the data we already have? Typically, this requires some kind of data warehousing so your databases can maintain their performance. In turn, this requires some method to replicate your data into the data warehouse for manipulation. Perhaps the trickiest component of this is the analysis itself: creating reports, visualizations, and dashboards.
Running Amazon QuickSight on top of data residing in Amazon Redshift delivers just that.
With Amazon Redshift, you get a fast, simple, and cost-effective means to analyze all of your data using standard SQL and your existing BI tools. With Amazon QuickSight, you have a cloud-powered business analytics service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from data, in this case data sitting in Amazon Redshift.
For customers that have databases running inside of VMware Cloud on AWS, this pattern is no exception. To replicate the data from VMware Cloud on AWS into Amazon Redshift, AWS Database Migration Service (AWS DMS) offers simple-to-use tooling to move data quickly and securely, all while your database remains fully operational, minimizing downtime.
This post describes how this solution can help you get more out of existing data residing inside your databases running in VMware Cloud on AWS. We’ll also walk through some ideas on how to get started.
Solution Overview
On the left side of the architectural diagram in Figure 1, we have our databases running inside of our VMware Cloud on AWS SDDC. Our goal is to ultimately run analytics on the data we are amassing and build out dashboards of our analyses.
To meet that end, we use AWS DMS to do the replication from VMware Cloud into Amazon Redshift. Once the data arrives in Amazon Redshift, we’ll use QuickSight as our BI tooling. All the while, our databases and application in VMware Cloud on AWS continue running uninterrupted.
Figure 1 – Data warehousing and business intelligence integrated with VMware Cloud on AWS.
Workflow Walkthrough
With the high-level overview laid out above, let’s dive a bit deeper, beginning with VMware Cloud on AWS. VMware Cloud on AWS is a single-tenant managed service whose resources run in a VMware-managed AWS account and a VMware-managed virtual private cloud (VPC).
With VMware Cloud on AWS, customers get access to AWS services in their own AWS accounts via an Elastic Network Interface (ENI) that gets deployed into a VPC of their choosing at the time of SDDC creation.
In the example, we have our customer database virtual machine (VM) running in our SDDC, as shown in Figure 2. This VM is hosting a PostgreSQL database where we have been collecting data on our marketing campaign and running A/B tests to determine which variant generates more clicks.
As we continue to collect data, we need to start replicating what we have into Amazon Redshift for further analysis. To do that, we use AWS DMS, and the key to making this efficient is using the ENI in our AWS account that’s linked to our SDDC. This ENI gives us high-throughput, low-latency connectivity between our VMware Cloud on AWS SDDC and native AWS services, all without our traffic leaving the AWS global infrastructure.
To allow connectivity between our VM and AWS DMS, there are two primary steps to take—opening the firewall rules in NSX-T on the VMware Cloud on AWS side, and opening the security groups for the ENI and the AWS DMS replication instance on the VPC side.
Figure 2 – Customer database VM in our SDDC.
Starting with the NSX-T side, we need to create two firewall rules in our compute gateway allowing traffic to flow between our database VM and the VPC interface. One is for inbound traffic, and the other is for outbound traffic. Figure 3 shows the rules created for our example.
Figure 3 – NSX-T firewall rules allowing traffic to and from the connected VPC.
On the AWS side, we need to ensure our replication traffic can flow between the SDDC ENI and the replication instance that AWS DMS is using. To do that, the security group for the replication instance must allow ingress on the database port from the security group associated with the SDDC ENI.
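As a rough sketch of the security group rule described above, the snippet below builds the keyword arguments for the EC2 `authorize_security_group_ingress` call. The security group IDs are hypothetical placeholders, and port 5432 is an assumption based on the PostgreSQL default; adjust both for your environment.

```python
def ingress_from_group(target_sg_id: str, source_sg_id: str, db_port: int = 5432) -> dict:
    """Build kwargs that let source_sg_id reach target_sg_id on one TCP port."""
    return {
        "GroupId": target_sg_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": db_port,
                "ToPort": db_port,
                # Reference the peer security group instead of a CIDR range.
                "UserIdGroupPairs": [{"GroupId": source_sg_id}],
            }
        ],
    }


# Hypothetical IDs: the replication instance group and the SDDC ENI group.
params = ingress_from_group("sg-REPLICATION-INSTANCE", "sg-SDDC-ENI")

# With boto3 installed and credentials configured, this could be applied as:
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(**params)
```

Referencing a security group rather than an IP range keeps the rule valid even if the addresses behind the ENI or the replication instance change.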
With firewall rules and security groups configured, the last thing we need to do is create a database endpoint within AWS DMS for our database VM, as shown in Figure 4.
To verify we have correctly created our AWS DMS endpoint and that we’re allowing the right traffic between our SDDC and AWS DMS, test the connection to our new SDDC endpoint under the Connections tab. This tells us if we can successfully connect to our database VM or not.
Figure 4 – AWS DMS endpoint for our database VM.
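For readers who prefer scripting over the console, here is a minimal sketch of the same endpoint definition expressed as arguments for the AWS DMS `create_endpoint` call. The endpoint identifier, IP address, database name, and credentials are all hypothetical placeholders; they are not taken from the environment shown in Figure 4.

```python
def source_endpoint_params(server_name: str, database_name: str,
                           username: str, password: str, port: int = 5432) -> dict:
    """Build kwargs describing a PostgreSQL source endpoint for AWS DMS."""
    return {
        "EndpointIdentifier": "sddc-postgres-source",  # hypothetical name
        "EndpointType": "source",
        "EngineName": "postgres",
        "ServerName": server_name,   # address of the database VM in the SDDC
        "Port": port,
        "DatabaseName": database_name,
        "Username": username,
        "Password": password,
    }


endpoint = source_endpoint_params("10.0.0.10", "marketing", "dms_user", "example-password")

# With boto3 installed and credentials configured, the endpoint could be
# created and its connectivity checked roughly like this:
#   import boto3
#   dms = boto3.client("dms")
#   arn = dms.create_endpoint(**endpoint)["Endpoint"]["EndpointArn"]
#   dms.test_connection(ReplicationInstanceArn="<replication instance ARN>",
#                       EndpointArn=arn)
```

In practice, the password would come from a secrets store rather than being written into a script.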
Since our source database resides in a VPC, it’s possible to have this entire architecture also reside in our VPC. For security reasons, and to isolate our traffic, that’s the path we have chosen. That way, we can keep all of the traffic isolated to our single VPC.
The Amazon Redshift cluster we are using as a target for AWS DMS is created in the same VPC, with enhanced VPC routing enabled and public accessibility disabled. In a similar fashion, Amazon QuickSight is also connected to our VPC.
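The two cluster settings mentioned above map to the `EnhancedVpcRouting` and `PubliclyAccessible` parameters of the Amazon Redshift `create_cluster` call. The sketch below shows one way to assemble those arguments; the cluster identifier, node type, subnet group, security group, and credentials are hypothetical placeholders chosen for illustration.

```python
def private_cluster_params(subnet_group: str, security_group_id: str,
                           admin_user: str, admin_password: str) -> dict:
    """Build kwargs for a Redshift cluster that stays inside the VPC."""
    return {
        "ClusterIdentifier": "sddc-analytics",   # hypothetical name
        "NodeType": "ra3.xlplus",                # assumed size, adjust as needed
        "ClusterType": "single-node",
        "DBName": "marketing",
        "MasterUsername": admin_user,
        "MasterUserPassword": admin_password,
        "ClusterSubnetGroupName": subnet_group,
        "VpcSecurityGroupIds": [security_group_id],
        "EnhancedVpcRouting": True,    # route COPY/UNLOAD traffic through the VPC
        "PubliclyAccessible": False,   # no public endpoint for the cluster
    }


cluster = private_cluster_params("analytics-subnets", "sg-REDSHIFT",
                                 "admin_user", "Example-Passw0rd")

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("redshift").create_cluster(**cluster)
```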
The remaining piece we need to keep our workflow inside of our VPC is to add a VPC endpoint for Amazon Simple Storage Service (Amazon S3). This is because when AWS DMS replicates data from a source database to a target database, it stages the data in an S3 bucket before publishing it to the target.
In this case, since we’ve made our Amazon Redshift cluster only accessible from within our VPC, we need this VPC endpoint to keep the S3 traffic limited to our VPC.
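A gateway VPC endpoint for Amazon S3 can be described with a handful of arguments to the EC2 `create_vpc_endpoint` call. The sketch below assumes a gateway-type endpoint attached to the route tables of the subnets used by AWS DMS and Amazon Redshift; the region, VPC ID, and route table ID are hypothetical.

```python
def s3_gateway_endpoint_params(vpc_id: str, region: str, route_table_ids: list) -> dict:
    """Build kwargs for a gateway VPC endpoint that serves Amazon S3."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        # S3 service names follow the pattern com.amazonaws.<region>.s3
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


s3_endpoint = s3_gateway_endpoint_params("vpc-EXAMPLE", "us-west-2", ["rtb-EXAMPLE"])

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("ec2").create_vpc_endpoint(**s3_endpoint)
```

A gateway endpoint adds routes for S3 to the listed route tables, so the subnets hosting the replication instance and the cluster need to use one of those tables.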
Once these pieces are in place, we can kick off the replication. Of course, the time it takes for this to complete depends on how much data you are copying over. However, once it completes, the analytics can begin.
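To make the replication step more concrete, here is a sketch of a replication task definition for the AWS DMS `create_replication_task` call. It assumes a one-time full load of every table in the `public` schema; the task name, schema, and ARN strings are hypothetical placeholders.

```python
import json


def table_mappings(schema: str, table: str = "%") -> str:
    """Return a DMS selection rule (as JSON) that includes matching tables."""
    rules = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-source-tables",
                "object-locator": {"schema-name": schema, "table-name": table},
                "rule-action": "include",
            }
        ]
    }
    return json.dumps(rules)


def replication_task_params(source_arn: str, target_arn: str, instance_arn: str,
                            schema: str = "public",
                            migration_type: str = "full-load") -> dict:
    """Build kwargs for a DMS replication task from source to target."""
    return {
        "ReplicationTaskIdentifier": "sddc-postgres-to-redshift",  # hypothetical
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        # Other valid values are "cdc" and "full-load-and-cdc".
        "MigrationType": migration_type,
        "TableMappings": table_mappings(schema),
    }


task = replication_task_params("SOURCE_ENDPOINT_ARN", "TARGET_ENDPOINT_ARN",
                               "REPLICATION_INSTANCE_ARN")

# With boto3 installed and credentials configured:
#   import boto3
#   dms = boto3.client("dms")
#   arn = dms.create_replication_task(**task)["ReplicationTask"]["ReplicationTaskArn"]
#   dms.start_replication_task(ReplicationTaskArn=arn,
#                              StartReplicationTaskType="start-replication")
```

Choosing `full-load-and-cdc` instead would keep Amazon Redshift in sync with ongoing changes after the initial copy, at the cost of a longer-running task.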
In this example, we sent out two sets of emails to our customers to A/B test which marketing campaign will drive a higher click rate. With this data now sitting in Amazon Redshift, we can use QuickSight to build a dashboard showing the data.
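The comparison behind such a dashboard is simple arithmetic: the click rate of each campaign is its number of clicks divided by the number of emails sent. The sketch below illustrates this with made-up counts; the numbers are purely hypothetical and are not results from the campaign described in this post.

```python
def click_rate(clicks: int, emails_sent: int) -> float:
    """Return clicks per email sent, or 0.0 if nothing was sent."""
    return clicks / emails_sent if emails_sent else 0.0


# Hypothetical counts for illustration only.
campaigns = {
    "A": {"emails_sent": 1000, "clicks": 50},
    "B": {"emails_sent": 1000, "clicks": 80},
}

rates = {name: click_rate(c["clicks"], c["emails_sent"]) for name, c in campaigns.items()}
winner = max(rates, key=rates.get)  # campaign with the highest click rate
```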
In this case, we have it laid out into a map of the United States. To zoom in for specific results by state, simply click on that state (California in this example) and see the data. With your data, the possibilities are limitless.
Figure 5 – Amazon QuickSight dashboard showing A/B test data across the U.S. (top) and California (bottom).
Conclusion
One of the biggest advantages of VMware Cloud on AWS is that it can readily integrate with other AWS services. That gives you countless ways to elevate your workloads.
If you’re amassing data in your databases over time and are looking for novel ways to glean fresh insights out of it, using Amazon Redshift and Amazon QuickSight is an easy and accessible way to achieve it.
To learn more about VMware Cloud on AWS, including reference architectures, video overviews, and solution briefs, check out these resources.
If you’re looking for more info or want to connect with us to implement something like this in your environment, please reach out to us to get started.