AWS Big Data Blog
Integrate Power BI with Amazon Redshift for insights and analytics
July 2022: This post was reviewed for accuracy.
Amazon Redshift is a fast, fully managed, cloud-native data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Microsoft Power BI is a business analytics service that delivers insights to enable fast, informed decisions. With Power BI, you can perform ad-hoc query analysis, visualize data, and create user-friendly dashboards.
This post demonstrates how to integrate Power BI with Amazon Redshift to deliver powerful visualization and insights. For a more integrated experience, AWS offers Amazon QuickSight – a fully managed BI service with secure private VPC connectivity, native ML-insights and pay-per-session pricing to deliver insights to everyone in the organization.
Solution architecture
This post provides code artifacts to help you create a big data environment on AWS from scratch. Using the AWS CloudFormation template and code examples provided, you can automatically provision a new Amazon Redshift data warehouse in under an hour without deep technical expertise.
The post also demonstrates how to configure integration for the most common deployment scenarios. For example, how do you connect Power BI to AWS services using ODBC/JDBC drivers? How do you connect to AWS services that are deployed behind a private network? What credentials do you use to connect to AWS services? This post addresses and answers these questions in the subsequent sections.
The following diagram shows the solution architecture deployed to AWS. All components inside the AWS Cloud boundary are deployed automatically using an AWS CloudFormation template to allow you to reproduce this solution quickly using your AWS account.
When deployed, the solution contains the following components:
- Networking infrastructure that includes VPC, public and private subnets, security groups, internet gateway, NAT Gateway, and route tables
- Linux EC2 instance provisioned in a public subnet to generate sample data
- Windows Server EC2 instance to host Power BI Desktop
- Windows Server EC2 instance to act as an on-premises data gateway that handles the communication between Power BI and Amazon Redshift
- An Amazon Redshift cluster deployed in a private subnet
- IAM user and roles with permissions to access Amazon S3 and Amazon Redshift
Prerequisites
To complete the steps in this post, you need the following prerequisites:
- AWS account – You need an account to follow the instructions and test it with minimal cost.
- If you are creating your account for the first time, choose the us-east-1 Region.
- Create a key pair for the selected Region. For more information, see Amazon EC2 Key Pairs.
- Power BI tenant – You can test all the described Power BI functionality at minimal or no cost with the following:
- Power BI Pro license
- Access to your Power BI admin portal
Creating and configuring your development environment
Before you can create Power BI visualizations in AWS, you need to load a fully working development environment with sample data. This section contains instructions to create and configure that environment from scratch. After completing all the deployment steps in this section, you have an AWS infrastructure with all the integration hooks between AWS and Power BI fully configured.
To create this environment, execute the following high-level tasks:
- Run an AWS CloudFormation template to provision the initial development environment.
- SSH into an Amazon EC2 Linux instance to generate a sample dataset.
- Configure the data warehouse by creating and loading data into Amazon Redshift tables.
- Install and configure the Power BI Desktop.
- Configure a data gateway in Power BI.
- Install the Power BI mobile app so you can consume the visuals from your phone.
To make the deployment quick and easy, this post automates many of the deployment steps with an AWS CloudFormation template. For the tasks that could not be automated, the post provides detailed instructions along with actual code examples. You can find the template and relevant code in the GitHub repo. Remember to clone this GitHub repo to a local working folder because you need to reference these artifacts as you walk through the deployment steps in this section.
Provisioning a development environment
In this step, you run an AWS CloudFormation template to provision the initial infrastructure. Complete the following steps:
- On the AWS CloudFormation console, under Region, choose US East (N. Virginia). You can choose other Regions, but you need to change the EC2 AMI IDs in the AWS CloudFormation template parameters to match the chosen Region.
- Choose Create stack.
- Under Specify template, select Upload a template file.
- Click on Choose file.
- Select ProvisionDevEnv.yaml file from your working folder.
- Choose Next.
- Under Specify stack details, provide the following information:
- Stack name – Enter a name for your CloudFormation stack.
- AmazonLinuxImageID – Use the default value if you are running in US East (N. Virginia). Otherwise, provide the Amazon Linux AMI ID for your chosen Region. An easy way to find the AMI ID is to go to the Amazon EC2 console and launch a new instance. The Amazon Linux AMI ID should be listed on the first page. Use Amazon Linux, not Amazon Linux 2.
- EC2KeyPair – Enter the name of your key pair for your Region.
- MyIPAddress – Enter your public IP address in CIDR format, for example, 205.251.192.0/32. To find your public IP address, search online for "what is my ip address".
- RedshiftMasterUserPassword – Enter a password for your Amazon Redshift cluster.
- RedshiftMasterUsername – Use the default username or provide your own.
- WindowsServerImageID – Use the default value if you are running in US East (N. Virginia). Otherwise, provide the Microsoft Windows Server 2019 Base AMI ID for your Region.
- For all other options, keep the default values.
- Choose Next.
- On the Configure stack options page, choose Next.
- On the Review page, select the “I acknowledge…” check box.
- Choose Create stack.
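The console steps above can also be scripted. The following is a minimal sketch, not the post's own tooling: it assumes boto3 is installed and configured with credentials, that the template only requires the parameters shown (the rest keep their defaults), and that the "I acknowledge…" check box corresponds to the IAM capability. The helper names `to_cidr`, `build_parameters`, and `create_stack` are hypothetical.

```python
import ipaddress


def to_cidr(address: str) -> str:
    """Return the address in CIDR form; a bare IPv4 address becomes a /32."""
    # ip_network raises ValueError if host bits are set, e.g. "203.0.113.7/24".
    return str(ipaddress.ip_network(address.strip()))


def build_parameters(key_pair: str, my_ip: str, password: str) -> list:
    """Build the Parameters list, leaving all other template values at their defaults."""
    values = {
        "EC2KeyPair": key_pair,
        "MyIPAddress": to_cidr(my_ip),
        "RedshiftMasterUserPassword": password,
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


def create_stack(stack_name: str, template_path: str, parameters: list) -> str:
    """Create the stack with boto3 (assumed installed) and return its stack ID."""
    import boto3  # imported lazily so the helpers above work without it

    with open(template_path, encoding="utf-8") as f:
        template_body = f.read()
    cfn = boto3.client("cloudformation", region_name="us-east-1")
    response = cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=parameters,
        # Assumed programmatic counterpart of the "I acknowledge..." check box.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    # Block until stack creation finishes.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response["StackId"]
```

For example, `create_stack("powerbi-redshift-dev", "ProvisionDevEnv.yaml", build_parameters("my-key", "203.0.113.7", "<password>"))` would launch the stack; the stack name, key pair name, and IP address here are placeholders.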
Wait for the CloudFormation stack creation to complete, which can take approximately 10 minutes. When the CloudFormation stack is complete, choose Stack Outputs. In this section, you can find all the keys and values of the resources that you need to reference later in the deployment process.
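If you prefer to read the Stack Outputs programmatically, a sketch along the following lines may help. It assumes boto3 is available for the live call; `outputs_to_dict` and `fetch_outputs` are hypothetical helper names, and the output keys you get back depend on the template.

```python
def outputs_to_dict(describe_stacks_response: dict) -> dict:
    """Flatten the Outputs of the first stack in a describe_stacks response."""
    stack = describe_stacks_response["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


def fetch_outputs(stack_name: str) -> dict:
    """Look up the stack with boto3 (assumed installed) and return its outputs."""
    import boto3  # imported lazily so outputs_to_dict works without it

    cfn = boto3.client("cloudformation", region_name="us-east-1")
    return outputs_to_dict(cfn.describe_stacks(StackName=stack_name))
```

Calling `fetch_outputs("<your stack name>")` should return a dictionary that includes keys such as `DataGeneratorPublicIP` and `RedshiftClusterEndpoint`, which later steps refer to.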
Generating a sample dataset
Now that you have provisioned the initial infrastructure, generate the sample data to be used in the Power BI visualizations. For more information about how to generate the sample dataset, see GenerateSampleDataset.txt in your working folder.
Replace the placeholders in this file with the following corresponding values from the AWS CloudFormation Stack Outputs section:
- [KeyPairFileName] – Replace with the value of the EC2KeyPair template input parameter
- [EC2IPAddress] – Replace with the public IP address of this EC2 instance with the key DataGeneratorPublicIP
- [S3BucketName] – Replace with the actual S3 bucket name
After you generate the dataset and copy it to your S3 bucket, terminate the EC2 instance with the name Data Generator. You don’t need it anymore.
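The placeholder substitution can be done by hand, or with a small helper like the sketch below. The function name `fill_placeholders` and the sample command line are illustrative assumptions; the actual contents of GenerateSampleDataset.txt are not reproduced here.

```python
def fill_placeholders(text: str, values: dict) -> str:
    """Replace each [Name] placeholder in text with its value."""
    for name, value in values.items():
        text = text.replace(f"[{name}]", value)
    return text


# Hypothetical line for illustration; real values come from the Stack Outputs.
sample = "ssh -i [KeyPairFileName] ec2-user@[EC2IPAddress]"
filled = fill_placeholders(
    sample,
    {"KeyPairFileName": "my-key.pem", "EC2IPAddress": "203.0.113.7"},
)
print(filled)
```

Placeholders that are missing from the dictionary are left untouched, so any leftover square brackets in the result are easy to spot.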
Configuring an Amazon Redshift cluster
In this step, you run a series of SQL commands to create tables in your Amazon Redshift cluster. After that, you load the data you generated earlier from Amazon S3 into the Amazon Redshift tables. Complete the following steps:
- On the Amazon Redshift console, choose Query Editor.
- Connect to the cluster by providing the database name, database user, and password that were provided as input parameters to the CloudFormation stack.
- From the schema drop-down menu, choose public.
- Open the file CreateRedshiftTables.txt in your working folder and run each SQL statement, one at a time, in the Query Editor.
- Run the COPY commands in the file CopyFromS3ToRedshift.txt, one at a time, to load sample data into the tables you just created. Replace the following placeholders with the actual values from the AWS CloudFormation Stack Outputs section:
- [S3BucketName] – Replace with actual S3 bucket name.
- [RedshiftAccessRoleName] – Replace with the IAM role name.
- Confirm that you see the following tables listed under the public schema. Run select count(*) on each table to confirm that it has the following number of rows:
- customer: 1,500,000
- lineitem: 59,986,052
- nation: 25
- orders: 15,000,000
- part: 2,000,000
- partsupp: 8,000,000
- region: 5
- supplier: 100,000
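To avoid checking the eight tables one by one, the counts can be gathered with a single query and compared against the expected values. The sketch below only builds the SQL text and compares results; how you run the query (for example, by pasting it into the Query Editor) is up to you. The names `EXPECTED_ROWS`, `count_query`, and `find_mismatches` are hypothetical.

```python
# Expected row counts, taken from the list above.
EXPECTED_ROWS = {
    "customer": 1_500_000,
    "lineitem": 59_986_052,
    "nation": 25,
    "orders": 15_000_000,
    "part": 2_000_000,
    "partsupp": 8_000_000,
    "region": 5,
    "supplier": 100_000,
}


def count_query(tables) -> str:
    """Build one UNION ALL query that returns a row count per table."""
    parts = [
        f"SELECT '{t}' AS table_name, COUNT(*) AS row_count FROM public.{t}"
        for t in tables
    ]
    return "\nUNION ALL\n".join(parts) + ";"


def find_mismatches(actual: dict) -> dict:
    """Return {table: (expected, actual)} for every table whose count differs."""
    return {
        table: (expected, actual.get(table))
        for table, expected in EXPECTED_ROWS.items()
        if actual.get(table) != expected
    }


print(count_query(EXPECTED_ROWS))
```

After running the generated query, pass the results to `find_mismatches` as a dictionary of table name to count; an empty result means every table loaded completely.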
Installing and configuring Power BI Desktop
In this step, you connect using Remote Desktop Protocol (RDP) into the Windows Server jump box and install Power BI Desktop, which you use later to create visualizations from the data that you pulled from Amazon Redshift. For more information, see Connecting to Your Windows Instance. Complete the following steps:
- From your local machine, RDP to the Windows Server Jump Box. You can get the public IP address of the jump box from the CloudFormation Stack Outputs tab or from your EC2 Console.
- Remember to use the EC2KeyPair you specified in the CloudFormation template to decrypt the administrator password by using the EC2 console:
- Open the Amazon EC2 console, and then choose Instances.
- Select the instance of your choice, choose Actions, and then choose Get Windows Password.
Note: It can take a few minutes for this option to be available after first launching a new instance.
- Click on Choose File, select your key pair file, and then choose Open. Alternatively, you can paste the contents of your key pair into the text box.
- Choose Decrypt Password.
- The first thing you’ll want to do after you’ve logged into the jump box is to turn off IE Enhanced Security Configuration. Otherwise, you will have problems logging into Power BI Service later.
- Inside the Jump Box, go to Start > Server Manager > Local Server
- Click the On link to the right of IE Enhanced Security Configuration
- Click Off under the Administrators section and Click OK
- Download and install Power BI Desktop 64-bit on the Jump Box. Remember to just download PBIDesktopSetup_x64.exe, which is the 64-bit version of Power BI Desktop. After you download the executable, run it to install Power BI Desktop. Accept all the default settings.
Installing and configuring a Microsoft on-premises data gateway
In this step, you install a Microsoft on-premises data gateway to enable Power BI to communicate with data stores that are not accessible from the public internet, such as your Amazon Redshift cluster, which you deployed inside a private subnet. Install the Microsoft on-premises data gateway on the data gateway instance. You can perform the following tests at minimum or no cost. Complete the following steps:
- From the Windows Server Jump Box, you will RDP to the data gateway using its private IP address, which you can get from the EC2 Console.
- Use the EC2KeyPair to decrypt the administrator password for the data gateway following the same instructions as outlined in the previous steps.
- Turn off IE Enhanced Security Configuration.
- Inside the data gateway, go to Start > Server Manager > Local Server
- Click the On link to the right of IE Enhanced Security Configuration
- Click Off under the Administrators section
- Click OK
- Download and install Microsoft On-premises Data Gateway
- On the screen that asks you to choose the type of gateway you need, choose On-premises data gateway (recommended)
- Accept the default values and click Install
- When the installer asks you to sign in, type the email address associated with the admin account for the Power BI Pro tenant.
- Click Sign in
- After sign-in, you will be asked to register the gateway. If you are asked to register a new gateway or migrate, restore, or take over an existing gateway, choose Register a new gateway.
- Give your gateway a name and provide a recovery key
- Click Configure
- You should see a green checkmark indicating the gateway is online and ready to be used.
Configuring the data gateway in Power BI
In this step, you log in to your Power BI tenant as an administrator to configure how Power BI connects to the AWS data sources. Complete the following steps:
- On your browser (from any machine), launch https://powerbi.com.
- Log in with an account that has admin privileges. Use the same account that you used for the data gateway.
- Choose the gear icon.
- Choose Manage gateways.
- Choose the gateway you installed earlier.
- Select the check box Allow user’s custom data connectors to refresh through this gateway cluster.
- Choose Apply.
- Choose ADD DATA SOURCE.
- Create a data source for Amazon Redshift with the following parameters:
- Data Source Name – Amazon Redshift Private VPC.
- Data Source Type – Amazon Redshift.
- Server – Replace with the value of the [RedshiftClusterEndpoint] key from the AWS CloudFormation Stack Outputs.
- Database – Enter the name of the Amazon Redshift database, which you provided earlier as part of the CloudFormation stack parameters. The default is dev.
- Username – Enter your database username, which you provided earlier as part of the CloudFormation stack parameters.
- Password – Enter your database password, which you provided earlier as part of the CloudFormation stack parameters.
- Choose Add.
Installing the Power BI mobile app
In this step, you install the Power BI app on your mobile phone so you can interact with Power BI reports and dashboards later from your phone. Complete the following steps:
- Go to either the Apple App Store or Google Play Store and search for Microsoft Power BI.
- Install the app.
- Sign in to Power BI with the same account you’ve been using.
Visualizing Redshift Data using Power BI Desktop
In this section, you will connect Power BI Desktop to Redshift; create a report; publish the report to Power BI service; and finally consume the report from your phone.
- From Jump Box, launch Power BI Desktop
- If you haven’t already, sign in to the Service with your Power BI credentials
- Select Home > Get Data > More > Database > Amazon Redshift
- Click Connect
- On the next screen, provide the following values:
- Server – copy the value of the key [RedshiftClusterEndpoint], which is found in the CloudFormation Stack Outputs tab
- Database – dev (or whatever name you gave for the database)
- Data Connectivity Mode – DirectQuery
- If this is the first time you’re connecting to this cluster, then you’ll need to type the Redshift credentials you provided to the CloudFormation Stack earlier. Type in your Redshift username and password in the popup window and click on Connect.
- Select the orders, lineitem, and part tables from the Navigator window and then click Load.
- Once the tables have finished loading, you will need to define table relationships in the data model.
- In Power BI Desktop, change to the Model view by clicking on the “table relationship” icon on the left.
- Create relationships between the tables by dragging and dropping the following columns on each other.
- orders.o_orderkey = lineitem.l_orderkey
- part.p_partkey = lineitem.l_partkey
- Now, you are ready to create some charts. Change to Report view and add the following visualizations to the report.
- Date Slicer
- Visualization type – Slicer
- Field – orders.o_orderdate
- Sales by Date by Manufacturer
- Visualization type – Line Chart
- Axis – orders.o_orderdate
- Legend – part.p_mfgr
- Values – lineitem.l_extendedprice
- Order Count
- Visualization type – Card
- Fields – Count of orders.o_orderkey
- Line Item Count
- Visualization type – Card
- Fields – Count of lineitem.l_linenumber
- With a little formatting, your report shows Sales by Date and Manufacturer and should look something similar to this:
- Save the report and give it a name.
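To make the role of the two relationships concrete, the sketch below reproduces the logic behind the Sales by Date by Manufacturer chart on a few toy rows. The rows are invented for illustration only, and the code assumes the chart sums l_extendedprice; it is not what Power BI runs internally.

```python
from collections import defaultdict

# Toy rows, invented for illustration; column names follow the steps above.
orders = [
    {"o_orderkey": 1, "o_orderdate": "1995-01-01"},
    {"o_orderkey": 2, "o_orderdate": "1995-01-02"},
]
part = [
    {"p_partkey": 10, "p_mfgr": "Manufacturer#1"},
    {"p_partkey": 20, "p_mfgr": "Manufacturer#2"},
]
lineitem = [
    {"l_orderkey": 1, "l_partkey": 10, "l_linenumber": 1, "l_extendedprice": 100.0},
    {"l_orderkey": 1, "l_partkey": 20, "l_linenumber": 2, "l_extendedprice": 50.0},
    {"l_orderkey": 2, "l_partkey": 10, "l_linenumber": 1, "l_extendedprice": 25.0},
]


def sales_by_date_and_mfgr(orders, part, lineitem):
    """Sum l_extendedprice per (order date, manufacturer) via the two key joins."""
    # orders.o_orderkey = lineitem.l_orderkey
    order_date = {o["o_orderkey"]: o["o_orderdate"] for o in orders}
    # part.p_partkey = lineitem.l_partkey
    mfgr = {p["p_partkey"]: p["p_mfgr"] for p in part}
    totals = defaultdict(float)
    for li in lineitem:
        key = (order_date[li["l_orderkey"]], mfgr[li["l_partkey"]])
        totals[key] += li["l_extendedprice"]
    return dict(totals)


totals = sales_by_date_and_mfgr(orders, part, lineitem)
```

Each line item is matched to one order (for the date) and one part (for the manufacturer), which is why lineitem sits in the middle of both relationships.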
Publishing the report to Power BI
After you create a report in Power BI Desktop, you need to publish the dataset and report to Power BI to share it with others or consume it from a mobile device. Complete the following steps:
- In Power BI Desktop, choose Publish
- Choose My workspace.
- Choose Select.
It should not take long for the publishing to complete because no data was imported.
Configuring the gateway connection
You now need to tell Power BI to map the data source inside this report to the Amazon Redshift data source that you registered with the data gateway. Complete the following steps:
- From any machine, open a browser and launch https://powerbi.com.
- Log in to Power BI.
- Choose My workspace.
- Choose Datasets.
- Choose the … icon next to the dataset you just published.
- Choose Settings.
- Expand the Gateway connection section.
- For Use a data gateway, choose On.
- Under Maps to, choose the Amazon Redshift data source you created earlier.
- Choose Apply.
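The same mapping can be requested through the Power BI REST API's Bind To Gateway operation for datasets. The following sketch only builds the HTTP request and does not send it. It assumes you already have an Azure AD access token with suitable permissions and know the dataset, gateway, and data source IDs; the function name `build_bind_request` and all IDs shown are placeholders.

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_bind_request(dataset_id, gateway_id, datasource_ids, access_token):
    """Build (but do not send) a request that binds a dataset to a gateway."""
    body = json.dumps(
        {"gatewayObjectId": gateway_id, "datasourceObjectIds": datasource_ids}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_ROOT}/datasets/{dataset_id}/Default.BindToGateway",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder IDs and token, for illustration only.
request = build_bind_request("dataset-id", "gateway-id", ["datasource-id"], "token")
```

Passing the request to `urllib.request.urlopen` would submit it; for a one-off setup like this walkthrough, the console steps above are likely simpler.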
Consuming the Power BI reports and dashboard
Now that you have published the report to Power BI successfully, you can interact with it on https://powerbi.com.
- Launch https://powerbi.com.
- Log in to Power BI.
- Choose My workspace.
- Choose Reports.
- Choose the report you just published.
Your report should look similar to the following screenshot. It shows the same Sales by Date and Manufacturer graph as before.
You can interact with the reports and dashboards on the Power BI mobile app the same way you can on the Power BI website. The following screenshot shows how the Sales by Date and Manufacturer report looks on the Power BI app for iPhone.
When you view and interact with the report, whether from the website or your mobile app, you’re always viewing the latest data because you connect directly to Amazon Redshift.
Conclusion
This post showed how to use Power BI to query data in Amazon Redshift to generate reports, visualizations, and dashboards, and described a solution architecture that you can deploy in the AWS Cloud. The post also demonstrated how to set up and connect Power BI to an Amazon Redshift cluster in a VPC without public internet access, and how to push Power BI dashboards to mobile devices. Be sure to also check out Amazon QuickSight, AWS’ cloud-native BI service, which requires no desktop clients or servers, automatically detects Redshift instances in your VPC, and connects securely to your Redshift instances via private VPC connectivity.
Special acknowledgement goes to AWS colleagues Juan Yu, Sophia Jung, and Joe Harris for their valuable comments and suggestions.
If you have any questions or suggestions, leave your feedback in the comment section. If you need any further assistance to optimize your Amazon Redshift implementation, contact your AWS account team or a trusted AWS partner.
About the authors
Vu Le is a Senior Data Architect, Strategic Accounts Team, AWS Professional Services.
Po Hong, PhD, is a Senior Data Architect, Data & Analytics Global Specialty Practice, AWS Professional Services.