AWS Big Data Blog

Creating dashboards quickly on Microsoft Power BI using Amazon Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in a data lake using standard SQL. One of the key elements of Athena is that you only pay for the queries you run. This is an attractive feature because there is no hardware to set up, manage, or maintain.

You can query Athena with SQL or by using data visualization tools such as Amazon QuickSight, Microsoft Power BI, Tableau, or other third-party options. QuickSight is a cloud-native business intelligence (BI) service that you can use to visually analyze data and share interactive dashboards with all users in your organization. QuickSight is fully managed and serverless, requires no client downloads for dashboard creation, and has a pay-per-session pricing model that allows you to pay for dashboard consumption with a maximum charge of $5.00 per reader per month. The combination of QuickSight and Athena allows you to rapidly deploy dashboards and BI to tens of thousands of users, while only paying for actual usage, and not worrying about server deployment or management.

Microsoft Power BI allows you similarly to analyze your data. Previously, creating dashboards with Microsoft Power BI and Athena required you to download all data locally on your computer. This takes time and can fail due to memory or network bandwidth constraints.

You can now create Microsoft Power BI dashboards and leverage the power of Athena through our out-of-the-box connector for Power BI. The connector is more scalable as it supports Power BI’s DirectQuery mode in which complete, raw data sets are not downloaded to your workstation. While you create or interact with a visualization, Microsoft Power BI works with Athena to dynamically query the underlying data source so you’re always viewing current data.

This post provides step-by-step guidance on how to use the Athena connector for Power BI to query, visualize, and share data with Power BI.

Solution overview

To create Microsoft Power BI dashboards using Athena as a data source, you start by designing a dashboard in Microsoft Power BI Desktop with the help of the Athena data source connector for Power BI and the Athena ODBC driver. When you finish creating your dashboard, you publish it to the Microsoft Power BI Service. To see your data on Microsoft Power BI Service, you need to install the Microsoft Power BI on-premises data gateway in your AWS account—it works like a bridge between Microsoft Power BI Service and Athena. Finally, you configure Athena as a new data source in Microsoft Power BI Service.

To authenticate yourself with Athena, you use an instance profile role because it is easier to do all the configuration, or you can use any of the different authentication options that the Athena ODBC driver provides.

The following diagram illustrates the solution architecture.

Walkthrough overview

For this post, we step through a use case using the data from the New York City Taxi Records dataset from 2015. The data is already stored in Apache Parquet format and is partitioned. For more information about optimizing your Athena queries, see Top 10 Performance Tuning Tips for Amazon Athena.

You deploy an AWS CloudFormation stack with all the infrastructure required to deploy two Amazon Elastic Compute Cloud (Amazon EC2) instances in a private subnet in an Amazon Virtual Private Cloud (Amazon VPC): one instance is used for Microsoft Power BI Desktop, and the other is used for the Microsoft Power BI on-premises data gateway. This stack uses t3.2xlarge instances because they have the minimal hardware requirements recommended. You can increase or decrease the EC2 instance type depending on the performance of the gateway.

Additionally, the CloudFormation template creates an AWS Glue table that gives you access to the dataset. It creates an AWS Lambda function as an AWS CloudFormation custom resource that updates all the partitions in the AWS Glue table.

Then, you use AWS Systems Manager Session Manager (see Starting a session (Systems Manager console) and any remote desktop client to configure the instances and to create your dashboard by following these steps:

  1. Deploy the CloudFormation stack by choosing Launch stack:
  2. On the Amazon EC2 instance that has the tag PowerBiDesktop, install and configure the Simba Athena ODBC driver and Microsoft Power BI Desktop.
  3. Create your dashboard on Microsoft Power BI Desktop and publish it.
  4. On the Amazon EC2 instance that has the tag PowerBiGateway, install and configure the Simba Athena ODBC driver and Microsoft Power BI on-premises data gateway.
  5. Open Microsoft Power BI and configure your Athena data source.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Logging in to your Microsoft Power BI Desktop instance

To reduce the surface of attack of a bastion host, the operational burden to manage bastion hosts, and the additional costs incurred, Session Manager allows you to securely connect to your EC2 instances without the need to run and operate your own bastion hosts or run SSH on your EC2 instances. For more information, see New – Port Forwarding Using AWS System Manager Session Manager.

Connect to the Microsoft Power BI Desktop instance with Session Manager. You need to run the following commands depending on the OS of your local machine. It can take a few minutes after deployment for your instance to be available.

For Linux and Mac OS, enter the following code:

# find the instance ID based on Tag Name
INSTANCE_ID=$(aws ec2 describe-instances \
--filter "Name=tag:aws:cloudformation:logical-id,Values=PowerBiDesktop" \
--query "Reservations[].Instances[?State.Name == 'running'].InstanceId[]" \
--output text --region us-east-1)
# create the port forwarding tunnel
aws ssm start-session --target $INSTANCE_ID \
--document-name AWS-StartPortForwardingSession \
--parameters '{"portNumber":["3389"],"localPortNumber":["8889"]}' \
--region us-east-1

 For Windows, enter the following code:

# find the instance ID based on Tag Name
for /f %i in ('aws ec2 describe-instances --filter "Name=tag:aws:cloudformation:logical-id,Values=PowerBiDesktop" --query "Reservations[].Instances[?State.Name == 'running'].InstanceId[]" --output text --region us-east-1') do set INSTANCE_ID=%i
# create the port forwarding tunnel
aws ssm start-session --target %INSTANCE_ID% --document-name AWS-StartPortForwardingSession --parameters "portNumber"=["3389"],"localPortNumber"=["8889"] --region us-east-1

Open your remote desktop application and connect to the Microsoft Power BI Desktop EC2 instance. You need the following information:

Installing and configuring Microsoft Power BI Desktop

To install and configure Microsoft Power BI Desktop, complete the following steps:

  1. Download and install the latest Athena ODBC driver for Windows 64-bit.
  2. Choose the ps1 script located in the desktop (right-click) and choose Run with Powershell.

This creates a new data source on Windows called taxiconnection.

  1. Download and install the Microsoft Power BI Desktop.
  2. Open the Microsoft Power BI Desktop application.

Creating an Athena connection on Microsoft Power BI Desktop

To create your Athena connection, complete the following steps:

  1. Open Microsoft Power BI Desktop.
  2. Choose Get Data and More.
  3. Search for and select Amazon Athena.
  4. For Data Source Name (DSN), enter taxiconnection.
  5. Choose DirectQuery.

If you choose Import mode, you can’t create the dashboard because Microsoft Power BI Desktop tries to download all data locally on your computer, and the dataset never finishes loading.

  1. Choose OK.
  2. Choose Use Data Source Configuration.
  3. Choose Connect.
  4. In the AwsDataCatalog folder, navigate to the nyctaxi folder.
  5. Choose the records.
  6. Choose Load.

Creating your dashboard on Microsoft Power BI Desktop and publishing it

You can create a dashboard to show the number of transactions by month and type in descending order. You can then publish the structure of this report to make it available on Microsoft Power BI.

  1. In the Visualizations pane, choose the Stacked bar chart
  2. In the Fields pane, drag the month field to the Axis section in the Visualizations
  3. Drag the type field to the Legend section in the Visualizations
  4. Drag the pickup_datetime field to the Value section in the Visualizations

The following screenshot shows your visualization.

  1. Choose Publish.

Because this is a new report, you’re prompted to save it before you can publish it.

  1. Give your report a name (such as taxireport), and choose Save.
  2. Sign in to be able to publish your report.
  3. Choose a destination (such as My workspace).
  4. In the Success window, choose Got it.

After this last step, the report structure is published on Microsoft Power BI. However, if you try to see the report, there isn’t any data on it because the data isn’t published with the report. You need to install the Microsoft Power BI on-premises data gateway to be able to pull the data.

Logging in to your Microsoft Power BI on-premises data gateway instance

As you did with the Microsoft Power BI Desktop, you log in to the Microsoft Power BI Gateway instance using the tags to get its IDs and a different local port.

For Linux and Mac OS, enter the following code:

# find the instance ID based on Tag Name
INSTANCE_ID=$(aws ec2 describe-instances \
--filter "Name=tag:aws:cloudformation:logical-id,Values=PowerBiGateway" \
--query "Reservations[].Instances[?State.Name == 'running'].InstanceId[]" \
--output text --region us-east-1)
# create the port forwarding tunnel
aws ssm start-session --target $INSTANCE_ID \
--document-name AWS-StartPortForwardingSession \
--parameters '{"portNumber":["3389"],"localPortNumber":["8899"]}' \
--region us-east-1

For Windows, enter the following code:

# find the instance ID based on Tag Name
for /f %i in ('aws ec2 describe-instances --filter "Name=tag:aws:cloudformation:logical-id,Values=PowerBiGateway" --query "Reservations[].Instances[?State.Name == 'running'].InstanceId[]" --output text --region us-east-1') do set INSTANCE_ID=%i
# create the port forwarding tunnel
aws ssm start-session --target %INSTANCE_ID% --document-name AWS-StartPortForwardingSession --parameters "portNumber"=["3389"],"localPortNumber"=["8899"] --region us-east-1

Open your remote desktop application and connect to the Microsoft Power BI Gateway Amazon EC2 instance with the following information:

Installing and configuring Microsoft Power BI on-premises data gateway

To set up your on-premises data gateway, complete the following steps:

  1. Download and install the latest Athena ODBC driver for Windows 64-bit.
  2. Choose the CongigureODBC.ps1 script located in the desktop (right-click) and choose Run with Powershell.

This creates a new data source on Windows called taxiconnection.

  1. Download the Microsoft Power BI on-premises data gateway standard mode and launch the installer. You might need to update to the latest available .NET version before starting the installation.
  2. For your gateway, choose On-premises data gateway (recommended).
  3. Accept the default values and choose Install.
  4. When the installer asks you to sign in, enter the email address associated with the admin account for the Microsoft Power BI Pro tenant.
  5. Choose Sign in.
  6. If asked to register a new gateway or migrate, restore, or take over an existing gateway, choose Register a new gateway.
  7. Give your gateway a name and provide a recovery key.
  8. Choose Configure.

You should see a green checkmark indicating the gateway is online and ready to be used.

Opening Microsoft Power BI and configuring your Athena data source

To configure your data source, complete the following steps:

  1. Open Microsoft Power BI in your browser.
  2. Choose the Settings icon.
  3. Choose Manage gateways.
  4. Find the gateway cluster you just created.
  5. Hover your mouse over your gateway name and choose the …icon.
  6. Choose Open menu.
  7. Choose Add data source.
  8. For the data source name, enter taxiconnection.
  9. For the data source type, choose Amazon Athena.
  10. For the second data source name, enter taxiconnection.
  11. On the Authentication Method, choose Anonymous, then choose Add.

Seeing your report on Microsoft Power BI

To view your report, complete the following steps:

  1. Choose the workspace where you saved your report.
  2. On the Datasets + dataflows tab, locate the dataset, locate the dataset that has the same name as your report (for example, taxireport) and choose the … icon.
  3. Choose Settings.
  4. Choose Discover Data Sources.
  5. Expand the Gateway Connection
  6. Choose your gateway.
  7. For Maps to, choose taxiconnection.
  8. Choose Apply.
  9. Return to the workspace where you saved your report.
  10. On the Content tab, choose your report (taxireport).

You can now see your report online using the most recent data.

Cleaning up

To avoid incurring future charges, delete the CloudFormation stack and the S3 bucket that you deployed as part of this post.

Conclusion

This post presented how to connect to Athena from Microsoft Power BI using the out-of-the-box data source connector and import data using DirectQuery mode. The first part of the post described the architecture components and how to successfully create a dashboard using the NYC taxi dataset. The stack deployed uses only one EC2 instance for the Microsoft Power BI on-premises data gateway, but in production, you should consider creating a high-availability cluster of gateway installations, ideally in different Availability Zones. The second part of this post deployed a demo environment and walked you through the steps to configure Microsoft Power BI with Athena to share your insights. For native access to your data in AWS without any downloads or servers, be sure to also check out Amazon QuickSight.


About the Authors

Armando Segnini is a Data Architect with AWS Professional Services. He spends his time building scalable big data and analytics solutions for AWS Enterprise and Strategic customers. Armando also loves to travel with his family all around the world and take pictures of the places he visits.

 

 

Xavier Naunay is a Data Architect with AWS Professional Services. He is part of the AWS ProServe team, helping enterprise customers solve complex problems using AWS services. In his free time, he is either traveling or learning about technology and other cultures.