AWS Big Data Blog

Power up your analytics with Amazon SageMaker Unified Studio integration with Tableau, Power BI, and more

Organizations face challenges in accessing and analyzing governed data across multiple sources through their preferred business intelligence (BI) and analytics tools while maintaining security and governance. They need a seamless way to connect their familiar tools (like Tableau, Power BI, Excel) to Amazon SageMaker‘s data assets without compromising data governance and security protocols.

Amazon SageMaker supports authentication through the Amazon Athena JDBC driver, allowing data users to query their subscribed data lake assets via popular BI and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more. This integration empowers data users to access and analyze governed data within Amazon SageMaker using familiar tools, boosting both productivity and flexibility.

Customers use Amazon SageMaker Unified Studio to streamline data access and governance by enabling data users to locate and subscribe to data from multiple sources within a single project. Amazon SageMaker Unified Studio natively integrates with Amazon-specific options like Amazon AthenaAmazon Redshift, and Amazon SageMaker AI, allowing users to analyze their project governed data. With this launch of JDBC connectivity, Amazon SageMaker Unified Studio expands its support for data users, including analysts and scientists, allowing them to work in their preferred tools, whether it’s SQL Workbench, Domino, or Amazon-native solutions like Amazon Athena, while ensuring secure, governed access within Amazon SageMaker Unified Studio.

Getting Started

To get started, download and install the latest Athena JDBC driver for your tool of choice. After installation, copy the JDBC connection string from the Amazon SageMaker Unified Studio portal into the JDBC connection configuration to establish a connection from your tool. This directs you to authenticate using single sign-on (SSO) with your corporate credentials. After connecting, you can query, visualize, and share data—governed by Amazon SageMaker Unified Studio–within the tools you already know and trust.

In this post, we guide you through connecting various analytics tools to Amazon SageMaker Unified Studio using the Athena JDBC driver, enabling seamless access to your subscribed data within your Amazon SageMaker Unified Studio projects.

Solution overview

To demonstrate these capabilities, consider a use case where your marketing team wants to analyze sales data to understand patterns in sales by stores and sales representatives. To achieve this, your marketing team needs access to sales_performance_by_store, and sales_performance_by_rep data owned by the sales team. The sales team, acting as the data producer, publishes the necessary data assets to Amazon SageMaker Unified Studio, allowing the marketing team, as a consumer, to discover and subscribe to these assets.

After the subscription is approved, the data assets become available within the marketing team’s project environment in Amazon SageMaker Unified Studio. The marketing team can then use their preferred tool to perform data exploration. An example architecture of how this is done using DBeaver is shown in the following image:

SageMaker Unified Studio project architecture diagram showing data collaboration between Sales and Marketing teams with Amazon S3 storage and Athena integration

Prerequisites

To follow along with this post, you need the following prerequisites in place:

  1. AWS account – If you don’t have an active AWS account, see How do I create and activate a new AWS account?.
  2. Amazon SageMaker resources – You need a domain for Amazon SageMaker, and two Amazon SageMaker project.
  3. Publish data assets – As the data producer from the sales team, you can now ingest individual data assets into Amazon SageMaker Unified Studio. For this use case, create a data source and import the technical metadata of two data assets – sales_performance_by_store, and sales_performance_by_rep – from AWS Glue Data Catalog. Ensure the data assets are enriched with business descriptions and published to the catalog.
    Note: Here we are using tables which are in the Glue catalog but with Sagemaker Lakehouse you have the option to bring assets from other sources.
  4. Subscribe data assets – As a data analyst from the marketing team, you can now discover and subscribe to the data assets. The data producer from the retail team reviews and approves your subscription. Upon successful fulfillment, the data assets are added to your SageMaker Unified project.

For detailed instructions for publishing and subscribing, see the Amazon SageMaker Unified Studio User Guide.

The following figure shows the subscribed assets added to the subscribed assets section in your marketing project catalog.

SageMaker Unified Studio Assets page displaying subscribed data assets with accessibility status indicators

In the following sections, we walk you through the steps to configure DBeaver to consume the subscribed assets from Amazon SageMaker Unified Studio.

Configuring DBeaver to access subscribed data assets

In this section, you configure DBeaver to access the subscribed assets from the Marketing project

To configure DBeaver:

  1. Connect with JDBC: In the Amazon SageMaker Unified Studio, (1) open the Marketing project, (2) on the Project overview screen, (3) choose JDBC connection details tab.
    SageMaker Unified Studio Project overview page showing JDBC connection parameters for external application integration
  2. Copy the JDBC connection URL into a text editor. The URL should have the following parameters needed for configuring the database connection in DBeaver – Domain ID, Environment ID, Region, and IDC Issuer URL.
    JDBC connection details configuration panel with IDC authentication parameters and copy functionality
  3. Download and install the latest Athena driver:
    • If DBeaver has the Athena driver pre-installed, it might be the older (v2) version. To ensure compatibility with Amazon SageMaker Unified Studio, you need the latest driver (v3), which includes the necessary authentication features.
    • Download the latest JDBC driver—version 3.x.
    • To install the latest driver:
      • Go to Database and then to Driver Manager in DBeaver.
      • Select the Athena driver and choose Edit.
      • Visit the Libraries tab.
      • Choose Download/Update to fetch the latest driver version.
      • If prompted, select the appropriate version and confirm the download.
  4. In the DBeaver SQL client, create a new database connection and select the Athena driver.
    DBeaver database connection dialog showing Amazon Athena driver selection among available database options
  5. Switch to the Driver Properties tab, enter the values of the following properties that are available in the JDBC connection URL you copied from Amazon SageMaker Unified Studio. If any of these properties are not already available, you can add them and provide their respective values.
    • CredentialsProvider: The credentials provider to authenticate requests to AWS
    • DataZoneDomainId: The ID of your Amazon DataZone domain
    • DataZoneDomainRegion: The AWS Region where your domain is hosted
    • DataZoneEnvironmentId: The ID of your DefaultDataLake environment
    • IdentityCenterIssuerUrl: The issuer URL used by AWS Identity and Access Management (IAM) Identity Center for token issuance
    • OutputLocation: Amazon S3 path for storing query results
    • Region: The Region where the environment is created
    • Workgroup: Amazon Athena workgroup of the environment
    • ListenPort: Pick any four digits port number. This is the port number that listens for the IAM Identity Center response

    DBeaver connection configuration dialog for Amazon Athena with driver properties and authentication settings

  6. Choose Test Connection….
  7. You are redirected to the IAM Identity Center sign-in portal. Sign in with Marketing user credentials. If you’re already signed in through single sign-on (SSO), this step can be skipped.
    AWS authentication sign-in page with username input field
  8. After you sign in, if you are prompted to authorize the DataZoneAuthPlugin. Choose Allow access to authorize access to Amazon DataZone from DBeaver.
    AWS DataZone authorization dialog requesting user permission for application access
  9. After sign in completes, you see the following message. You can close the window and go to the DBeaver.
    Amazon DataZone session completion confirmation message
  10. After the connection is established, the following success message appears.
    DBeaver connection test dialog showing successful Amazon Athena connection with performance metrics
  11. You can now view and query all subscribed assets directly within DBeaver.
    DBeaver SQL query interface displaying sales performance data from Amazon Athena database

These steps might also apply to other analytics tools and clients that support JDBC connections. If you’re using a different tool, you might need to adapt these instructions accordingly to ensure proper configuration and access to Amazon SageMaker Unified Studio data assets.

Integration with other applications

You can use similar steps for other BI and analytics tools that support standard database connections.

Connect to Tableau Desktop

Use the Athena JDBC driver to connect Tableau to Amazon SageMaker Unified Studio and visualize your subscribed data.To connect to Tableau Desktop:

  1. Make sure that you’re using the latest Athena JDBC 3.x driver.
  2. Copy the JDBC driver file and place it in the appropriate folders for your operating system
    • For Mac OS: ~/Library/Tableau/Drivers
    • For Windows: C:\Program Files\Tableau\Drivers
  3. Open Tableau Desktop. From the To a Server connection menu, select Other Databases (JDBC) to connect to Amazon SageMaker Unified Studio.
    Tableau start page showing connection options with Other Databases JDBC option highlighted
  4. Paste the JDBC connection URL you copied from the SageMaker Unified Studio portal into the URL. Leave other fields such as DialectUsername, and Password blank and choose Sign in.
    If you get a port is occupied error – add “;ListenPort=8055” to the URL to change the port. You can use any port number.

    Tableau Other Databases JDBC connection dialog with PostgreSQL dialect configuration

  5. This redirects you to authenticate with IAM Identity Center. Enter the credentials of the Identity Center user that you used to sign in to the SageMaker Unified Studio portal. Authorize the DataZoneAuthPlugin to access Amazon DataZone from Tableau. Once the connection is established with the success message, you can view your project’s subscribed data directly within Tableau and build dashboards.
    Data analytics interface showing sales_performance_by_store table with 283 rows and 15 fields

Connect to Microsoft Power BI

Now, we look at connecting Amazon SageMaker Unified Studio with Microsoft Power BI on Windows.While Amazon Athena provides a native ODBC driver for connecting to ODBC-compatible tools like Microsoft Power BI, it currently doesn’t support Amazon SageMaker Unified Studio authentication. Therefore, in this post, we use an ODBC-JDBC bridge to connect Amazon SageMaker Unified Studio with Microsoft Power BI using the Athena JDBC driver, which supports SageMaker Unified Studio authentication.

In this post, we’re using the ZappySys driver as the ODBC-JDBC bridge. This is a third-party solution that requires a separate licensing fee, which isn’t included in the AWS solution. You can choose to use any other solution for ODBC-JDBC bridge.To connect to Power BI:

  1. Make sure that you have administrator privileges to run the ODBC Data Source Administrator.
  2. From the Windows Start menu, run the ODBC Data Source Administrator (the 64-bit version) using run as Administrator.
  3. Create a New Data Source with the ZappySys JDBC Bridge Driver. You are prompted to enter your connection details.
    Windows ODBC Data Source Administrator dialog showing ZappySys JDBC Bridge Driver selection
  4. Paste the JDBC URL you copied from the SageMaker Unified Studio portal in the Connection String, along with the driver class and JDBC driver file. Make sure that you’re using the latest Athena JDBC 3.x driver.
  5. Choose Test Connection. A new dialog window pops up after the connection is successful.
    Test Connection using ZappySys JDBC Bridge Driver
  6. This redirects you to authenticate with IAM Identity Center. Enter the credentials of the Identity Center user that you used to sign in to the SageMaker Unified Studio portal. Authorize the DataZoneAuthPlugin.
  7. Choose Preview tab on ZappySys JDBC Bridge Driver window and choose one of the subscribed tables to access data.
    ZappySys JDBC Bridge Driver configuration interface showing SQL query preview with sales performance results
  8. After configuring the data source, launch Power BI. Create a blank report or use an existing report to integrate the new visuals. Choose Get Data and select the name of the data source you created. This opens a new browser window to authenticate your credentials. Allow access to authorize the DataZone Auth plugin. After authorization is complete, you can build your reports in Microsoft Power BI with the subscribed data assets.
    Database connection profile selection dialog with PostgreSQL group highlighted

Connect to SQL Workbench

Discover how SQL Workbench can connect to Amazon SageMaker Unified Studio for users who prefer a SQL interface to query data lake tables and views subscribed through projects in Amazon SageMaker Unified Studio.

To connect to SQL Workbench:

  1. Make sure that you’re using the latest Athena JDBC 3.x driver.
  2. Open SQL Workbench/J and choose Manage Drivers.
    Database driver management interface showing SMUSAthenajDBC driver configuration details
  3. Select the option to add a new driver. Enter a name for it, such as SMUSAthenaJDBC, and import the driver you downloaded in the previous steps.
    Database driver management dialog showing SMUSAthenaJDBC driver configuration with library path and class name
  4. Create a new connection profile and enter a name it, such as smus-profile. In the Driver dropdown, select the driver you configured. For the URL, enter the string jdbc:athena://region=us-east-1; (In the example, the Virginia Region is being used). Choose Extended Properties.
    PostgreSQL connection profile configuration dialog with Amazon Athena JDBC driver settings and authentication options
  5. Under Extended Properties, add the following parameters that you copied from the SageMaker Unified Studio portal. You can also include these parameters in the JDBC (URL) connection string. Choose OK.
    • Workgroup
    • OutputLocation
    • DataZoneDomainId
    • IdentityCenterIssuerURL
    • CredentialsProvider
    • DatazoneEnvironmentId
    • DataZoneDomainRegain

    Alos add “ListenPort” with any port number.

    Extended properties configuration dialog showing AWS DataZone connection parameters including domain ID, environment ID, and listen port 8067

  6. This redirects you to authenticate with IAM Identity Center. Enter the credentials of the Identity Center user that you used to sign in to the SageMaker Unified Studio portal. Authorize the DataZoneAuthPlugin.
  7. After successful connection, in SQL Workbench/J, under Database Explorer, select the database from the marketing project of SageMaker unified studio. Choose a subscribed table. Select the Data tab to see the data in the table.
    SQL Workbench showing sales performance data query results from AWS Athena database with 283 customer transaction records

Cleanup

To ensure no additional charges are incurred after testing, be sure to delete the Amazon SageMaker Unified Studio domain. See Delete domains for instructions.

Conclusion

Amazon SageMaker Unified Studio continues to expand its offerings, providing you with more flexibility to access, analyze, and visualize your subscribed data. With support for the Athena JDBC driver, you can now use a wide range of popular BI and analytics tools, making data accessed through Amazon SageMaker Unified Studio more accessible than ever before. Whether you’re using Tableau, Power BI, or other familiar tools, the integration with Amazon SageMaker Unified Studio ensures that your data remains secure and accessible to authorized users.

The feature is supported in all AWS commercial Regions where Amazon SageMaker Unified Studio is currently available. Get started with our technical documentation.


About the authors

Narendra Gupta

Narendra Gupta

Narendra is a Specialist Solutions Architect at AWS, helping customers on their cloud journey with a focus on AWS analytics services. Outside of work, Narendra enjoys learning new technologies, watching movies, and visiting new places.

Durga Mishra

Durga Mishra

Durga is a solutions architect at AWS. Outside of work, Durga enjoys spending time with family and loves to hike on Appalachian trails and spend time in nature.

Ramesh Singh

Ramesh Singh

Ramesh is a Senior Product Manager Technical (External Services) at AWS in Seattle, Washington, currently with the Amazon SageMaker team. He is passionate about building high-performance ML/AI and analytics products that help enterprise customers achieve their critical goals using cutting-edge technology.

Nishchai JM

Nishchai JM

Nishchai is an Analytics Specialist Solutions Architect at Amazon Web services. He specializes in building Big-data applications and help customer to modernize their applications on Cloud. He thinks Data is new oil and spends most of his time in deriving insights out of the Data.