AWS Big Data Blog

Configure seamless single sign-on with SQL analytics in Amazon SageMaker Unified Studio

Amazon SageMaker Unified Studio provides a unified experience for using data, analytics, and AI capabilities. SageMaker Unified Studio now supports trusted identity propagation (TIP) for SQL workloads, enabling fine-grained data access control based on individual user identities. Organizations can use this integration to manage data permissions through AWS Lake Formation while using their existing single sign-on (SSO) infrastructure.

Organizations already using Amazon Redshift with TIP can extend their existing Lake Formation permissions to SageMaker Unified Studio. Users simply log in through SSO and access their authorized data using the SQL editor, maintaining consistent security controls across their analytics environment.

This post demonstrates how to configure SageMaker Unified Studio with SSO, set up projects and user onboarding, and access data securely using integrated analytics tools.

Solution overview

For our use case, a retail corporation is planning to implement sales analytics to identify sales patterns and product categories that are doing well. This will help the sales team improve on sales planning with targeted promotions and help the finance team plan budgeting with better inventory management. The corporation stores a customer table in an Amazon Simple Storage Service (Amazon S3) data lake and a store_sales table in a Redshift cluster.

The corporation uses SageMaker Unified Studio as the UI, with users onboarded from their identity provider (IdP) to AWS IAM Identity Center with TIP. Amazon SageMaker Lakehouse centralizes data from Amazon S3 and Amazon Redshift, and Lake Formation provides fine-grained access control based on user identity. For our example use case, we explore two different users. The following table summarizes their roles, the tools they use, and their data access.

User Group Tool Data Access
Ethan (Data Analyst) Sales Amazon Athena for interactive SQL analysis Non-sensitive customer data (id, c_country, birth_year) and store_sales full table access
Frank (BI Analyst) Finance Amazon Redshift for reports and visualization US customer data (c_country='US')

The following diagram illustrates the solution architecture.

SageMaker Unified Studio with IAM Identity Center simplifies the user journey from authentication to data analysis. The workflow consists of the following steps:

  1. Users sign in with organizational SSO credentials through their IdP and are redirected to SageMaker Unified Studio.
  2. Users configure IAM Identity Center authentication for Amazon Redshift, linking identity management with data access.
  3. Users access the query editor for Amazon Redshift or SageMaker Lakehouse, triggering IAM Identity Center federation to generate session and access tokens.
  4. SageMaker Unified Studio retrieves user authorization details and group membership using the session token.
  5. Users are authenticated as IAM Identity Center users, ready to explore and analyze data using Amazon Redshift and Amazon Athena.

To implement our solution, we walk through the following high-level steps:

  1. Set up SageMaker Lakehouse resources.
  2. Create a SageMaker Unified Studio domain with SSO and TIP enabled.
  3. Configure Amazon Redshift for TIP and validate access.
  4. Validate data access using Amazon Athena.

Prerequisites

Before you begin implementing the solution, you must have the following in place:

  1. If you don’t have an AWS account, you can sign up for one.
  2. We provide utility scripts to help set up various sections of the post. To use them:
    1. Right-click this link and save the utility scripts zip file.
    2. Unzip the file to a terminal that has the AWS Command Line Interface (AWS CLI) configured. You can also use AWS CloudShell.
    3. Run the scripts only when prompted in the relevant sections.
    Note: The utility scripts are configured for us-east-1 region. If you prefer another region, edit the region in the scripts before running them.
  3. To deploy the infrastructure, right-click this link and select ‘Save Link As’ to save it as sagemaker-unified-studio-infrastructure.yaml. Then upload the file when creating a new stack in the AWS CloudFormation console, which will create the following resources:
    1. An S3 bucket to hold the customer data used in this post.
    2. An AWS Identity and Access Management (IAM) role called DataTransferRole with permissions as defined in Prerequisites for managing Amazon Redshift namespaces in the AWS Glue Data Catalog.
    3. An IAM role called IAMIDCRedshiftRole, which will be used later to set up the IAM Identity Center Redshift application.
    4. An IAM role called LakeFormationRegistrationRole, following the instructions in Requirements for roles used to register locations, and necessary IAM policies.
  4. If you don’t have a Lake Formation user, you can create one. For this post, we use an admin user. For instructions, see Create a data lake administrator.
  5. If IAM Identity Center is not enabled, refer to Enabling AWS IAM Identity Center for instructions to enable it.
    1. If you need to migrate existing Redshift users and groups, use the IAM Identity Center Redshift migration utility.
    2. For a quick way to test the feature and familiarize yourself with the process, we provide a script to generate mock users and groups. Run the setup-idc.sh script, which is provided in Step 2, to create test users and groups in IAM Identity Center for demonstration purposes.
  6. Integrate IAM Identity Center with Lake Formation. For instructions, see Connecting Lake Formation with IAM Identity Center.
  7. Register the S3 bucket as a data lake location:
    1. On the Lake Formation console, choose Data lake locations in the navigation pane.
    2. Choose Register location.
    3. For the role, use LakeFormationRegistrationRole.
  8. Create an IAM Identity Center Redshift application, as detailed in our previous post:
    1. On the Amazon Redshift console, choose IAM Identity Center connections in the navigation pane and choose Create application.
    2. For both the display name and application name, enter redshift-idc-app.
    3. Set the IdP namespace to awsidc.
    4. Choose IAMIDCRedshiftRole as the IAM role.
    5. Choose Next to create the application.
    6. Take note of the application Amazon Resource Name (ARN) to use in subsequent steps. The ARN format is arn:aws:sso::<ACCOUNT_NUMBER>:application/ssoins-<RANDOM_STRING>/apl-<RANDOM_STRING>.
  9. If you don’t have existing Redshift tables to work with, run the script setup-producer-redshift.sh, which is provided in Step 2, to create a producer namespace and workgroup, set up a sample sales database, and generate necessary tables with test data.
  10. The post also uses simulated customer data stored in the AWS Glue Data Catalog. To set up this data and configure the necessary Lake Formation permissions, run the setup-glue-tables-and-access.sh script provided in Step 2.

Set up SageMaker Lakehouse resources

In this section, we configure the foundational lakehouse resources required for SageMaker to access and analyze data across multiple storage systems. We’ll register the Redshift instance to the AWS Glue Data Catalog to make warehouse data discoverable and establish Lake Formation permissions on lakehouse resources for user identities to ensure secure, governed access to both data lake and data warehouse resources from within SageMaker environments.

Register Redshift instance to the Data Catalog

In this step, we use the store_sales data, which we created earlier using the setup-producer-redshift.sh script. You can register entire clusters to the Data Catalog and create catalogs managed by AWS Glue. To register a cluster to the Data Catalog, complete the following steps:

  1. On the Lake Formation console, choose Administrative roles and tasks in the navigation pane.
  2. Under Data lake administrators, choose Add.
  3. Choose Read-only administrator, then choose AWSServiceRoleForRedshift.
  4. On the Amazon Redshift console, open your namespace.
  5. On the Actions dropdown menu, chose Register with AWS Glue Data Catalog, then choose Register.
  6. Sign in to the Lake Formation console as the data lake administrator and choose Catalogs in the navigation pane.
  7. Under Pending catalog invitations, select the namespace and accept the invitation by choosing Approve and create catalog.
  8. Provide the name for the catalog as salescatalog.
  9. Select Access this catalog from Apache Iceberg compatible engines, choose DataTransferRole for the IAM role, then choose Next.
  10. Choose Add permissions and choose the admin IAM role under IAM users and roles.
  11. Select Super user for catalog permissions and choose Add.
  12. Choose Next.
  13. Choose Create catalog.

Set up Lake Formation permission on lakehouse resources for user identities

In this section, we configure Lake Formation permissions to enable secure access to lakehouse resources for federated user identities. Lake Formation provides fine-grained access control that works seamlessly with IAM Identity Center, allowing you to manage permissions centrally while maintaining security boundaries.

We’ll focus on granting database access to IAM Identity Center groups in Lake Formation and setting table-level permissions for federated Redshift catalog tables. These permissions form the security foundation for our federated query architecture, enabling users to seamlessly access both S3 data lake and Redshift data warehouse resources through a unified interface.

Grant database access to IAM Identity Center groups in Lake Formation

After you share your Redshift catalog with the Data Catalog and integrate with Lake Formation, you must grant appropriate database access. Follow these steps to set up permissions on your data lake resources for corporate identities:

  1. On the Lake Formation console, under Permissions in the navigation pane, choose Data permissions.
  2. Choose Grant.
  3. Select Principals for Principal type.
  4. Under Principals, select IAM Identity Center and choose Add.
  5. In the pop-up window, if this is your first time assigning users and groups, choose Get started.
  6. Search for and select the IAM Identity Center groups awssso-sales and awssso-finance.
  7. Choose Assign.
  8. Under LF-Tags or catalog resources, choose Named Data Catalog resources.
    1. Choose <accountid>:salescatalog/dev for Catalogs.
    2. Choose sales_schema for Database.
  9. Under Database permissions, select Describe.
  10. Choose Grant to apply the permissions.

Grant table-level permissions for federated Redshift catalog tables

Complete the following steps to grant table permissions to the IAM Identity Center groups:

  1. On the Lake Formation console, under Permissions in the navigation pane, choose Data permissions.
  2. Choose Grant.
  3. Select Principals for Principal type.
  4. Under Principals, select IAM Identity Center and choose Add.
  5. In the pop-up window, if this is your first time assigning users and groups, choose Get started.
  6. Search for and select the IAM Identity Center group awssso-sales.
  7. Choose Assign.
  8. Under LF-Tags or catalog resources, choose Named Data Catalog resources.
    1. Choose <accountid>:salescatalog/dev for Catalogs.
    2. Choose sales_schema for Database.
    3. Choose store_sales for Table.
  9. Select Select and Describe for Table permissions.
  10. Choose Grant to apply the permissions.

Create a SageMaker Unified Studio domain with SSO and TIP enabled

For instructions to create a SageMaker Unified Studio domain, refer to Create an Amazon SageMaker Unified Studio domain – quick setup. Because your IAM Identity Center integration is already complete, you can specify an IAM Identity Center user in the domain configuration settings.

Enable TIP in SageMaker Unified Studio

Complete the following steps to enable TIP in SageMaker Unified Studio:

  1. On the SageMaker console, use the AWS Region selector in the top navigation bar to choose the appropriate Region.
  2. Choose View domains and choose the domain’s name from the list.
  3. On the domain’s details page, on the Project profiles tab, choose a project profile, for example, SQL analytics.
  4. Select SQL analytics and choose Edit.
  5. In the Blueprint parameters section, select enableTrustedIdentityPropagationPermissions and choose Edit.
  6. Update the value as true.
  7. To enforce authorization-based on TIP, the SageMaker Unified Studio admin can make this parameter non-editable.
  8. Choose Save.

Enable user access for SageMaker Unified Studio domain

Complete the following steps to enable user access for the SageMaker Unified Studio domain:

  1. Open the SageMaker console in the appropriate Region and choose Domains in the navigation pane.
  2. Choose an existing SageMaker Unified Studio domain where you want to add SSO user access.
  3. On the domain’s details page, on the User management tab, in the Users section, choose Add and Add SSO users and groups.
  4. Choose the user (for this post, we add the user Frank) from the dropdown list and choose Add users and groups.

Add project members

SageMaker Unified Studio projects facilitate team collaboration for different business initiatives. As the project owner, Ethan now can add Frank as a team member to enable their collaboration. To add members to an existing project, complete the following steps:

  1. Sign in to the SageMaker Unified Studio console using the SSO credentials of who owns the project (for this post, Ethan).
  2. Choose Select a project.
  3. Choose the project you want to edit.
  4. On the Project overview page, expand Actions and choose Manage members.
  5. Choose Add members.
  6. Enter the name of the user or group you want to add (for this post, we add Frank).
  7. Select Contributor if you want to add the project member as a contributor.
  8. (Optional) Repeat these steps to add more project members. You can add up to eight project members at a time.
  9. Choose Add members.

Create a SQL analytics project in Unified Studio

In this step, we federate into SageMaker Unified Studio and create a project using SQL analytics. Complete the following steps:

  1. Federate into SageMaker Unified Studio using your IAM Identity Center credentials:
    1. On the SageMaker console, choose Domains in the navigation pane.
    2. Copy the SageMaker Unified Studio URL for your domain and enter it into a new browser window.
    3. Choose Sign in with SSO.
    4. A browser pop-up will redirect you to your preferred IdP login page, where you enter your IdP credentials.
    5. If authentication if successful, you will be redirected to SageMaker Unified Studio.
  2. After logging in, choose Create project.
  3. Enter a name for your project. This project name is final and can’t be changed later.
  4. (Optional) Enter a description for your project. You can edit this later.
  5. Choose a project profile. For this demo, we choose the SQL analytics profile from the available templates.
  6. Leave the default values as they are or modify them according to your use case, then choose Continue.
  7. Choose Create project to finalize the project and initialize your SQL analytics workspace.

For more detailed information and advanced configurations, refer to Create a project.

Configure Amazon Redshift for TIP and validate access

Run the setup-consumer-redshift.sh script (provided in the prerequisites). This script will create a new namespace and workgroup and add the required tags, which you will use later to integrate with SageMaker Unified Studio compute.

If you are creating the cluster manually, add one of the following tags to the Redshift cluster or workgroup that you want to add to SageMaker Unified Studio:

  • Option 1 – Add a tag to allow only a specific SageMaker Unified Studio project to access it: AmazonDataZoneProject=<projectID>
  • Option 2 – Add a tag to allow all SageMaker Unified Studio projects in this account to access it: for-use-with-all-datazone-projects=true

Create compute using IAM Identity Center authentication

After you set up your project, the next step is to establish a compute resource connection on the SageMaker Unified Studio console. Follow these steps to add either Amazon Redshift Serverless or a provisioned cluster to your project environment:

  1. Go to the Compute section of your project in SageMaker Unified Studio.
  2. On the Data warehouse tab, choose Add compute.
  3. You can create a new compute resource or choose an existing one. For this post, we choose Connect to existing compute resources, then choose Next.
  4. Choose the type of compute resource you want to add, then choose Next. For this post, we choose Redshift Serverless.
  5. Under Connection properties, provide the JDBC URL or the compute you want to add, which is integrated with IAM Identity Center. If the compute resource is in the same account as your SageMaker Unified Studio project, you can select the compute resource from the dropdown menu. In our example, we use the consumer account that was just provisioned.
  6. Under Authentication, select IAM Identity Center.
  7. For Name, enter the name of the Redshift Serverless or provisioned cluster you want to add.
  8. For Description, enter a description of the compute resource.
  9. Choose Add compute.

The SageMaker Unified Studio Project Compute and Data pages will now display information for that resource.

If everything is configured correctly, your compute will be created using IAM Identity Center. Because your IdP credentials are already cached while you’re logged in to SageMaker Unified Studio, it uses the same credentials and creates the compute.

Test data access using Amazon Redshift

When Ethan logs in to SageMaker Unified Studio using IAM Identity Center authentication, he successfully federates and can access customer data from all countries but only for non-sensitive columns. Let’s connect to Amazon Redshift in SageMaker Unified Studio by following these steps:

  1. Choose Actions and choose Open Query editor.
  2. Choose Redshift in the Data explorer pane.
  3. Run the customer sales calculation query to observe that user Ethan (a data analyst) can access customer data from all countries but only non-sensitive columns (id, birth_country, product_id):
    select current_user, c.*, sum(s.sales_amount) as total_sales
    from "awsdatacatalog"."customerdb"."customer" c
    join "dev@salescatalog"."sales_schema"."store_sales" s 
    on c.id=s.id
    group by all;

You have successfully configured Redshift to use IAM Identity Center authentication in SageMaker Unified Studio.

Validate data access using Amazon Athena

When Frank logs in to SageMaker Unified Studio using IAM Identity Center authentication, he successfully federates and can access customer data only for the United States. To query with Athena, complete the following steps:

  1. Choose Actions and choose Open Query editor.
  2. Choose Lakehouse in the Data explorer pane.
  3. Explore AwsDataCatalog, expand the database, choose the respective table, and on the options menu (three dots), choose Preview data.

The following demonstration illustrates how user Frank, a BI analyst, can perform SQL analysis using Athena. Due to row-level filtering implemented through Lake Formation, Frank’s access is restricted to customer data from the United States only. Additionally, you can observe that in the Data explorer pane, Frank can only view the customerdb database. The dev@salescatalog database is not visible to Frank because no access has been granted to his respective group from Lake Formation.

The IAM Identity Center authentication integration is complete; you can use both Amazon Redshift and Athena through SageMaker Unified Studio in a simplified, all-in-one interface.Note that, at the time of writing, Athena doesn’t work with Redshift Managed Storage (RMS).

Clean up

Complete the following steps to clean up the resources you created as part of this post:

  1. Delete the data from the S3 bucket.
  2. Delete the Data Catalog objects.
  3. Delete the Lake Formation resources and Athena account.
  4. Delete the SageMaker Unified Studio project and associated domain.
  5. If you created new Redshift cluster for testing this solution, delete the cluster.

Conclusion

In this post, we provided a comprehensive guide to enabling trusted identity propagation within SageMaker Unified Studio. We covered the setup of a SageMaker Unified Studio domain with SSO, the creation of tailored projects, efficient user onboarding with appropriate permissions, and the management of AWS Glue and Amazon Redshift managed catalog permissions using Lake Formation. Through practical examples, we demonstrated how to use both Amazon Redshift and Athena within SageMaker Unified Studio, showcasing secure data access and analysis capabilities. This approach helps organizations maintain strict identity controls while helping data scientists and analysts derive valuable insights from both data lake and data warehouse environments, supporting both security and productivity in machine learning workflows.

For more information on this integration, refer to Trusted identity propagation.


About the authors

Maneesh Sharma

Maneesh Sharma

Maneesh is a Sr. Architect at AWS with 15 years of experience designing and implementing large-scale data warehouse and analytics solutions. He works closely with customers to help them modernize their legacy applications to AWS cloud-based platforms.

Srividya Parthasarathy

Srividya Parthasarathy

Srividya is a Senior Big Data Architect with Amazon SageMaker Lakehouse. She works with the product team and customers to build robust features and solutions for their analytical data platform. She enjoys building data mesh solutions and sharing them with the community.

Arun A K

Arun A K

Arun is a Senior Big Data Specialist Solutions Architect at Amazon Web Services. He helps customers design and scale data platforms that power innovation through analytics and AI. Arun is passionate about exploring how data and emerging technologies can solve real-world problems. Outside of work, he enjoys sharing knowledge with the tech community and spending time with his family.