Build a centralized granular access control to manage assets and data access in Amazon QuickSight
A large business intelligence (BI) project with many users and teams and sensitive information demands a multi-faceted security architecture. Such architecture should provide BI administrators and architects with the capability to minimize the amount of information accessible to users. For a straightforward solution to manage Amazon QuickSight user and asset access permissions, you can use the AWS Command Line Interface (AWS CLI) or AWS Management Console to manually edit QuickSight user role and dashboard access. However, in specific cases, an enterprise can easily have hundreds or thousands of users and groups, and these access management methods aren’t efficient. We have received a large number of requests to provide an advanced programmable approach to deploy and manage a centralized QuickSight security architecture.
This post describes the best practices for QuickSight authentication and authorization granular access control, and provides a centralized cloud application with an AWS Cloud Development Kit (AWS CDK) stack to download. One of the advantages of our solution is enterprises can deploy the security framework to administer access control of their BI without leaving AWS.
All configurations are saved in the AWS Systems Manager Parameter Store. Parameter Store provides secure, hierarchical storage for configuration data management and secrets management. You can store data such as user name, user permissions, passwords, and database strings as parameter values. You can reference AWS Systems Manager parameters in your scripts and configuration and automation workflows by using the unique name that you specified when you created the parameter.
The AWS CDK application template fits into the continuous integration and continuous deployment (CI/CD) infrastructure and grants or revokes all authentications and authorizations based on a defined policy prescribed by AWS. This avoids possible human errors made by BI developers or administrators. BI developers can edit configuration parameters to release new dashboards to end-users. At the same time, BI administrators can edit another set of parameters to manage users or groups. This AWS CDK CI/CD design bridges the gaps between development and operation activities by enforcing automation in building and deploying BI applications.
In enterprise BI application design, multi-tenancy is a common use case, which serves multiple sets of users with one infrastructure. Tenants could either be different customers of an independent software vendor (ISV), or different departments of an enterprise. In a multi-tenancy design, each tenant shares the dashboards, analyses, and other QuickSight assets. Each user, who can see all other users belonging to the same tenant (for example, when sharing content), remains invisible to other tenants. Within each tenant, the BI admin team has to create different user groups to control the data authorization, including asset access permissions and granular-level data access.
Let’s discuss some use cases of asset access permissions in detail. In a BI application, different assets are usually categorized according to business domains (such as an operational dashboard or executive summary dashboard) and data classification (critical, highly confidential, internal only, and public). For example, you can have two dashboards for analyzing sales results data. The look and feel of both dashboards are similar, but the security classification of the data is different. One dashboard, named Sales Critical Dashboard, contains critical columns and rows of data. The other dashboard, called Sales Highly-Confidential Dashboard, contains highly confidential columns and rows of data. Some users are granted permission to view both dashboards, and others have lower security level permission and can only access Sales Highly-Confidential Dashboard.
In the following use case, we address granular-level data access as follows:
- Row-level access (RLS) – For the users who can access Sales Critical Dashboard, some of them can only view US data. However, some global users can view the data of all countries, including the US and UK.
- Column-level access (CLS) – Some users can only view non-personally identifiable information (PII) data columns of a dataset, whereas the HR team can view all the columns of the same dataset.
Large projects might have several tenants, hundreds of groups, and thousands of users in one QuickSight account. The data leader team wants to deploy one protocol for user creation and authentication in order to reduce the maintenance cost and security risk. The architecture and workflow described in this post help the data leader achieve this goal.
Additionally, to avoid human errors in daily operation, we want these security permissions to be granted and revoked automatically, and fit into the CI/CD infrastructure. The details are explained later in this post.
The following diagram shows the QuickSight account architecture of this solution.
- Authors create dashboards and update AWS Systems Manager Parameter Store to release dashboards to different groups
- Admins approve the requests from authors
- Admins update user management (roles, namespace,) by editing AWS Systems ManagerParameter Store
- DevOps deploy the updates with AWS CDK
*Groups: Object access permission groups control the owner/viewer of the objects. Data segment groups combined with RLS/CLS control data access.
*Datasets: Contain all data, restricted by row-level security (RLS) and column-level security (CLS)
*First time log in QuickSight: If the QuickSight user is not registered before first time log in, a reader is created and this reader only can view the landing page dashboard, which shares to all users of this account. The landing page provides the reports list that this user can view.
The following diagram illustrates the authorization workflow of the architecture.
Authorization diagram details:
- User information (department, team, geographic location) is stored in Amazon Redshift, Amazon Athena, or any other database. Combined with group-user mapping, RLS databases are built for control data access.
- Hourly permissions assignment:
- According to group-employee name (user) mapping (membership.csv) and group-role mapping (/qs/console/roles), an AWS Lambda function creates groups, registers, users, assigns group members, removes group memberships, promotes readers to author or admin, and deletes users if they’re demoted from author or admin to reader.
- According to group-dashboard mapping in /qs/config/access, an AWS Lambda function updates dashboard permissions to QuickSight groups.
- According to group-namespace mapping in membership.csv, an AWS Lambda function creates QuickSight groups in the specified namespace.
- Sample parameters of objects access permissions and data segments:
- Sample parameters of QuickSight user role:
- Sample data of membership.csv:
In this solution, custom namespaces are deployed to support multi-tenancy. The
default namespace is for all internal users of a company (we call it OkTank). OkTank creates the
3rd-Party namespace for external users. If we have to support more tenants, we can create more custom namespaces. By default, we’re limited to 100 namespaces per AWS account. To increase this limit, contact the QuickSight product team. For more information about multi-tenancy, see Embed multi-tenant analytics in applications with Amazon QuickSight.
In each namespace, we create different types of groups. For example, in the
default namespace, we create the
BI-Developer groups for the
author users. For
reader, we deploy two types of QuickSight groups to control asset access permissions and data access: object access permission groups and data segment groups.
The following table summarizes how the object access permission groups control permissions.
||Default||View both dashboards (containing the critical data and highly confidential data)|
||Default||Only view Sales Highly-Confidential Dashboard|
||Default||Account management and edit all assets||Users in the
||Default||Edit all assets||Users in the
||Default||View all assets and create ad hoc analysis to run self-service analytics reports||
Users in the
However, this group can’t save or share their ad hoc reports.
||Non-default namespaces (
||Can only share with readers (
||In non-default namespaces, we can also create other object access permission groups, which is similar to the critical group in the default namespace.|
For more information about QuickSight groups, users, and user roles, see Managing User Access Inside Amazon QuickSight, Provisioning Users for Amazon QuickSight, and Using administrative dashboards for a centralized view of Amazon QuickSight objects.
||Default||Only view US data on any dashboard||Row-level|
||Default||Only view UK data on any dashboard||Row-level|
||Default||View data of all countries on any dashboard||Row-level|
||Default||Can’t view Social Security numbers, annual income, and all other columns of PII data||Column-level|
||Default||Can view all columns including PII data||Column-level|
We can set up similar groups in non-default namespaces.
These different groups can overlap each other. For example, if a user belongs to the groups
PII, they can view US data on both dashboards, with all columns. The following Venn diagram illustrates the relationships between these groups.
In summary, we can define a multi-faceted security architecture by combining QuickSight features, including namespace, group, user, RLS, and CLS. All related configurations are saved in the Parameter Store. The QuickSight users list and group-user mapping information are in an Amazon Simple Storage Service (Amazon S3) bucket as a CSV file (named
membership.csv). This CSV file could be output results of LDAP queries. Several AWS Lambda functions are scheduled to run hourly (you can also invoke these functions on demand, such as daily, weekly, or any time granularity that fits your requirements) to read the parameters and the
membership.csv. According to the configuration defined, the Lambda functions create, update, or delete groups, users, and asset access permissions.
When the necessary security configurations are complete, a Lambda function calls the QuickSight APIs to get the updated information and record the results in an S3 bucket as CSV files. The BI admin team can build datasets with these files and visualize the results with dashboards. For more information, see Using administrative dashboards for a centralized view of Amazon QuickSight objects and Building an administrative console in Amazon QuickSight to analyze usage metrics.
In addition, the errors of Lambda functions and the user deletion events are stored in this S3 bucket for the admin team to review.
The following diagram illustrates the overall workflow of the Lambda functions.
We use a programmable method to create and configure the groups and users automatically. For any ad hoc user registration request (such as the user isn’t recorded in
membership.csv yet due to latency), as long as the user can be authenticated, they can assume the AWS Identity and Access Management (IAM) role
quicksight-fed-user to self-provision as a QuickSight reader. This self-provisioned reader can only view a landing page dashboard, which provides the list of dashboards and corresponding groups. According to the dashboard-group mapping, this new reader can apply for membership of a given group to access the dashboards. If the group owner approves the application, the hourly Lambda functions add the new user into the group the next time they run.
The CI/CD pipeline starts from AWS CDK. The BI administrator and author can update the Systems Manager parameters to release new dashboards or other QuickSight assets in the AWS CDK stack
granular_access_stack.py. The BI administrator can update the Systems Manager parameters in the same stack to create, update, or delete namespaces, groups, or users. Then the DevOps team can deploy the updated AWS CDK stack to apply these changes to the Systems Manager parameters or other AWS resources. The Lambda functions are triggered hourly to call APIs to apply changes to the related QuickSight account.
The Lambda functions are restricted by the maximum runtime of 15 minutes. To overcome this limitation, we can convert the Lambda functions to AWS Glue Python shell scripts with the following high-level steps:
- Download Boto3 wheel files from pypi.org.
- Upload the wheel file into an S3 bucket.
- Download the Lambda functions and merge them into one Python script and create an AWS Glue Python shell script.
- Add the S3 path of the Boto3 wheel file into the Python library path. If you have multiple files to add, separate them with a comma.
- Schedule this AWS Glue job to run daily.
You must have the following prerequisites to implement this solution:
- A QuickSight Enterprise account
- Basic knowledge of Python
- Basic knowledge of SQL
- Basic knowledge of BI
Create the resources
Create your resources by downloading the AWS CDK stack from the GitHub repo.
granular_access folder, run the command
cdk deploy granular-access to deploy the resources. For more information, see AWS CDK Intro Workshop: Python Workshop.
Deploy the solution
When you deploy the AWS CDK stack, it creates five Lambda functions, as shown in the following screenshot.
The stack also creates additional supportive resources in your account.
granular_user_governance function is triggered by the Amazon CloudWatch event rule
qs-gc-everyhour. The information of groups and users is defined in the file
membership.csv. The S3 bucket name is stored in the parameter store
/qs/config/groups. The following diagram shows the flowchart of this function.
- Set the destination of
granular_user_governanceto another Lambda function,
The following diagram is a flowchart of this function.
To avoid breaking critical access to QuickSight assets governed by Admin or Author, we demote an admin or author by deleting the admin or author user and creating a new reader user with the Lambda function
granular_user_governance function handles downgrading admin to author, or upgrading author to admin.
- Set the destination of
downgrade_userto the Lambda function
The following diagram shows a flowchart of this function.
- Set the destination of
downgrade_userto the Lambda function
check_team_members function simply calls QuickSight APIs to get the namespaces, groups, users, and assets information, and saves the results in the S3 bucket. The S3 key is
Besides the two output files of the previous step, the error logs and user deletion logs (logs of
downgrade_user) are also saved in the
- Set the destination of
granular_access_assets_govenanceto the Lambda function
Create row-level security datasets
As a final step, we create RLS datasets. This allows you to change the dashboard records based on the users that view the dashboards.
QuickSight supports RLS by applying a system-managed dataset that sub-selects records from the dashboard dataset. The mechanism allows the administrator to provide a filtering dataset (the RLS dataset) with
groupname columns, which are automatically filtered to the user that is logged in. For example, a user named
YingWang belongs to QuickSight group
BI, so all the rows of the RLS dataset that correspond to the username
YingWang or group name
BI are filtered. The rows that remain in the RLS after applying the username and the group name filters are then used to filter the dashboard datasets further by matching columns with the same names. For more information about row-level security, see Using Row-Level Security (RLS) to Restrict Access to a Dataset.
In this solution, we export the sample user information into the file
membership.csv, which is stored in an S3 bucket. In this file, we provide some sample groups for RLS dataset definition. These groups are the data segment groups, as described in the overall architecture design. The following screenshot shows some of the groups and the users in those groups.
granular_user_governance function creates these groups and adds the related users to be members of these groups.
How do we create the RLS dataset? Let’s say we have a table called
employee_information in our organization’s HR database. The following screenshot shows some sample data.
Based on the
employee_information table, we create a view called
rls for an RLS dataset. See the following SQL code:
The following screenshot shows our sample data.
Now we have the table ready, we can create the RLS dataset with the following custom SQL:
The following screenshot shows our sample data.
For the group
quicksight-fed-all-countries, we set the
city as null, which means that all the users in this group can view the data of all countries.
For country level, only the security rules defined in the
groupname and country
columns are used for filtering. The
city columns are set as null. The users in the
quicksight-fed-usa group can view the data of USA, and the users in the
quicksight-fed-gbr group can view the data of GBR.
For each user with
groupname set as null, they can only view the specific country and city assigned to their username. For example,
TerryRigaud can only view data of Austin, in the US.
In QuickSight, multiple rules in an RLS dataset are combined together with OR.
With these multi-faceted RLS rules, we can define a comprehensive data access pattern.
To avoid incurring future charges, delete the resources you created by running the following command:
This post discussed how BI administrators can design and automate QuickSight authentication and authorization granular access control. We combined QuickSight security features like row-level and column-level security, groups, and namespaces to provide a comprehensive solution. Managing these changes through “BIOps” ensures a robust, scalable mechanism for managing QuickSight security. To learn more, sign up for a QuickSight demo.
About the Authors
Ying Wang is a Senior Data Visualization Engineer with the Data & Analytics Global Specialty Practice in AWS Professional Services.
Amir Bar Or is a Principal Data Architect at AWS Professional Services. After 20 years leading software organizations and developing data analytics platforms and products, he is now sharing his experience with large enterprise customers and helping them scale their data analytics in the cloud.