AWS Big Data Blog
Automate your validated dataset deployment using Amazon QuickSight and AWS CloudFormation
A lot of the power behind business intelligence (BI) and data visualization tools such as Amazon QuickSight comes from the ability to work interactively with data through a GUI. Report authors create dashboards using GUI-based tools, then in just a few clicks can share the dashboards with business users and decision-makers. This workflow empowers authors to create and manage the QuickSight resources and dashboards they’re responsible for providing.
Developer productivity is a great benefit of UI-based development, but enterprise customers often need to consider additional factors in their BI implementation:
- Promoting objects through environments (development, testing, production, and so on)
- Scaling for hundreds of authors and thousands of users
- Implementing data security, such as row-level and column-level rules to filter the data elements visible to specific users
- Regulatory requirements, processes, and compliance controls
Approaches such as source control-backed CI/CD pipelines allow you to address compliance requirements and security gates with automation. For example, a hypothetical build pipeline for a Java Spring Boot application may enable developers to build and deploy freely to a dev environment, but the code must pass tests and vulnerability scans before being considered for promotion to upper environments. A human approval step then takes place before the code is released into production. Processes such as this provide quality, consistency, auditability, and accountability for the code being released.
The QuickSight API provides functionality for automation pipelines. Pipeline developers can use the API to migrate QuickSight resources from one environment to another. The API calls for handling QuickSight datasets enable inspection of the JSON representation of the dataset definition.
This post presents an example of how a QuickSight administrator can automate data resource management and security validation through use of the QuickSight API and AWS CloudFormation.
Solution overview
The solution implements security rules that rely on naming conventions for tables and columns as an integral part of the security model. Instead of relying on naming conventions, you may want to use a lookup table or similar approach to store the relationships between data tables and security tables.
We guide you through the following steps:
- Create relational database tables to be secured.
- Create a QuickSight dataset in your dev account.
- Generate a CloudFormation template using a Python script that allows you to enforce row-level and column-level security in each environment. You can customize this script to the needs of your organization.
- Use the generated CloudFormation template to deploy through dev, test, and prod using your change management process.
You can use AWS CloudFormation to manage several types of QuickSight resources, but dataset resources are a critical junction for security, so they are our focus in this post.
To implement data security rules in a large organization, controls must be in place to agree upon and implement the rules from a process perspective. This post dives deep into using code to validate security aspects of your QuickSight deployment, but data security requires more than code. The approaches put forward are intended as a part of a larger change management process, much of which is based around human review and approval.
In addition to having a change management process in place, we suggest managing your AWS resources using a CI/CD pipeline. The nature of change management and CI/CD processes can vary greatly, and are outside the scope of this post.
Prerequisites
This post assumes a basic command of the following:
- Familiarity with the QuickSight UI (creating datasets, dashboards, and so on)
- Python programming (Python 3 preferred)
- Using the Boto3 library to interact with AWS
- Creating stacks with AWS CloudFormation
- SQL
You also need a Postgres database running in AWS.
We don’t go into the broader picture of integrating into a full CI/CD process, so an understanding of CI/CD is helpful, but not required.
Security rules for your organization
Before we can write a script to confirm security rules have been applied correctly, we need to know what the security rules actually are. This means we need to determine the following:
- What – What is the data we are trying to secure? Which fields in the database are sensitive? Which field values will be used to filter access?
- Who – Who are the users and groups that should be provided access to the data and fields we have identified?
In concrete terms, we need to match identities (users and groups) to actual data values (used in row-level security) and sensitive fields (for column-level security). Identities such as users and groups typically correlate to entities in external systems such as Active Directory, but you can use native QuickSight users and groups.
For this post, we define the following rules that indicate the relationship between database objects (tables and fields) and how they should be secured. Keep in mind that these example rules may not apply to every organization. Security should be developed to match your requirements and processes.
- Any field name with _sensitive appended to it is identified as containing sensitive data. For example, a column named salary_usd_sensitive should be restricted. For our scenario, we say that the user should be a member of the QuickSight restricted group in order to access sensitive fields. No other groups are allowed access to these fields.
- For a given table, a companion table with _rls appended to the name contains the row-level security rules used to secure the table. In this model, the row-level security rules for the employees table are found in the employees_rls table.
- Row-level security rules must be sourced 100% from the underlying data store. This means that you can't upload rules via the QuickSight console, or use custom SQL in QuickSight to create the rules. Rules can be provided as views (if supported by the underlying data store) as long as the view definition is managed using a change management process.
- The dataset name should match the name of the underlying database table.
These rules rely on a well-structured change management process for the database. If users and developers have access to change database objects in production, the rules won’t carry much weight. For examples of automated schema management using open-source CI/CD tooling, refer to Deploy, track, and roll back RDS database code changes using open source tools Liquibase and Jenkins and How to Integrate Amazon RDS Schema Changes into CI/CD Pipelines with GitLab and Liquibase.
From the QuickSight perspective, our database becomes the source of the “what” and “who” we discussed earlier. QuickSight doesn’t own the security rules, it merely implements the rules as defined in the database.
Security rule management with database objects
For this post, we source data from a Postgres database using a read-only user created for QuickSight.
First, we create our schema and a data table with a few rows inserted:
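A minimal sketch of that DDL follows; apart from the department and discount_sensitive columns referenced later, the table layout and sample values are illustrative:

```sql
-- Illustrative schema and data; only department and discount_sensitive
-- are significant to the security model described in this post
CREATE SCHEMA ledger;

CREATE TABLE ledger.transactions (
    transaction_id     integer,
    transaction_date   date,
    department         varchar(30),
    description        varchar(100),
    amount             numeric(10,2),
    discount_sensitive numeric(10,2)
);

INSERT INTO ledger.transactions VALUES
    (1, '2021-01-01', 'finance',    'software licenses', 3500.00, 350.00),
    (2, '2021-01-02', 'operations', 'office supplies',   1200.00,   0.00),
    (3, '2021-01-03', 'operations', 'shipping',           450.00,  45.00);
```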
Note the field discount_sensitive. In our security model, any field name with _sensitive appended to it is identified as containing sensitive data. This information is used later when we implement column-level security. In our example, we have the luxury of using naming conventions to tag the sensitive fields, but that isn't always possible. Other options could involve the use of SQL comments, or creating a table that provides a lookup for sensitive fields. Which method you choose depends upon your data and requirements, and should be supported by a change management process.
Row-level security table
The following SQL creates a table containing the row-level security rules for the ledger.transactions table, then inserts rules that match the example discussed earlier:
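A sketch of those rules, assuming native QuickSight groups (the finance and operations group names are illustrative):

```sql
-- GroupName matches QuickSight groups; department matches values in the
-- secured table. The column is quoted to preserve the casing QuickSight expects.
CREATE TABLE ledger.transactions_rls (
    "GroupName" varchar(50),
    department  varchar(30)
);

INSERT INTO ledger.transactions_rls VALUES
    ('restricted', ''),            -- an empty value grants access to all rows
    ('finance',    'finance'),
    ('operations', 'operations');
```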
For more information about how to restrict access to a dataset using row-level security, refer to Using row-level security (RLS) with user-based rules to restrict access to a dataset.
These rules match the specified QuickSight user groups to values in the department field of the transactions table.
Our last step in Postgres is to create a user that has read-only access to our tables. All end-user or SPICE refresh queries from QuickSight are run using this user. See the following code:
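A sketch, with the user name and password as placeholders:

```sql
-- Read-only user for QuickSight connections
CREATE USER quicksight_ro WITH PASSWORD '<your-password-here>';
GRANT USAGE ON SCHEMA ledger TO quicksight_ro;
GRANT SELECT ON ALL TABLES IN SCHEMA ledger TO quicksight_ro;
```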
Create user groups
Our security model provides permissions based on group membership. Although QuickSight allows for these groups to be sourced from external systems such as Active Directory, our example uses native QuickSight groups.
We create our groups using the following AWS Command Line Interface (AWS CLI) commands. Take note of the restricted group we’re creating; this is the group we use to grant access to sensitive data columns.
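For example (the account ID is a placeholder, and the finance and operations groups mirror the row-level security rules):

```bash
aws quicksight create-group --aws-account-id 111122223333 \
    --namespace default --group-name restricted
aws quicksight create-group --aws-account-id 111122223333 \
    --namespace default --group-name finance
aws quicksight create-group --aws-account-id 111122223333 \
    --namespace default --group-name operations
```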
You can also add a user to your group with the following code:
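For example, to add a user to the restricted group (placeholders shown in angle brackets):

```bash
aws quicksight create-group-membership --aws-account-id 111122223333 \
    --namespace default --group-name restricted \
    --member-name <your-quicksight-user-name>
```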
The Python script
Now that we have set up our database and groups, we switch focus to the Python script used for the following actions:
- Extracting the definition of a manually created dataset using the QuickSight API
- Ensuring that the dataset definition meets security standards
- Restructuring the dataset definition into the format of a CloudFormation template
- Writing the CloudFormation template to a JSON file
In the header of the script, you can see the following variables, which you should set to values in your own AWS environment:
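The following is a sketch of what that header might look like; the variable names are illustrative:

```python
# Set these values to match your environment
aws_account_id = "111122223333"    # target AWS account ID
region = "us-east-1"               # Region where your QuickSight assets live
data_set_id = "<your-dataset-id>"  # the dataset ID (not the dataset name)
```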
QuickSight datasets have a name and an ID. The name is displayed in the QuickSight UI, and the ID is used to reference the dataset behind the scenes. The ID must be unique for a given account and Region, which is why QuickSight uses UUIDs by default, but you can use any unique string.
Create the datasets
You can use the QuickSight GUI or public API to create a dataset for the transactions_rls and transactions tables. For instructions, refer to Creating a dataset from a database. Connect to the database, create the datasets, then apply transactions_rls as the row-level security for the transactions dataset. You can use the following list-data-sets AWS CLI call to verify that your datasets were created successfully:
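For example:

```bash
# Lists the name and ID of every dataset in the account and Region
aws quicksight list-data-sets --aws-account-id 111122223333
```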
Script overview
Our script is based around the describe_data_set method of the Boto3 QuickSight client. This method returns a Python dictionary containing all the attributes associated with a dataset resource. Our script analyzes these dictionaries, then coerces them into the structure required for dataset creation using AWS CloudFormation. The structure of the describe_data_set response and the AWS::QuickSight::DataSet CloudFormation resource are very similar, but not quite identical.
The following are the top-level fields in the response for the Boto3 QuickSight client describe_data_set method:
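An abridged sketch of the response shape, with values elided:

```json
{
    "DataSet": {
        "Arn": "...",
        "DataSetId": "...",
        "Name": "...",
        "CreatedTime": "...",
        "LastUpdatedTime": "...",
        "PhysicalTableMap": {},
        "LogicalTableMap": {},
        "OutputColumns": [],
        "ImportMode": "...",
        "RowLevelPermissionDataSet": {},
        "ColumnLevelPermissionRules": []
    },
    "RequestId": "...",
    "Status": 200
}
```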
Our script converts the response from the API to the structure required for creating a dataset using AWS CloudFormation.
The following are the top-level fields in the AWS::QuickSight::DataSet CloudFormation resource:
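An abridged sketch of the resource shape:

```json
{
    "Type": "AWS::QuickSight::DataSet",
    "Properties": {
        "AwsAccountId": "...",
        "DataSetId": "...",
        "Name": "...",
        "PhysicalTableMap": {},
        "LogicalTableMap": {},
        "ImportMode": "...",
        "Permissions": [],
        "RowLevelPermissionDataSet": {},
        "ColumnLevelPermissionRules": [],
        "Tags": []
    }
}
```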
The key differences between the two JSON structures are as follows:
- describe_data_set contains Arn, CreatedTime, and LastUpdatedTime, which are useful fields but only relevant to an existing resource
- AWS CloudFormation requires AwsAccountId when creating the resource
- AWS CloudFormation accepts tags for the dataset, but describe_data_set doesn't provide them
- The AWS CloudFormation Permissions property allows for assigning AWS Identity and Access Management (IAM) permissions at the time of creation
Our script selectively chooses the top-level properties we want from the describe_data_set response, then adds the fields that AWS CloudFormation requires for resource creation.
Validate security
Before the script creates the CloudFormation template, it performs validations to ensure that our dataset conforms to the defined security rules.
The first validation in our script covers row-level security.
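A minimal sketch of those checks follows; the variable names and error handling are illustrative, not the script's exact code:

```python
import boto3

# Values assumed from the script header described earlier
aws_account_id = "111122223333"
data_set_id = "<your-dataset-id>"
table_name = "transactions"

qs_client = boto3.client("quicksight", region_name="us-east-1")
describe_response = qs_client.describe_data_set(
    AwsAccountId=aws_account_id, DataSetId=data_set_id
)

# 1. Row-level security must be applied at all
rls_info = describe_response["DataSet"].get("RowLevelPermissionDataSet")
if not rls_info:
    raise Exception("Dataset does not have row-level security applied")

# 2. Look up the dataset containing the row-level security rules
rls_data_set_id = rls_info["Arn"].split("/")[-1]  # dataset ID is the ARN suffix
rls_response = qs_client.describe_data_set(
    AwsAccountId=aws_account_id, DataSetId=rls_data_set_id
)

for physical_table in rls_response["DataSet"]["PhysicalTableMap"].values():
    # 3. Custom SQL would let a developer redefine the rules, so reject it
    if "CustomSql" in physical_table:
        raise Exception("Row-level security rules must not use custom SQL")
    # 4. The rules table name must be the secured table name with _rls appended
    if physical_table["RelationalTable"]["Name"] != f"{table_name}_rls":
        raise Exception(f"Expected row-level security table {table_name}_rls")
```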
The steps in the code are as follows:
- Ensure that any row-level security is applied (this is the bare minimum).
- Look up the dataset that contains the row-level security rules using another Boto3 call.
- Confirm that the row-level security dataset is not custom SQL.
- Confirm that the name of the table is as expected, with _rls appended to the name of the table being secured.
The use of custom SQL for sourcing row-level security rules isn't secure in our case, because a QuickSight developer could use SQL to alter the underlying rules. Because of this, our model requires that a physical table from the dataset is used as the row-level security rule source. It's still possible to use a database view to provide the rules; a view is acceptable because its definition (in our scenario) is governed by a change management process, as opposed to custom SQL, which a QuickSight developer can create freely.
The rules implemented for your specific organization will be different. You may need to connect to a database directly from your Python script to validate that the dataset was created in a secure manner. Regardless of your actual rules, the describe_data_set API method provides the details you need to begin validating the dataset.
Column-level security
Our model for column-level security indicates that any database field name that ends in _sensitive should only be accessible to members of a QuickSight group named restricted. Instead of validating that the dataset has the column-level security rules applied correctly, we simply enforce the rules directly in two steps:
- Identify the sensitive fields.
- Create a dictionary and add it to our dataset with the key ColumnLevelPermissionRules.
To identify the sensitive fields, we create a list and iterate through the input columns of the physical table:
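A sketch of that loop, continuing from the describe_response dictionary used in the validation sketch:

```python
# Collect every column whose name ends in _sensitive
sensitive_fields = []
for physical_table in describe_response["DataSet"]["PhysicalTableMap"].values():
    for column in physical_table["RelationalTable"]["InputColumns"]:
        if column["Name"].endswith("_sensitive"):
            sensitive_fields.append(column["Name"])
```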
The result is a list of sensitive fields. We can then take this list and integrate it into the dataset through the use of a dictionary:
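A sketch, assuming cf_data_set is the CloudFormation resource dictionary the script is building:

```python
# Only the restricted group may access the sensitive columns; the group ARN
# is resolved from a CloudFormation template parameter at deployment time
if sensitive_fields:
    cf_data_set["Properties"]["ColumnLevelPermissionRules"] = [
        {
            "Principals": [{"Ref": "RestrictedUserGroupArn"}],
            "ColumnNames": sensitive_fields,
        }
    ]
```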
Instead of hardcoding a specific principal, we reference the CloudFormation template parameter RestrictedUserGroupArn. The ARN for the restricted group is likely to vary, especially if you're deploying to another AWS account. Using a template parameter allows us to specify the ARN at the time of dataset creation in the new environment.
Access to the dataset QuickSight resources
The Permissions structure is added to the definition for each dataset:
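A sketch of that structure within the template, using owner-level dataset actions:

```json
"Permissions": [
    {
        "Principal": { "Ref": "QuickSightAdminPrincipal" },
        "Actions": [
            "quicksight:DescribeDataSet",
            "quicksight:DescribeDataSetPermissions",
            "quicksight:PassDataSet",
            "quicksight:DescribeIngestion",
            "quicksight:ListIngestions",
            "quicksight:CreateIngestion",
            "quicksight:CancelIngestion",
            "quicksight:UpdateDataSet",
            "quicksight:DeleteDataSet",
            "quicksight:UpdateDataSetPermissions"
        ]
    }
]
```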
A value for the QuickSightAdminPrincipal CloudFormation template parameter is provided at the time of stack creation. The preceding structure provides the principal access to manage the QuickSight dataset resource itself. Note that this is not the same as data access (though an admin user could manually remove the row-level security rules). Row-level and column-level security rules indicate whether a given user has access to specific data, whereas these permissions allow for actions on the definition of the dataset, such as the following:
- Updating or deleting the dataset
- Changing the security permissions
- Initiating and monitoring SPICE refreshes
End-users don’t require this access in order to use a dashboard created from the dataset.
Run the script
Our script requires you to specify the dataset ID, which is not the same as the dataset name. To determine the ID, use the AWS CLI list-data-sets command.
To set the script parameters, edit the header variables shown earlier to match your environment.
The following snippet runs the Python script:
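For example (the script file name is illustrative):

```bash
python3 quicksight_dataset_to_cf.py
```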
CloudFormation template
Now that the security rules have been validated, our script can generate the CloudFormation template. The describe_response_to_cf_data_set method accepts a describe_data_set response as input (along with a few other parameters) and returns a dictionary that reflects the structure of an AWS::QuickSight::DataSet CloudFormation resource. Our code uses this method once for the primary dataset, and again for the _rls rules. The method selects the values we need from the response, prunes some unnecessary items (such as empty tag lists), and replaces a few values with CloudFormation references. These references allow us to provide parameter values to the template, such as QuickSight principals and the data source ARN.
You can view the template using the cat command:
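For example (the output file name is illustrative):

```bash
cat dataset-template.json
```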
You can deploy this template directly into AWS via the CloudFormation console. You are required to provide the following parameters:
- DataSourceArn – A QuickSight dataset is a reference to a table or other database object. In order for this object to be accessed, we need to specify a QuickSight data source resource that facilitates the connection.
- QuickSightAdminPrincipal – The IAM principal granted access to manage the dataset resource via AWS API calls. You can exclude the IAM permissions from this script and template if your existing security policies automatically provide access to the appropriate users and groups.
- RestrictedUserGroupArn – The ARN of the QuickSight group that is granted access to the sensitive columns.
You can also deploy the template using the AWS CLI. Although it's possible to pass in all the parameters directly via the command line, you may find it a bit clunky when entering long values. To simplify this, our script generates a params.json file structured to capture all the parameters required by the template:
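The file follows the standard CloudFormation parameters file format; the values shown here are placeholders:

```json
[
    {
        "ParameterKey": "DataSourceArn",
        "ParameterValue": "arn:aws:quicksight:us-east-1:111122223333:datasource/<data-source-id>"
    },
    {
        "ParameterKey": "QuickSightAdminPrincipal",
        "ParameterValue": "arn:aws:iam::111122223333:role/<admin-role-name>"
    },
    {
        "ParameterKey": "RestrictedUserGroupArn",
        "ParameterValue": "arn:aws:quicksight:us-east-1:111122223333:group/default/restricted"
    }
]
```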
Use the following command to build the stack, with params.json as input:
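For example (the stack and template file names are illustrative):

```bash
aws cloudformation create-stack \
    --stack-name quicksight-transactions-dataset \
    --template-body file://dataset-template.json \
    --parameters file://params.json
```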
You can use the AWS CloudFormation console to monitor the stack progress. When the creation is complete, you should see your new dataset in QuickSight!
Conclusion
Though the functionality is relatively new, I consider the API and AWS CloudFormation capabilities to be one of QuickSight’s biggest strengths. Automated validation and enforcement of security rules allows for scale and better security. Being able to manage dataset definitions using AWS CloudFormation provides repeatability, and all of this sets you up for automation. The API and AWS CloudFormation provide tooling to customize QuickSight to suit your workflow, bringing BI into your organization’s cloud management strategy.
If you are looking for related information about dashboard management and migration in QuickSight, refer to Migrate Amazon QuickSight across AWS accounts.
About the Author
Jeremy Winters is an Architect in the AWS Data Lab, where he helps customers design and build data applications to meet their business needs. Prior to AWS, Jeremy built cloud and data applications for consulting customers across a variety of industries.