Achieve finer-grained data security with column-level access control in Amazon Redshift
Amazon Redshift is the most popular cloud data warehouse because it provides fast insights at a low cost. Customers can confidently run mission critical workloads, even in highly regulated industries, because Amazon Redshift comes with out of the box security and compliance. The security features, combined with the ability to easily analyze data in-place and in open formats, along with compute and storage elasticity, and ease of use are what makes tens of thousands of customers choose Amazon Redshift.
Many organizations store sensitive data, commonly classified as personally identifiable information (PII) or sensitive personal information (SPI) in Amazon Redshift and this data will have restricted access from different persona in the organization. For example, your human resources, finance, sales, data science, and marketing departments may all have the required access privileges to view customer data, whereas only the finance department should have access to sensitive data like personally identifiable information (PII) or payment card industry (PCI).
Views or AWS Lake Formation on Amazon Redshift Spectrum was used previously to manage such scenarios, however this adds extra overhead in creating and maintaining views or Amazon Redshift Spectrum. View based approach is also difficult to scale and can lead to lack of security controls. Amazon Redshift column-level access control is a new feature that supports access control at a column-level for data in Amazon Redshift. You can use column-level GRANT and REVOKE statements to help meet your security and compliance needs similar to managing any database object.
This post shows you how to setup Amazon Redshift column-level access control on table, view and materialized view.
There are two tables that store customer demographic and account balance data. Finance department can see all customer data while Sales department can only view and update market segment and account balance data as the rest of customer demographic data like customer name, phone and nation are considered PII data and should have restricted access. This is a good use case for column-level access control to secure the PII data. Below is a simple entity relation diagram for the 2 tables.
Before trying out the illustration in this blog, note the following prerequisites:
Setting up the environment
To setup the environment and implement the use case, complete the following steps:
- Connect to your Amazon Redshift cluster using any SQL client of your choice with user with permission to create table or superuser.
- Create two tables with the following code:
- Populate some sample data into the two tables with the following code:
- Create a view and a materialized view with the following code:
- The purpose of the view,
customer_vwis to implement business rule of customer income group categorization based on
- Analytical dashboards frequently access this dataset by joining and aggregating tables
nationand thus, the materialized view
customernation_mvis created to speed up the performance such query significantly.
- Create and grant table level permissions to user
financewhich represent finance department users. Note that below users are created only for illustration purpose. We recommend you to use AWS IAM Federation to bring your corporate users without creating them manually in Amazon Redshift. For more information, please refer to https://docs.aws.amazon.com/redshift/latest/mgmt/redshift-iam-authentication-access-control.html#authentication.
- Note that user
UPDATEpermission on all columns on
- You need to test and validate user
financeis able to view all data from the
customernation_mvmaterialized view and update data on
- Enter the following code:
Note that SQL statement
SET SESSION AUTHORIZATION 'finance'is used to impersonate user
financein above code.
Each select statement should return five rows and the update statement should return one row updated. See the following code:
You have now successfully setup table level permissions for user
finance to view and update all customer data.
Setting up Amazon Redshift column-level access control
Column-level access control can be enabled and disabled by using
REVOKE statements with the following syntax:
To set up column-level privileges, complete the following steps:
- To determine which users have column-level access control, you can query PG_ATTRIBUTE_INFO system view. Enter the following code:
- The query should return zero records as we have not implemented column-level access control yet.
- Grant user
sales SELECTpermission on columns
customer_vwand materialized view
customernation_mv. We also grant
UPDATEpermission on column
customerby entering the following code:
- Error message “
Grant options are not supported for column privileges” should be returned for the first statement. This is because only a table’s owner or a superuser can grant column-level privileges and to maintain simple security model.
- Validate if above permissions have been granted with the following code:
- The query should return six rows. See the following code:
The output above shows:
attaclcolumn value “rw”)
attaclcolumn value “r”)
- After you confirmed the column-level access control are correct, run as user
salesto query table
customer_vwand materialized view
customernation_mvusing the following code:
- Each select statement should return permission denied error as the user does not have permissions to all columns of the objects being queried. See the following code:
- Query only the columns
view customer_vwand materialized view
customernation_mvwith the following code:
- Each select statement should return five rows as user
saleshas permission to query columns
accountbalance. See the following code:
- Update the
accountbalancecolumn with the following code:
- The select statement should return one row that shows value 2000. See the following code:
- Update the
accountbalancecolumn with condition
nationid=1by using the following code:
- The update statement should return permission denied error as user sales does not have column-level privileges on column nationid in the where clause.
- Query the count of record group by nationid with the following code:
- The select statement should return permission denied error as user
salesdoesn’t have column-level privileges on column
nationidin the group by clause.
- Please also note that column-level privileges are checked for columns not only in the select list but also
havingclause and other clauses of a query that require
SELECT/UPDATEprivileges on a column.
- Remove column
marketsegmentfrom column-level access control for user sales using
REVOKEcommand and see what happens. Enter the following code:
- As you can see, user
salesis no longer able to view
marketsegmentfrom table customer.
- Enter the following code to query column
- The statement should return five rows as user
salesstill has access to column
marketsegmenton the view even though column-level privileges have been revoked from table
customer. Views execute with the permissions of the view owner so it will still continue to work as long as the view’s owner still has column or table-level privileges on the base tables used by the view. To prevent unauthorized access of the sensitive data, the column-level privileges for user sales should be revoked from the view as well.
- Revoke all permissions for user
saleswith the following code:
- Query the table, view and materialized view again with user
salesusing the following code:
- Permission denied error should be returned and this shows that
REVOKEis able to remove all permissions.
As summary, a simple
GRANT statement will enable column-level access control on Amazon Redshift table, view and materialized view. A
REVOKE statement is what you need to remove the permission. This eliminates the complexity of legacy views-based access control to achieve fine-grained read and write access control.
Once you are done with above testing, you can remove the objects and users with the following code:
Amazon Redshift is secure by default and security doesn’t cost extra. It provides Authentication (Active Directory, Okta, Ping Federate, and Azure AD), Federation and comes pre-integrated with AWS IAM and KMS. It also supports table-based access control for data in Amazon Redshift and column-level access control for data in Amazon S3 through Amazon Redshift Spectrum since September 2019. Amazon Redshift now supports access control at a column-level for local tables, eliminating the need to implement view-based access control or using another system.
This post showed you how easy it is to setup Amazon Redshift column-level access control. The use case in this post demonstrated how to confirm that you have fine-grained access on the table, view, and materialized view. You can adopt this feature to support your business needs.
If you have any questions or suggestions, please leave a comment below.
About the Authors
BP Yau is a Data Warehouse Specialist Solutions Architect at AWS. His role is to help customers architect big data solutions to process data at scale. Before AWS, he helped Amazon.com Supply Chain Optimization Technologies migrate the Oracle Data Warehouse to Amazon Redshift and built the next generation big data analytics platform using AWS technologies.
Srikanth Sopirala is a Sr. Specialist Solutions Architect focused on Analytics at AWS. He is passionate about helping customers build scalable data and analytics solutions in the cloud.