AWS Big Data Blog

Achieve fine-grained data security with row-level access control in Amazon Redshift

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. With Amazon Redshift, you can analyze all your data to derive holistic insights about your business and your customers. One of the challenges with security is that enterprises want to provide fine-grained access control at the row level for sensitive data. You can do this by creating views or using different databases and schemas for different users. However, this approach isn’t scalable and becomes complex to maintain over time. Customers have asked us to simplify the process of securing their data by providing the ability to control granular access.

Row-level security (RLS) in Amazon Redshift is built on the foundation of role-based access control (RBAC). RLS allows you to control which users or roles can access specific records of data within tables, based on security policies that are defined at the database object level. This new RLS capability in Amazon Redshift enables you to dynamically filter existing rows of data in a table. This is in addition to column-level access control, where you can grant users permissions to a subset of columns. Now you can combine column-level access control with RLS policies to further restrict access to particular rows of visible columns.

In this post, we explore the row-level security features of Amazon Redshift and how you can use roles to simplify managing privileges required to your end-users.

Customer feedback

TrustLogix is a Norwest Venture Partners backed cloud security startup in the Data Security Governance space. TrustLogix delivers powerful monitoring, observability, audit, and fine-grained data entitlement capabilities that empower Amazon Redshift clients to implement data-centric security for their digital transformation initiatives.

“We’re excited about this new and deeper level of integration with Amazon Redshift. Our joint customers in security-forward and highly regulated sectors including financial services, healthcare, and pharmaceutical need to have incredibly fine-grained control over which users are allowed to access what data, and under which specific contexts. The new row-level security capabilities will allow our customers to precisely dictate data access controls based on their business entitlements while abstracting them away from the technical complexities. The new Amazon Redshift RLS capability will enable our joint customers to model policies at the business level, deploy and enforce them via a security-as-code model, ensuring secure and consistent access to their sensitive data.”

-Ganesh Kirti, founder and CEO of TrustLogix.

Overview of row-level security in Amazon Redshift

Row-level security allows you to restrict some records to certain users or roles, depending on the content of those records. With RLS, you can define policies to enforce fine-grained row-level access control. When creating RLS policies, you can specify expressions that control whether Amazon Redshift returns any existing rows in a table in a query. With RLS policies limiting access, you don’t have to add or externalize additional conditions in your queries. You can attach multiple policies to a table, and a single policy can be attached to multiple tables, making this implementation relationship many-to-many. Once attached, the RLS policy is applied on a relation and a set of users or roles, to run SELECT, UPDATE, and DELETE operations. All attached RLS policies have to evaluate together to true for a record to be returned by query. The RBAC built-in role, security admin, is responsible for managing the policies.

The following diagram illustrates the workflow.

With RLS, you can do the following:

  • Restrict row access based on roles – The security admin creates and defines if a role can access specific records of data within a table based on an RLS policy.
  • Combine multiple policies per user or role – Multiple policies can be defined per user or role, and all policies are applied with AND syntax.
  • Enhance granular access control – RLS is built on role-based access control and can work alongside column-level access control.
  • No access if no policy applied – All data access is blocked when there is no applicable policy on an RLS-protected table.
  • Enable row-level and column-level security on the table – In the following example, the user house is part of the role staff. When house queries the table, only one record pertaining to house is returned; the rest of the records are filtered as per the RLS policy. The sensitive column is also restricted, so users from the role staff can’t see this column. User cuddy is part of the role manager. When cuddy queries the employees table, all records and columns are returned.

Row-level security relevant use cases

With row-level security, many use cases for fine-grained access controls become possible. The following are just some of the many application use cases:

  • A global company with data analysts across different countries or regions can enforce restriction of data access to analysts based on geo location due to data compliance requirements.
  • A sales department can create a policy that allows them to restrict the access to sales performance information specific to a particular salesperson or region.
  • A payroll department can create an RLS policy to restrict access to look at an individual’s payroll, but managers need payroll information on their direct reports. Managers don’t need to know the details of payroll information for other departments.
  • A hospital can create an RLS policy that allows doctors and nurses to view data rows for their patients only.
  • A bank can create a policy to restrict access to financial data rows based on an employee’s business division or role in the company.
  • A multi-tenant application can create a policy to enforce a logical separation of each tenant’s data rows from every other tenant’s rows.

In the following example use cases, we illustrate enforcing an RLS policy on a fictitious healthcare setup. We demonstrate RLS on the medicine_data table and patients table, based on a policy established for managers, doctors, and departments. We also cover using a custom session variable context to set an RLS policy for the multi-tenant table customer.

To download the script and set up the tables, choose rls_createtable.sql.

Example 1: Read and write access

To grant read and write access, complete the following steps:

  1. Define four RLS policies using the secadmin role:
    1. all_can_see – No restrictions to be imposed
    2. hide_confidential – Restricts records for non-confidential rows
    3. only_doctors_can_see – Restricts records such that only doctors can see data
    4. see_only_own_department – Restricts records to only see data for own department
      CREATE RLS POLICY all_can_see
      USING ( true );
      
      CREATE RLS POLICY hide_confidential
      WITH ( confidential BOOLEAN )
      USING ( confidential = false )
      ;
      
      Note: Employee table is used as lookup in this policy
      
      CREATE RLS POLICY only_doctors_can_see
      USING (
          true = (
                  SELECT employee_is_doctor
                  FROM employees
                  WHERE employee_username = current_user
                  )
          )
      ;
      
      GRANT SELECT ON employees
      TO RLS POLICY only_doctors_can_see;
      
      CREATE RLS POLICY see_only_own_department
      WITH ( patient_dept_id INTEGER )
      USING (
          patient_dept_id IN (
                              SELECT department_id
                              FROM employees_departments
                              WHERE employee_username = current_user
                              )
          )
      ;
      
      GRANT SELECT ON employees_departments 
      TO RLS POLICY see_only_own_department;
  2. Create three roles for STAFF, MANAGER, and EXTERNAL:
    CREATE ROLE staff;
    CREATE ROLE manager;
    CREATE ROLE external;
  3. Now we define column-level access control for the roles and columns that are implementing the RLS policy:
    1. The MANAGER can access all columns in the Patients and Medicine_data tables, including the confidential column that defines RLS policies:
      --- manager can see full table patients and medicine data
      GRANT SELECT ON employees, employees_departments, patients, medicine_data TO ROLE manager, ROLE external;
    2. The STAFF role can access all columns except the confidential column:
      --- staff can see limited columns from medicine data
      GRANT SELECT (medicine_name, medicine_price) ON medicine_data 
      TO ROLE staff;
      
      --- staff can see limited columns from patients
      GRANT SELECT (patient_dept_id, patient_name, patient_birthday, patient_medicine, diagnosis) ON patients TO ROLE staff;
  4. Attach RLS policies to the roles we created:
    --- manager can see all medicine data
    ATTACH RLS POLICY all_can_see
    ON medicine_data
    TO ROLE manager;
    
    --- manager can see all patient data
    ATTACH RLS POLICY all_can_see
    ON patients
    TO ROLE manager;
    
    --- staff cannot see confidential medicine data
    ATTACH RLS POLICY hide_confidential
    ON medicine_data
    TO ROLE staff;
    
    --- staff cannot see confidential patient data
    ATTACH RLS POLICY hide_confidential
    ON patients
    TO ROLE staff;
    
    --- only doctors can see patient data
    ATTACH RLS POLICY only_doctors_can_see 
    ON patients
    TO PUBLIC;
    
    --- regular staff (doctors) can see data for patients in their department only
    ATTACH RLS POLICY see_only_own_department 
    ON patients
    TO ROLE staff;
  5. Enable RLS security on objects:
    ALTER TABLE medicine_data ROW LEVEL SECURITY on;
    ALTER TABLE patients ROW LEVEL SECURITY on;
  6. Create the users and grant them roles:
    CREATE USER house PASSWORD DISABLE;
    CREATE USER cuddy PASSWORD DISABLE;
    CREATE USER external PASSWORD DISABLE;
    
    GRANT ROLE staff TO house;
    GRANT ROLE manager TO cuddy;
    GRANT ROLE external TO external;

We can see RLS in action with a SELECT query:

--- As Cuddy, who is a doctor and a manager
SET SESSION AUTHORIZATION 'cuddy';

SELECT * FROM medicine_data;
--- policies applied: all_can_see

SELECT * FROM patients;
--- policies applied: all_can_see, only_doctors_can_see

As a super user and secadmin, you can query the svv_rls_applied_policy to audit and monitor the policies applied. We discuss system views for auditing and monitoring more later in this post.

--- As House, who is a doctor but not a manager - he is staff in department id 1

SET SESSION AUTHORIZATION 'house';

SELECT * FROM medicine_data;
--- column level access control applied 

SELECT current_user, medicine_name, medicine_price FROM medicine_data;
--- CLS + RLS policy = hide_confidential

SELECT * FROM patients;
--- column level access control applied

SELECT current_user, patient_dept_id, patient_name, patient_birthday, patient_medicine, diagnosis FROM patients;
--- CLS + RLS policies = hide_confidential, only_doctors_can_see, see_only_own_department

--- As External, who has no permission granted
SET SESSION AUTHORIZATION 'external';

SELECT * FROM medicine_data;
--- RLS policy applied: none - so no access

SELECT * FROM patients;
--- policies applied: none - so no access

With the UPDATE command, only the user house should be able to update patients records, as per the RLS for department 1:

SET SESSION AUTHORIZATION 'house';
UPDATE patients
SET diagnosis = 'house updated diagnosis';

select current_user, * from patients; --house should only be able to query department 1 non-confidential records

To test DELETE, as the user house, let’s delete records from patient table. Only two non-confidential records from patient_dept_id should be deleted as per the RLS policy:

SET SESSION AUTHORIZATION 'house';
delete  from patients;

Because both the records that house has access to are deleted from patients, selecting from the table will return no records.

When we switch to the user cuddy, who is manager and doctor, we have access to confidential records and can see three records:

SET SESSION AUTHORIZATION 'cuddy';
SELECT current_user, * from patients;

As a security admin, you can detach a policy from a table, user, or role. In this example, we detach the policy hide_confidential from the table patients from role staff:

DETACH RLS POLICY hide_confidential ON patients FROM ROLE staff;

When the user house queries the patients table, they should now have access to confidential records:

SET SESSION AUTHORIZATION 'house';

SELECT * from patients;

Using the security admin role, you can drop the policy hide_confidential:

DROP RLS POLICY IF EXISTS hide_confidential;

Because the hide_confidential RLS policy is still attached to the medicine_data table, you get the dependency error.

To remove this policy from all the tables, users, and roles, you can use the cascade option:

DROP RLS POLICY IF EXISTS hide_confidential cascade;

When user house queries the medicine_data table, no records are returned, because the medicine_data table has RLS on and no RLS policy is attached to the role staff for this table.

SET SESSION AUTHORIZATION 'house';
SELECT * from MEDICINE_DATA;

Let’s turn off row-level security on the table medicine_data using the security admin role:

ALTER TABLE MEDICINE_DATA ROW LEVEL SECURITY OFF;
SET SESSION AUTHORIZATION 'house';

SELECT * FROM MEDICINE_DATA;

Example 2: Session context variables

Some of the applications require you to use connection pooling, and you can use application-based user authentication instead of using separate database users for each user. The session context variables feature in Amazon Redshift enables you to pass the application user ID to the database for applying role-base security.

Amazon Redshift now allows you to set a customized session context variable using set_config. Using the session context variable allows you to provide such granular access using RLS.

In this example, we illustrate the use case when you have the common table customer, where you’re getting data from several customers. The table has a column with c_customer_id to distinguish data for respective customers.

  1. Create the external user and grant the external role:
    CREATE USER external_user PASSWORD 'Testemp1';
    grant role EXTERNAL to external_user;
  2. Grant SELECT on the customer table to role external:
    grant usage on schema report to role EXTERNAL;
    GRANT select ON TABLE report.customer TO ROLE EXTERNAL;
  3. Turn on row-level security for the report.customer table:
    ALTER TABLE report.customer row level security on;
  4. Create a row-level security policy using the session context variable app.customer_id to enforce the policy to filter records for c_customer_id:
    CREATE RLS POLICY see_only_own_customer_rows
    WITH ( c_customer_id char(16) )
    USING ( c_customer_id = current_setting('app.customer_id', FALSE));
    ATTACH RLS POLICY see_only_own_customer_rows ON report.customer TO ROLE EXTERNAL;

Now we can observe RLS in action. When you query the customer table with session context set to customer ID AAAAAAAAJNGEGCBA, the row-level policy was enforced only to return one customer row that matched the session variable value:

SET SESSION AUTHORIZATION 'external_user';

select set_config('app.customer_id', 'AAAAAAAAJNGEGCBA', FALSE);
select * from report.customer limit 10;

Auditing and monitoring RLS policies

Amazon Redshift has added several new system views to be able to monitor the row-level policies. The following table lists the system views, users, and roles that have access, and the function of the views.

System View Users Function
SVV_RLS_POLICY sys:secadmin View a list of all row-level security policies created
SVV_RLS_RELATION sys:secadmin View a list of all relations and users that have one or more row-level security policies attached on the currently connected database
SVV_RLS_APPLIED_POLICY sys:secadmin List RLS-protected relations
SVV_RLS_ATTACHED_POLICY Superuser, sys:operator, or any user with the system permission ACCESS SYSTEM TABLE Trace the application of RLS policies on queries that reference RLS-protected relations

Conclusion

In this post, we demonstrated how you can simplify the management of row-level security for fine-grained access control of your sensitive data building on the foundation of role-based access control. For more information about RLS best practices, refer to Amazon Redshift security overview. Try out RLS for your future Amazon Redshift implementations, and feel free to leave a comment about your use cases and experience.

Amazon Redshift Spectrum supports row-level, column-level, and cell-level access control for data stored in Amazon Simple Storage Service (Amazon S3) and managed by AWS Lake Formation. In a future post, we will show how you can implement row-level security for Redshift Spectrum tables using Lake Formation.


About the authors

Harshida Patel is a Specialist Sr. Solutions Architect, Analytics, with AWS.

Milind Oke is a Senior Specialist Solutions Architect based out of New York. He has been building data warehouse solutions for over two decades and specializes in Amazon Redshift.

Abhilash Nagilla is a Specialist Solutions Architect, Analytics, with AWS.

Yanzhu Ji is a Product Manager on the Amazon Redshift team. She worked on the Amazon Redshift team as a Software Engineer before becoming a Product Manager. She has rich experience of how the customer-facing Amazon Redshift features are built from planning to launching, and always treats customers’ requirements as first priority. In her personal life, Yanzhu likes painting, photography, and playing tennis.

Kiran Chinta is a Software Development Manager at Amazon Redshift. He leads a strong team in query processing, SQL language, data security, and performance. Kiran is passionate about delivering products that seamlessly integrate with customers’ business applications with the right ease of use and performance. In his spare time, he enjoys reading and playing tennis.

Debu-PandaDebu Panda is a Senior Manager, Product Management, with AWS. He is an industry leader in analytics, application platforms, and database technologies, and has more than 25 years of experience in the IT world. Debu has published numerous articles on analytics, enterprise Java, and databases, and has presented at multiple conferences such as AWS re:Invent, Oracle Open World, and Java One. He is lead author of the EJB 3 in Action (Manning Publications 2007, 2014) and Middleware Management (Packt).