Fully Managed Data Access Governance in Amazon Aurora Using Privacera

By Don Bosco Durai, Co-Founder and CTO – Privacera
By Lovelesh Chawla, Director of Solutions Engineering – Privacera
By Jason Payne, Sr. Sales Engineer – Privacera
By Ayan Ray, Sr. Partner Solutions Architect – AWS


A big challenge when working to break down data silos is making the right data securely available to the right users across the organization. Data users often span the entire organization, and each of them wants access to both operational and analytical data stores.

It’s essential to ensure the right governance policies are in place so that only the right people, applications, and consumers can access the right data. Otherwise, you put sensitive data and your security posture at risk.

Organizations need to be able to carefully define, monitor, and manage who has access to specific pieces of data according to their policies and external regulations. A unified governance policy that grows and evolves with the technology, people, and business is a necessity.

Data governance is the combination of people, process, and technology used to manage the availability, usability, integrity, and security of enterprise system data. Effective data governance ensures data is consistent and trustworthy and is not misused.

Data governance includes a broad set of capabilities, and the right solution is often dependent upon customer requirements in addition to the AWS and non-AWS services an organization already has in place.

In this post, we will discuss how Privacera enables data access governance on Amazon Aurora, including tag-based access control policies.

Privacera is an AWS Data and Analytics Competency Partner and AWS Marketplace Seller that’s a leading provider of unified data access governance solutions. It enables customers to deliver responsible data-powered performance from their ever-expanding data landscape.

Background

PrivaceraCloud provides a unified and holistic way to manage, define, and enforce policies across storage, compute engines, and consumption methods. It’s built on the core attribute-based access control (ABAC) policy model of Apache Ranger, and applies that model to data lakes, relational databases, streaming systems, and more.

Privacera integrates with the AWS Glue Data Catalog and with processing engines on Amazon EMR such as Hive, Spark, and Trino, as well as with Amazon Aurora, Amazon Athena, Amazon Redshift, and other AWS services.

Amazon Aurora is a relational database management system (RDBMS) built for the cloud with MySQL and PostgreSQL compatibility. Aurora gives you the performance and availability of commercial-grade databases at one-tenth the cost.

Aurora features a distributed, fault-tolerant, and self-healing storage system that’s decoupled from compute resources and auto-scales up to 128 TB per database cluster. It delivers high performance and availability with up to 15 low-latency read replicas, point-in-time recovery, continuous backup to Amazon Simple Storage Service (Amazon S3), and replication across three AWS Availability Zones (AZs).

To meet your connectivity and workload requirements, Aurora Auto Scaling dynamically adjusts the number of Aurora Replicas provisioned for an Aurora cluster that uses single-master replication. This lets your cluster handle sudden increases in connections or workload. When connections or workload decrease, Aurora Auto Scaling removes unnecessary Aurora Replicas so that you don’t pay for unused provisioned DB instances.

In addition, Aurora doesn’t impose IOPS limits. The IOPS you can achieve on a provisioned Aurora cluster depend on the throughput of the underlying instance class combined with the workload you drive.

Access Management

For Amazon Aurora, Privacera uses the PolicySync model, which translates Privacera’s Apache Ranger-based fine-grained access policies into Aurora’s native permission model and keeps the target in sync with the current policies.

A PolicySync connector is an application that Privacera runs in a Kubernetes container. It pulls policies from Apache Ranger’s policy store, translates them into native grants, revokes, or Data Definition Language (DDL) statements, and executes them on the target data source. It also monitors the environment to ensure Privacera remains the source of truth for data access policies.
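To make the translation step more concrete, here is a minimal sketch of turning a simplified resource policy into native PostgreSQL GRANT statements. The AccessPolicy class and the to_native_grants function are hypothetical illustrations, not Privacera’s code, and the sketch assumes Privacera groups map one-to-one to PostgreSQL roles named sales and accounting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical, simplified stand-in for a Ranger-style resource policy."""
    schema: str
    table: str
    principals: tuple[str, ...]  # PostgreSQL role names representing users or groups
    privileges: tuple[str, ...]  # e.g. ("SELECT",) or ("SELECT", "INSERT")


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def to_native_grants(policy: AccessPolicy) -> list[str]:
    """Translate one policy into native PostgreSQL GRANT statements."""
    schema = quote_ident(policy.schema)
    table = f"{schema}.{quote_ident(policy.table)}"
    privs = ", ".join(p.upper() for p in policy.privileges)
    statements = []
    for role in policy.principals:
        role_id = quote_ident(role)
        # Table privileges are only usable if the role can also use the schema.
        statements.append(f"GRANT USAGE ON SCHEMA {schema} TO {role_id};")
        statements.append(f"GRANT {privs} ON TABLE {table} TO {role_id};")
    return statements


policy = AccessPolicy("sales", "sales_data", ("sales", "accounting"), ("SELECT",))
for stmt in to_native_grants(policy):
    print(stmt)
```

Quoting every identifier keeps mixed-case or unusual role names from being misinterpreted when the statements are executed.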

The connector continually collects the current set of policies from Privacera, along with the current set of managed resources and the current users, groups, and roles. Because enforcement happens natively in the data source, there’s no query performance impact.
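Keeping the target in sync can be pictured as a set difference between the grants the policies require and the grants the database currently holds. This is a minimal sketch, assuming each grant is reduced to a hypothetical (privilege, table, role) triple; in this sketch, a grant added by hand outside Privacera is revoked, which is one way a connector can keep the policy store as the source of truth.

```python
Grant = tuple[str, str, str]  # (privilege, table, role): hypothetical, simplified grant record


def reconcile(desired: set[Grant], actual: set[Grant]) -> list[str]:
    """Return the statements that make the database's grants match the policy store."""
    statements = []
    # Grants the policies require but the database is missing.
    for privilege, table, role in sorted(desired - actual):
        statements.append(f"GRANT {privilege} ON TABLE {table} TO {role};")
    # Grants present in the database that no policy allows (for example, added by hand).
    for privilege, table, role in sorted(actual - desired):
        statements.append(f"REVOKE {privilege} ON TABLE {table} FROM {role};")
    return statements


desired = {("SELECT", "sales.sales_data", "sales"), ("SELECT", "sales.sales_data", "accounting")}
actual = {("SELECT", "sales.sales_data", "sales"), ("DELETE", "sales.sales_data", "sales")}
for stmt in reconcile(desired, actual):
    print(stmt)
```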

Figure 1 – Amazon Aurora integration with PrivaceraCloud architecture.

Automated Sensitive Data Discovery

Privacera Discovery helps reveal information about your data and its usage. It crawls data sources, such as databases or files, and scans them to identify sensitive information like credit card numbers, Social Security numbers, and other personal, restricted, or confidential information.

Privacera Discovery classifies or labels this information to create a comprehensive catalog of your sensitive data. You can review these classifications to accept or reject them, or refine the scanning via rules, dictionaries, models, and patterns. Once this data has been classified and tagged, the tags can be used for access policies, data masking, and encryption.
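Pattern-based classification is one of the techniques mentioned above. The following minimal sketch tags a column when most of its sampled values match a pattern. The tag names, regular expressions, and 80% threshold are illustrative assumptions, not Privacera Discovery’s actual rules; values such as person names generally need dictionaries or models rather than regular expressions.

```python
import re

# Illustrative patterns only; production scanners combine many more rules,
# dictionaries, and models.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),
    "US_SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "US_PHONE": re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]\d{4}"),
}


def classify_column(values: list[str], threshold: float = 0.8) -> list[str]:
    """Return the tags whose pattern fully matches at least `threshold` of the non-empty values."""
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return []
    tags = []
    for tag, pattern in PATTERNS.items():
        hits = sum(1 for v in sample if pattern.fullmatch(v))
        if hits / len(sample) >= threshold:
            tags.append(tag)
    return tags


print(classify_column(["123-45-6789", "987-65-4321", ""]))  # ['US_SSN']
print(classify_column(["emily@example.com", "nick@example.co.uk"]))  # ['EMAIL']
print(classify_column(["Emily", "Nick"]))  # []
```

Using a match ratio rather than a single hit makes the classifier tolerant of a few malformed values while avoiding tags on columns that only occasionally contain sensitive-looking text.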

In addition to classifying data with Privacera Discovery, customers can use Privacera’s API to apply tags to their data manually or programmatically. Privacera can also integrate with third-party catalog providers to synchronize existing classifications into Privacera.

Prerequisites

Before getting started, you must complete the following prerequisites:

Set up a PrivaceraCloud user account: Follow the documentation to set up a PrivaceraCloud user account.
Set up an Amazon Aurora cluster: Follow the documentation to set up an Amazon Aurora PostgreSQL-compatible cluster. Once the cluster is configured, create a database called sales containing a schema called sales and a table called sales_data with the following columns: name, email, ssn, us_phone, address, account_id, zipcode, country. Populate the table with sample data.
Allow access for PrivaceraCloud: Follow the documentation to set up the prerequisites to allow PrivaceraCloud access to the Aurora cluster and to configure logging.
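The sales_data setup in the prerequisites can be scripted. This minimal sketch prints SQL that creates the schema and table and inserts two made-up rows. The column types and the sample values are assumptions (the post lists only column names), and the sketch assumes you have already created the sales database, for example with CREATE DATABASE sales;. You could save the output to a file and run it with psql -d sales -f against the cluster endpoint.

```python
# Column types are assumptions; the post lists only the column names.
COLUMNS = {
    "name": "text", "email": "text", "ssn": "text", "us_phone": "text",
    "address": "text", "account_id": "integer", "zipcode": "text", "country": "text",
}

# Made-up sample rows for testing only; they do not describe real people.
SAMPLE_ROWS = [
    ("Alice Example", "alice@example.com", "123-45-6789", "555-010-0001",
     "1 Main St", 1001, "10001", "US"),
    ("Bob Example", "bob@example.co.uk", "987-65-4321", "555-010-0002",
     "2 High St", 1002, "SW1A 1AA", "UK"),
]


def sql_literal(value) -> str:
    """Render a Python value as a PostgreSQL literal (integers bare, strings single-quoted)."""
    if isinstance(value, int):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def setup_script() -> str:
    """Build the CREATE and INSERT statements for the sample table."""
    cols = ",\n  ".join(f"{name} {typ}" for name, typ in COLUMNS.items())
    lines = [
        "CREATE SCHEMA IF NOT EXISTS sales;",
        f"CREATE TABLE IF NOT EXISTS sales.sales_data (\n  {cols}\n);",
    ]
    for row in SAMPLE_ROWS:
        values = ", ".join(sql_literal(v) for v in row)
        lines.append(f"INSERT INTO sales.sales_data ({', '.join(COLUMNS)}) VALUES ({values});")
    return "\n".join(lines)


print(setup_script())
```

Including one US row and one UK row gives the row-filtering policy later in this walkthrough something to filter for each user.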

Create Amazon Aurora Connection

The first step is to establish connectivity between the Amazon Aurora PostgreSQL cluster and Privacera.

Log in to the Privacera console and navigate to Applications under Settings. Locate the PostgreSQL tile in the list of Available Connections and create a new PostgreSQL connection.

Turn on the Access Management toggle and provide the configuration details of the Aurora cluster. Enable policy enforcement and user/group/role management, which allows Privacera to manage users, groups, roles, and access control for the Aurora target application.

Enable access audits to turn on audit logging. Set the audit source to SQS (Amazon Simple Queue Service) and enter the AWS access key, secret key, AWS Region, and SQS queue name.

Figure 2 – Sample Amazon Aurora application configuration.

Configure Amazon Aurora for Discovery Scans

Log in to the Privacera console and navigate to Applications under Settings. Select the PostgreSQL tile, and then select the PostgreSQL connection you created earlier. Turn on the Data Discovery toggle.

As instructed, enter a JDBC URL, JDBC username, and JDBC password. Click Advanced, turn on the Enable Ranger TagSync switch, and select privacera_postgres as the Service Name. Finally, click Save.
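The JDBC URL for an Aurora PostgreSQL cluster follows the standard PostgreSQL JDBC (PgJDBC) format, jdbc:postgresql://host:port/database. This small helper assembles it from the cluster endpoint; the endpoint shown is hypothetical, 5432 is the PostgreSQL default port, and whether you need sslmode=require depends on your cluster’s configuration.

```python
from urllib.parse import quote


def postgres_jdbc_url(host: str, database: str, port: int = 5432,
                      require_ssl: bool = False) -> str:
    """Build a PgJDBC URL of the form jdbc:postgresql://host:port/database."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    url = f"jdbc:postgresql://{host}:{port}/{quote(database, safe='')}"
    if require_ssl:
        # sslmode is a standard PgJDBC connection property.
        url += "?sslmode=require"
    return url


# Hypothetical Aurora cluster endpoint; copy the real one from the Amazon RDS console.
endpoint = "my-cluster.cluster-abc123example.us-east-1.rds.amazonaws.com"
print(postgres_jdbc_url(endpoint, "sales"))
```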

Figure 3 – Sample Amazon Aurora discovery application configuration.

Create Groups and Users

After you have registered the Amazon Aurora connection, the next step is to create user groups and add users to each group. Follow the instructions to create two new user groups: Accounting and Sales.

Next, create a user named Emily in the Accounting group, a user named Nick in the Sales group, and a user named Ed who belongs to no group. Ed acts as the data administrator in this example.

Follow these steps to assign the user Emily a custom attribute named country with the value UK. Similarly, set the country attribute to US for the user Nick.

Create Resource-Based Policies for Admin and Analysts

Refer to the “Create a Resource-Based Policy to Allow Access” section of this AWS blog post to create a resource-based policy that gives Ed full access to the sales schema in the sales database. Also, add an allow condition that grants SELECT permission to both the Sales and Accounting groups.
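The effect of this policy can be checked with a small evaluator. This is a minimal sketch under simplifying assumptions: the user, group, and policy-item structures are hypothetical in-memory stand-ins for what you configure in Privacera, and in this sketch any request that matches no allow condition is denied.

```python
# Hypothetical mirror of the users and groups created earlier in this post.
USER_GROUPS = {"emily": {"accounting"}, "nick": {"sales"}, "ed": set()}

# Allow conditions of the resource-based policy on the sales schema (simplified).
POLICY_ITEMS = [
    {"users": {"ed"}, "groups": set(), "permissions": {"ALL"}},
    {"users": set(), "groups": {"sales", "accounting"}, "permissions": {"SELECT"}},
]


def is_allowed(user: str, permission: str) -> bool:
    """Return True if any allow condition grants `permission` to `user` or one of their groups."""
    groups = USER_GROUPS.get(user, set())
    for item in POLICY_ITEMS:
        matches = user in item["users"] or bool(groups & item["groups"])
        if matches and ({"ALL", permission} & item["permissions"]):
            return True
    return False  # no allow condition matched: deny by default


print(is_allowed("ed", "DELETE"))    # True
print(is_allowed("nick", "SELECT"))  # True
print(is_allowed("nick", "DELETE"))  # False
```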

Note that the available permissions differ slightly from those in that post, because Privacera uses each underlying data service’s native permission model.

Create Policy to Mask Sensitive Columns

Privacera can mask specific columns to protect sensitive information. In this example, the sales_data table contains personally identifiable information (PII) about users, including name, email, and phone number. You can mask this information from the Sales group. Follow the steps in the “Create Policy to Mask Sensitive Columns” section of this AWS blog post to create a policy that masks these sensitive columns.

Privacera enforces the masking policy by updating the secure views for the table to incorporate the masking expression and the principals it applies to.
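One way a secure view can implement masking in PostgreSQL is a CASE expression that checks role membership with the built-in pg_has_role function. The sketch below generates such a view. The _secure view-name suffix, the '****' mask, and the generated SQL are illustrative assumptions; they are not the SQL Privacera actually generates.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a PostgreSQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def secure_view_sql(schema: str, table: str, columns: list[str],
                    masks: dict[str, str], masked_role: str) -> str:
    """Build a CREATE VIEW statement that masks selected columns for members of one role.

    `masks` maps a column name to the SQL expression returned in its place.
    """
    select_list = []
    for col in columns:
        ident = quote_ident(col)
        if col in masks:
            # pg_has_role() checks whether the querying user belongs to the masked role.
            select_list.append(
                f"CASE WHEN pg_has_role(current_user, {quote_literal(masked_role)}, 'MEMBER') "
                f"THEN {masks[col]} ELSE {ident} END AS {ident}"
            )
        else:
            select_list.append(ident)
    source = f"{quote_ident(schema)}.{quote_ident(table)}"
    view = f"{quote_ident(schema)}.{quote_ident(table + '_secure')}"
    return (f"CREATE OR REPLACE VIEW {view} AS\nSELECT "
            + ",\n       ".join(select_list)
            + f"\nFROM {source};")


print(secure_view_sql(
    "sales", "sales_data",
    ["name", "email", "us_phone", "country"],
    {"name": "'****'", "email": "'****'", "us_phone": "'****'"},
    masked_role="sales",
))
```

Because the check runs inside the view, the same view serves every user; only members of the masked role see the substituted values.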

Create Policy to Filter Rows Based on User Attributes

Next, follow the steps mentioned in the “Create Policy to Filter Rows Based on User Attributes” section of this AWS blog post to create a policy to filter certain rows based on the country attribute we assigned to each user.
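The row filter compares each row’s country column with the querying user’s country attribute. The database applies the filter; this minimal Python sketch only illustrates the logic, and the attribute store and the exemption for Ed are assumptions made for illustration. Users without a country attribute see no rows in this sketch.

```python
# Hypothetical mirror of the user attributes assigned earlier in this post.
USER_ATTRIBUTES = {"emily": {"country": "UK"}, "nick": {"country": "US"}}
EXEMPT_USERS = {"ed"}  # assumption: the data administrator is not filtered


def visible_rows(user: str, rows: list[dict]) -> list[dict]:
    """Return only the rows whose country matches the user's country attribute."""
    if user in EXEMPT_USERS:
        return list(rows)
    country = USER_ATTRIBUTES.get(user, {}).get("country")
    if country is None:
        return []  # no attribute, no rows: fail closed
    return [row for row in rows if row.get("country") == country]


rows = [
    {"account_id": 1001, "country": "US"},
    {"account_id": 1002, "country": "UK"},
]
print([r["account_id"] for r in visible_rows("nick", rows)])   # [1001]
print([r["account_id"] for r in visible_rows("emily", rows)])  # [1002]
```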

Run a Scan to Discover Sensitive Data

Click Discovery and then Data Sources. You’ll see the source you configured in the step titled “Configure Amazon Aurora for Discovery Scans.” Click Add, enter sales_schema in the Database Name field and sales_data as the table name, and then click Save and Scan Resource.

Figure 4 – Resource configuration for Discovery Scan.

After the scan runs, click Discovery and then Scan Status to check its status. If the scan succeeds, a link appears in the Scan Id column; click it to see the columns that were tagged.

Figure 5 – Sample output of a Discovery Scan (above), and column tagged in Discovery Scan (below).

Create Masking Policy Based on Tagged Column

Tag policies can be created to control access or mask sensitive data. Unlike resource-based policies, which are created per data service, a single tag policy can span your entire data estate. For example, if data tagged SSN exists in both Amazon Aurora and Amazon Redshift, one policy can mask it in both places.
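The reason one tag policy can cover several services is that it is resolved through tags rather than through resource paths. This minimal sketch shows that resolution step; the tag catalog, the service labels, and the Redshift table finance.customers are hypothetical.

```python
# Hypothetical tag assignments, keyed by (service, fully qualified column).
COLUMN_TAGS = {
    ("aurora-postgres", "sales.sales_data.ssn"): {"SSN"},
    ("aurora-postgres", "sales.sales_data.name"): {"PERSON_NAME"},
    ("redshift", "finance.customers.ssn"): {"SSN"},  # hypothetical Redshift table
}


def columns_for_tag(tag: str) -> list[tuple[str, str]]:
    """Resolve one tag policy to every tagged column, regardless of data service."""
    return sorted(key for key, tags in COLUMN_TAGS.items() if tag in tags)


for service, column in columns_for_tag("SSN"):
    print(f"{service}: {column}")
```

Adding a newly tagged column to the catalog brings it under the existing policy without editing the policy itself.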

In our scan, the name column was tagged PERSON_NAME. To create a policy that masks data with this tag, navigate to Access Management and then Tag Policies. If no tag service is listed, click the three dots to the right of TAG and select Add Service. If the privacera_tag service already exists, you can skip this step.

Figure 6 – Service creation.

Next, click privacera_tag and choose the Masking tab. Select Add Policy, enter a descriptive name in the Policy Name field, and type PERSON_NAME in the TAG field.

Select the Sales group, choose Postgres under Component Permissions, and choose Nullify as the masking option. Finally, click Save.
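The result of this policy for individual users can be illustrated with a short sketch. The in-memory tag, group, and policy structures are hypothetical, and Privacera performs the actual enforcement in the database; here, Nullify simply replaces a tagged value with None (SQL NULL) for members of the Sales group.

```python
# Hypothetical in-memory versions of the tag, group, and policy data from this walkthrough.
COLUMN_TAGS = {"name": {"PERSON_NAME"}}
USER_GROUPS = {"nick": {"sales"}, "emily": {"accounting"}}
TAG_MASKING_POLICIES = [
    {"tag": "PERSON_NAME", "groups": {"sales"}, "mask": "NULLIFY"},
]


def apply_tag_masks(user: str, row: dict) -> dict:
    """Return a copy of `row` with tag-based masks applied for `user`."""
    groups = USER_GROUPS.get(user, set())
    masked = dict(row)
    for column in row:
        column_tags = COLUMN_TAGS.get(column, set())
        for policy in TAG_MASKING_POLICIES:
            applies = policy["tag"] in column_tags and bool(groups & policy["groups"])
            if applies and policy["mask"] == "NULLIFY":
                masked[column] = None  # Nullify: return NULL instead of the value
    return masked


row = {"name": "Alice Example", "country": "US"}
print(apply_tag_masks("nick", row))   # {'name': None, 'country': 'US'}
print(apply_tag_masks("emily", row))  # {'name': 'Alice Example', 'country': 'US'}
```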

Figure 7 – Masking policy creation.

Conclusion

In this post, we showed how to use Privacera to manage access control policies in Amazon Aurora. We also showed how Privacera can mask columns, filter rows based on user attributes, and discover and tag sensitive data.

Sensitive data access can be managed throughout your organization by using tags. By combining Amazon Aurora with Privacera, organizations can strengthen data access governance while continuing to make data-driven decisions.

Privacera is an AWS Data and Analytics Competency Partner and is available in AWS Marketplace.


Privacera – AWS Partner Spotlight

Privacera is an AWS Data and Analytics Partner that provides security and privacy tools for enterprises to secure and govern user access to databases and datastores in the cloud.
