AWS for Industries

Deploying a Privacy-Safe Data Clean Room on AWS with Snowflake

The deprecation of device identifiers and third-party cookies has accelerated the need for advertisers, data providers, and media companies to collaborate directly as they seek to understand customers, plan media campaigns, and analyze path-to-purchase. At the same time, privacy regulations, consumer trust, and intellectual property protection are driving the need for increased data security and governance when data is shared between parties. These two industry trends – increased data security and multi-party data collaboration – are driving customers to invest specialized workloads called data clean rooms. Clean rooms act as collaborative workspaces where multiple parties can run analytics together with restrictions that ensure data security and governance for underlying raw data.

Today at the Snowflake Media Data Cloud Summit, Snowflake, the Data Cloud company, announced the availability of a new solution that helps marketers, media companies, agencies, and technology companies accelerate deployments of Snowflake data clean rooms for audience enrichment, segmentation, marketing measurement, and attribution with Amazon Web Services (AWS). The solution, Deploying a Privacy-Safe Data Clean Room in the Cloud, makes it easy to connect first-party data in Amazon Simple Storage Service (Amazon S3) with Snowflake clean rooms for privacy-safe data collaboration and helps customers move fast to unlock data clean room use cases with AWS and Snowflake.

Customers can sign up for a Snowflake account through AWS Marketplace, and deploy the solution using a preconfigured AWS CloudFormation template that registers their data in a Snowflake data clean room while maintaining control of the underlying data in their AWS account. AWS customers already using Snowflake for data collaboration workloads can use this new solution to decrease the time it takes to add new data collaboration partners and datasets to their Snowflake-powered clean rooms. Two or more customer types can participate in Snowflake data clean room collaborations and maintain control of data in their own AWS account. For example, marketers, agencies, media platforms, and a broad array of advertising and marketing technology customers can collaborate together for use cases such as audience enrichment, segmentation, media planning, activation, and measurement in privacy-safe workspaces.

The remainder of this blog post describes how you can join privacy-safe data clean room with Snowflake through AWS Marketplace and AWS CloudFormation.

How to collaborate with a Snowflake data clean room using AWS

1. Subscribe to Snowflake via AWS Marketplace

Log into the AWS Management Console and go to the AWS Marketplace. Click on “Discover products” and type “Snowflake” in the search box.

Select “Snowflake Data Cloud,” click on the “Continue to Subscribe” button, and then click on the “Subscribe” button on the next screen.

This will open a Subscription window for Snowflake. Enter your information and select the Snowflake Edition and AWS Region you want your Snowflake account to be deployed in. We recommend that you use the Business Critical Edition and the AWS Region that will be shared with the media publisher.

Once you enter the information, you will receive an email to complete your Snowflake subscription.

2. Export data to be used for the Data Clean Room to Amazon S3

The Data Clean Room allows advertisers to combine data with media company data for analytics. Amazon S3 can be used as the data store for simplicity.

Create an Amazon S3 bucket. We recommend using the same AWS Region as the Snowflake account.

Export the data to the Amazon S3 bucket using a prefix and folder structure as needed. You can use AWS Glue or any preferred data integration tool. We recommend a self-describing format such as parquet for the data files.

Example:

3. Snowflake account setup

After signing in to the new Snowflake account, we recommend that you follow a few steps to configure the Snowflake account to use a specific user and role for the data clean room.

The following SQL script can be used to configure a user, role, database, and Snowflake warehouse to be used for the clean room. Simply copy and paste it into the Snowflake console and run the SQL using the “Run” button.

Note: Enter your own password for the user in the script first.

Code block:

use role accountadmin;

create or replace role cleanroom_rl;

grant role cleanroom_rl to role accountadmin;

grant create integration on account to role cleanroom_rl;

CREATE USER cleanroom_admin PASSWORD = '<your password>'

            LOGIN_NAME = 'cleanroom_admin'

            DISPLAY_NAME = 'cleanroom_admin'

            DEFAULT_ROLE = "cleanroom_rl"

            MUST_CHANGE_PASSWORD = FALSE;

GRANT ROLE cleanroom_rl TO USER cleanroom_admin;

create or replace database cleanroom_db;

grant usage on database cleanroom_db to role cleanroom_rl;

create schema shared;

grant ownership on schema cleanroom_db.shared to role cleanroom_rl;

CREATE OR REPLACE WAREHOUSE cleanroom_wh

   WITH WAREHOUSE_SIZE = 'XSMALL' 

        AUTO_SUSPEND = 600

        AUTO_RESUME = TRUE

        MIN_CLUSTER_COUNT = 1

        MAX_CLUSTER_COUNT = 2

        SCALING_POLICY = 'STANDARD';

grant  MODIFY, MONITOR,USAGE, OPERATE  on warehouse cleanroom_wh to role cleanroom_rl;

4. AWS and Snowflake configuration using CloudFormation

You can now configure your AWS and Snowflake integration using a CloudFormation template to set up the storage integration between your Amazon S3 bucket and Snowflake.

While logged into your AWS account, click on the following URL to load the CloudFormation template:

https://console.aws.amazon.com/cloudformation/home?region=region#/stacks/new?stackName=snowflake-onboarding&templateURL=https://snowflake-corp-se-workshop.s3.us-west-1.amazonaws.com/cleanroom/cleanroom.yaml

This will open a new tab in your browser with CloudFormation.

Select the AWS Region you want to use. We recommend using the same AWS Region as the Amazon S3 bucket and the Snowflake account.

On the next screen, select the “Next” button to start the CloudFormation template.

Provide parameters for the CloudFormation template:

  • Stack Name can stay the default: snowflake-onboarding
  • assetsS3BucketName is a bucket that will be used for drivers, add a descriptor after the default string: i.e., snowflake-clean-room-assets-abc123
  • dataS3BucketName is the name of the bucket that was created for the data: i.e., datacleanroom-aws-demo
  • snowflakeAccount is the Snowflake account identifier, for more information see this documentation: i.e., sla04924
  • snowflakePassword is the password used for the Snowflake user created previously
  • snowflakeRole is the role that was created in the previous step: cleanroom_rl
  • snowflakeStorageIntegrationName can be left as default: SNOWCLEANROOM (use only uppercase letters, numbers, and underscores if you do change it)
  • snowflakeUserName is the user name that was configured: cleanroom_admin

Click the “Next” button and follow the steps to complete the CloudFormation template.

Once it is complete, Snowflake will have access to the data in the S3 bucket to be used with the data clean room.

5. Configure Snowflake External Table

You can now configure a Snowflake external table to read the data for the clean room from your S3 data bucket. We will use Snowflake schema discovery to read the parquet data schema and create the external table.

In the Snowflake console, switch your role to the role we created for the clean room.

use role cleanroom_rl;

We will also use the database and schema that was previously created.

use cleanroom_db.shared;

Now we can set the file format to parquet.

create or replace file format cleanroom_format

  type = parquet;

We will now create a Snowflake stage using the storage integration that was configured by the CloudFormation template. Use your S3 bucket name and path to configure the stage.

Note: You can use the storage integration to create multiple stages; this can be useful if you have other datasets in a different folder on Amazon S3.

Example:

create or replace stage cleanroom_data storage_integration = SNOWCLEANROOM_STORAGE_INTEGRATION

    url = 's3://datacleanroom-aws-demo/cleanroom/customersales'

    file_format = (format_name = 'cleanroom_format');

You can explore the files in the Amazon S3 location by using the list command to verify that the configuration is correct.

list @cleanroom_data/;

We can now perform schema discovery on the parquet data using the following command:

 select generate_column_description(array_agg(object_construct(*)), 'external_table') as columns

        from table (

            infer_schema(

                    location=>'@cleanroom_data/',

                    file_format =>'cleanroom_format'

            )

          ); 

Click on the output the console. The output of the command will look something like this:

"OLD_UID" TEXT AS ($1:OLD_UID::TEXT),

"PRODUCT" TEXT AS ($1:PRODUCT::TEXT),

"SIGNUP_WEBSITE" TEXT AS ($1:SIGNUP_WEBSITE::TEXT),

"EMAIL" TEXT AS ($1:EMAIL::TEXT),

"STATE" TEXT AS ($1:STATE::TEXT),

"LIFESTYLE_LIKE" TEXT AS ($1:LIFESTYLE_LIKE::TEXT),

"AGE_RANGE" TEXT AS ($1:AGE_RANGE::TEXT),

"PREFERRED_PAYMENT_METHOD" TEXT AS ($1:PREFERRED_PAYMENT_METHOD::TEXT),

"EQUIPMENT_LIKE" TEXT AS ($1:EQUIPMENT_LIKE::TEXT),

"COUNTRY" TEXT AS ($1:COUNTRY::TEXT),

"USER_ID" TEXT AS ($1:USER_ID::TEXT),

"PURCHASE_DATE" DATE AS ($1:PURCHASE_DATE::DATE),

"SIGNUP_DATE" DATE AS ($1:SIGNUP_DATE::DATE),

"CHANNEL" TEXT AS ($1:CHANNEL::TEXT),

"MARITAL" TEXT AS ($1:MARITAL::TEXT),

"GENDER" TEXT AS ($1:GENDER::TEXT),

"BRAND_TYPE" TEXT AS ($1:BRAND_TYPE::TEXT),

"PURCHASE_AMT" NUMBER(3, 0) AS ($1:PURCHASE_AMT::NUMBER(3, 0)),

"REC_ID" NUMBER(18, 0) AS ($1:REC_ID::NUMBER(18, 0)),

"ROW_ID" TEXT AS ($1:ROW_ID::TEXT),

"PRODUCT_CATEGORY" TEXT AS ($1:PRODUCT_CATEGORY::TEXT),

"ACTIVITY_LIKE" TEXT AS ($1:ACTIVITY_LIKE::TEXT)

Copy this output to the clipboard.

You can now create an external table by pasting the output txt in this SQL command. Use your own naming convention for the external table name. In this example, we used customersales_ext.

 create or replace external table customersales_ext

    (<copy text>)

    with location =  @cleanroom_data/

    file_format = (format_name = 'cleanroom_format');

Example:

create or replace external table customersales_ext

    ("AGE_RANGE" TEXT AS ($1:AGE_RANGE::TEXT),

"PREFERRED_PAYMENT_METHOD" TEXT AS ($1:PREFERRED_PAYMENT_METHOD::TEXT),

"EQUIPMENT_LIKE" TEXT AS ($1:EQUIPMENT_LIKE::TEXT),

"PURCHASE_AMT" NUMBER(3, 0) AS ($1:PURCHASE_AMT::NUMBER(3, 0)),

"REC_ID" NUMBER(18, 0) AS ($1:REC_ID::NUMBER(18, 0)),

"ROW_ID" TEXT AS ($1:ROW_ID::TEXT),

"PRODUCT_CATEGORY" TEXT AS ($1:PRODUCT_CATEGORY::TEXT),

"ACTIVITY_LIKE" TEXT AS ($1:ACTIVITY_LIKE::TEXT),

"SIGNUP_DATE" DATE AS ($1:SIGNUP_DATE::DATE),

"CHANNEL" TEXT AS ($1:CHANNEL::TEXT),

"MARITAL" TEXT AS ($1:MARITAL::TEXT),

"GENDER" TEXT AS ($1:GENDER::TEXT),

"BRAND_TYPE" TEXT AS ($1:BRAND_TYPE::TEXT),

"USER_ID" TEXT AS ($1:USER_ID::TEXT),

"PURCHASE_DATE" DATE AS ($1:PURCHASE_DATE::DATE),

"COUNTRY" TEXT AS ($1:COUNTRY::TEXT),

"OLD_UID" TEXT AS ($1:OLD_UID::TEXT),

"PRODUCT" TEXT AS ($1:PRODUCT::TEXT),

"SIGNUP_WEBSITE" TEXT AS ($1:SIGNUP_WEBSITE::TEXT),

"EMAIL" TEXT AS ($1:EMAIL::TEXT),

"STATE" TEXT AS ($1:STATE::TEXT),

"LIFESTYLE_LIKE" TEXT AS ($1:LIFESTYLE_LIKE::TEXT))

    with location =  @cleanroom_data/

    file_format = (format_name = 'cleanroom_format');

6. Contact Media Publisher for Data Clean Room

Your AWS and Snowflake accounts are now ready to be used with data clean rooms. Work with the media publisher for the specific requirements of their Snowflake data clean rooms.

Summary

With the preceding steps, marketers, media companies, agencies, and technology companies can use AWS to collaborate together using a privacy-safe Snowflake data clean room. The solution helps multiple parties perform shared analysis for audience enrichment, segmentation, marketing measurement, and attribution with restrictions that ensure data security and governance for underlying raw data. You can also learn more about Snowflake data clean rooms on their website and in this whitepaper. Get started with this solution by subscribing to Snowflake via AWS Marketplace. Check out the CloudFormation template for this solution in the AWS Management Console.

Adam Solomon

Adam Solomon

Adam Solomon is Global Head of Business Development for AWS Clean Rooms & AWS Entity Resolution at Amazon Web Services (AWS). Adam drives development and adoption of cloud-based solutions to help marketers, agencies, media platforms, and advertising/marketing technology companies address industry-wide challenges for consumer privacy, identity, and secure multi-party analytics. Adam has 25 years of experience in the advertising and marketing industry, and is co-inventor on 8 issued U.S. patents for advertising and marketing technologies. His diverse experience includes leadership roles for product management and go-to-market teams at Paramount, Time Inc., Hearst, PebblePost, and Lotame. He is also a trained aerospace engineer and patent attorney.

Andries Engelbrecht

Andries Engelbrecht

Andries Engelbrecht is a Partner Solution Architect at Snowflake Inc.

Bosco Albuquerque

Bosco Albuquerque

Bosco Albuquerque is a Sr Partner Solutions Architect at AWS and has over 20 years of experience in working with database and analytics products from enterprise database vendors and cloud providers, and has helped technology companies in designing and implementing data analytics solutions and data products.