Automate Snowflake integration with Amazon S3 using AWS Service Catalog
Snowflake is a data warehouse built for the cloud. It provides enterprises with a cost-effective combination of the power of data warehousing, the flexibility of big data platforms, and the elasticity of the cloud. Snowflake is a member of the AWS Partner Network (APN), and is available in AWS Marketplace. Snowflake can help you bring together diverse data into one system for data warehousing, query processing, and analytics.
AWS Service Catalog enables organizations to create and manage catalogs of IT services. It enables users to quickly deploy only the approved IT services they need on AWS with consistent governance enforced centrally by IT. Amazon S3 provides object storage that is cost-effective, secure, retrieves any amount of data, and is designed for 99.999999999 percent (eleven 9s) of data durability.
In this blog post, Andries and I show how to provide full automation for Snowflake to access S3 using AWS Service Catalog and AWS CloudFormation. In the solution, the launch of an AWS Service Catalog product enables S3 objects to be read as Snowflake tables. It also enables Snowflake to read and write data to S3. Furthermore, the AWS Service Catalog product can be launched as many times as needed to create new integrations between Snowflake and additional S3 buckets (with prefixes).
You must complete the following prerequisites before implementing the Snowflake and S3 automation using AWS Service Catalog:
- To subscribe to Snowflake via AWS Marketplace, follow this link to Snowflake’s listing in AWS Marketplace. In the upper right, choose Continue to Subscribe. After successfully subscribing to Snowflake, choose Accept Terms. To begin using the software, you will be redirected to the Snowflake website. To sign up with Snowflake and complete your registration, select the Set Up Your Account button.
- If you are new to Snowflake, complete this Snowflake in 20 minutes tutorial. At the end of this tutorial, you learn how to create required Snowflake objects, including warehouses, databases, and tables for storing and querying data. You also learn how to load a small amount of sample data from CSV files into a Snowflake table and then query the table.
- Create an S3 bucket: s3-snowflakeintegration-accountId-region. Replace accountId and Region with the AWS account ID and Region of your AWS account.
- Create a folder in your S3 bucket called SnowflakeIntegration_Lambda and upload the SnowflakeIntegration_Lambda.zip file. This Lambda function uses the Snowflake Connector for Python to query and update Snowflake.
- Upload the snowflakelayer.zip in the root folder of this S3 bucket. This zip file packages the Snowflake Python Connector as an AWS Lambda Layer.
- Create a folder in your S3 bucket called template and upload the aws-snowflake-integrationobject.yml CloudFormation template. This template is provisioned when the AWS Service Catalog product is launched, and it automates this integration for Snowflake to access S3.
Snowflake storage integrations are Snowflake objects that enable Snowflake to read and write data to Amazon S3. Snowflake storage integrations uses AWS Identity and Access Management (IAM) to access S3. The S3 bucket is referenced by the Snowflake integration from an external (in this case, S3) Snowflake stage object.
The solution implements an AWS Service Catalog product that automates Snowflake access to S3. The AWS Service Catalog product provisions the Snowflake integration object, attaches an IAM role to it, and creates a Snowflake stage object for it that references S3. The AWS Service Catalog product uses AWS Secrets Manager to store and retrieve Snowflake connection information.
Each time the AWS Service Catalog product is launched, it creates a new Snowflake external stage object to access an S3 object or prefix based on two input parameters. Those two parameters are Snowflake connection information and the S3 bucket with a prefix. The solution is fully automated using AWS CloudFormation.
Refer to the following architecture diagram, which illustrates the interactions of the previously-mentioned components of the AWS Service Catalog, Snowflake, and S3 integration:
How it works
Here’s how this solution works:
- The AWS Service Catalog administrator installs the solution by launching the aws-snowflakeintobj-servicecatalog CloudFormation template. This template provisions an AWS Service Catalog portfolio with an AWS Service Catalog product.
- The AWS Service Catalog end user or administrator launches the Snowflake AWS Service Catalog product. This product takes a) Snowflake Connection information and b) S3 bucket name and prefix as input parameters. It also uses the aws-snowflake-integrationobject.yml CloudFormation template to create a Snowflake external stage object that enables access to S3.
- The Snowflake AWS Service Catalog product can be launched as many times as needed. Each time, it creates a new Snowflake external stage object to access a new S3 object or prefix based on the input parameters (a and b) supplied previously.
- The AWS Service Catalog product from Step 2:
- Provisions Secrets Manager to store and retrieve Snowflake connection information.
- Provisions a Lambda function that uses the Snowflake Python connector:
- Creates a Snowflake integration object and obtains the Snowflake generated AWS_IAM_USER_ARN and AWS_EXTERNAL_ID from the Snowflake integration.
- Provisions an IAM role that uses the Snowflake-generated AWS_IAM_USER_ARN and AWS_EXTERNAL_ID in the trust policy of the IAM role. Creates a Snowflake stage object that leverages the Snowflake integration.
How to install
You install this solution in a single step. To install, create a stack from the AWS CloudFormation console by launching the aws-snowflakeintobj-servicecatalog template. The template takes the following parameter:
S3StagingBucketURL—This is the HTTP URL of the S3 bucket (s3-snowflakeintegration-accountId-region) that you configured in Prerequisites Step 3.
Test and run
To test and run the solution, do the following:
- The Snowflake solution creates a Snowflake AWS Service Catalog portfolio and a SnowflakeEnduserGroup IAM group and provides this IAM group with access to the portfolio. In order to launch the Snowflake AWS Service Catalog product, you have two options.
- Option 1—Follow the steps here to grant your current logged-in IAM user/role permissions to access the AWS Service Catalog portfolio. Launch the Snowflake AWS Service Catalog product using your current logged-in IAM user or role.
- Option 2—Add an IAM user to the SnowflakeEnduserGroup IAM group. To launch the Snowflake AWS Service Catalog product, log in as this IAM user.
- Navigate to the AWS Service Catalog console in your AWS account. Follow instructions here to launch your AWS Service Catalog product SnowflakeStorageIntegrationProduct.
- Provide Snowflake connection details and the external S3 bucket name and S3 prefix as parameters.
- Navigate to the AWS Identity and Access Management console of your AWS account. Check that a new IAM role has been provisioned that ends with S3INTxxxxx suffix. This suffix will also be the name of your new Snowflake integration object.
- Log in to your Snowflake web interface and ensure that the URL contains your account identifier or use snowsql as outlined here.
- Validate that a new Snowflake integration object has been created using
DESC INTEGRATION 'integrationobjectname'.
- Obtain the AWS_IAM_USER_ARN and AWS_EXTERNAL_ID parameters from Step 3.1 in the How it works section. Navigate to the IAM console of your AWS account and check that the IAM role created in Step 3 here uses those parameters in its trust policy as the principal and external ID parameters respectively.
- Validate that a new storage object has been created in Snowflake that references the S3 bucket and uses the integration object by using
SHOW STAGES IN ACCOUNT.
- Validate that a new Snowflake integration object has been created using
To clean up your account after deploying the solution outlined in this blog post, perform the following steps:
- Terminate the Snowflake AWS Service Catalog provisioned product. In the AWS Service Catalog console of the AWS managed account, log in as the end user. Navigate to the Provisioned product list page. Select the provisioned product that was deployed, SnowflakeStorageIntegration Product, and then choose Terminate provisioned product. Verify the provisioned product that you want to delete, and then choose Terminate.
- If you followed Test and run Step 1 Option 1, remove the access of your logged-in AWS user from the Snowflake AWS Service Catalog Portfolio. If you followed Test and run Step 1 Option 2, remove the IAM user from the SnowflakeEnduserGroup IAM group.
- Delete the CloudFormation stack for the aws-snowflakeintobj-servicecatalog template.
In this post, we described a solution that provides full automation for Snowflake to access S3 using AWS Service Catalog. The solution enables the launch of an AWS Service Catalog product to enable S3 objects to be read as Snowflake tables. It also enables Snowflake to read and write data to S3. Furthermore, the Snowflake AWS Service Catalog product can be launched as many times as needed to create new integrations between Snowflake and additional S3 buckets with prefixes.
About the authors
Kanishk Mahajan is an Independent Software Vendor (ISV) Solutions Architecture Lead at AWS. In this role, he leads cloud transformation and solution architecture for ISV partners and mutual customers. Kanishk specializes in management and governance, migrations and modernizations, and security and compliance. He is a Technical Field Community (TFC) member in each of those domains at AWS.
Andries Engelbrecht is a Principal Partner Solution Architect at Snowflake working with strategic partners. He is actively engaged with strategic partners like AWS building product and service integrations to support joint solutions. Andries has over 20 years of experience in the field of data and analytics.
Bosco Albuquerque is a Sr Partner Solutions Architect at AWS. He has over 20 years of experience in working with database and analytics products from enterprise database vendors and cloud providers. He has helped technology companies in designing and implementing data analytics solutions and data products.