Automate multi-account storage integrations in AWS using Snowflake and AWS Control Tower
This implementation guide describes how AWS Marketplace customers can integrate AWS Control Tower with Snowflake. The AWS Control Tower integration with Snowflake enables Snowflake storage integrations with Amazon S3 to be automatically available for all newly added AWS accounts in an AWS Control Tower environment.
Snowflake is a data warehouse built for the cloud. It provides enterprises with a cost-effective combination of the power of data warehousing, the flexibility of big data platforms, and the elasticity of the cloud. Snowflake is a member of the AWS Partner Network (APN), and it’s available on the AWS Marketplace. Snowflake can help you bring together diverse data into one system for data warehousing, query processing, and analytics.
A Snowflake storage integration in AWS enables Snowflake to read and write data to Amazon Simple Storage Service (Amazon S3) in an AWS account.
The AWS CloudFormation templates and a detailed README for this solution are available on Snowflake Labs.
Solution overview: Automate multi-account storage integrations in AWS using Snowflake and AWS Control Tower
The solution is deployed using AWS CloudFormation templates, and it integrates with AWS Control Tower lifecycle events. When a new account is created, or an existing one is enrolled using the AWS Control Tower Account Factory, the lifecycle event triggers an AWS Lambda function. The Lambda function creates new CloudFormation stack instances in the newly added AWS Control Tower managed account.
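The Lambda function's core logic can be sketched as follows. This is a minimal illustration, not the solution's actual source: the StackSet name, environment variables, and event-parsing helper are assumptions for the example.

```python
import os

# Hypothetical StackSet name; the solution's actual name may differ.
STACK_SET_NAME = os.environ.get("STACK_SET_NAME", "SnowflakeIntegrationStackSet")

def extract_account_id(event):
    """Pull the new account's ID out of a CreateManagedAccount lifecycle event."""
    return (event["detail"]["serviceEventDetails"]
                 ["createManagedAccountStatus"]["account"]["accountId"])

def lambda_handler(event, context):
    """Deploy the StackSet to the account that was just created or enrolled."""
    if event.get("detail", {}).get("eventName") != "CreateManagedAccount":
        return  # ignore other Control Tower events

    import boto3
    cfn = boto3.client("cloudformation")
    # Creates a stack instance of the existing StackSet in the new account.
    cfn.create_stack_instances(
        StackSetName=STACK_SET_NAME,
        Accounts=[extract_account_id(event)],
        Regions=[os.environ.get("AWS_REGION", "us-east-1")],
    )
```

The handler only reacts to the CreateManagedAccount event; everything else is ignored so unrelated CloudTrail events don't trigger a deployment.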
The stack instance in the newly added AWS Control Tower managed account provisions a Systems Manager automation runbook in that account. Account administrators of the managed account then launch the runbook to create Snowflake integrations with S3 buckets in the account. Refer to the following diagram.
The AWS Systems Manager Automation runbook in the managed account automates all of the steps required by Snowflake to create a storage integration with Amazon S3 in that account. It provisions a Snowflake integration object, attaches an AWS Identity and Access Management (IAM) role to it, and creates a Snowflake stage object for it that references Amazon S3. The runbook uses AWS Secrets Manager to store and retrieve Snowflake connection information. Refer to the following diagram.
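The Snowflake side of what the runbook automates can be sketched with the Snowflake Connector for Python. The object names, connection parameters, and helper functions below are illustrative assumptions, not the runbook's actual code:

```python
def build_integration_sql(name, role_arn, bucket):
    """Snowflake DDL for an S3 storage integration (names are illustrative)."""
    return (
        f"CREATE STORAGE INTEGRATION {name} "
        f"TYPE = EXTERNAL_STAGE STORAGE_PROVIDER = 'S3' ENABLED = TRUE "
        f"STORAGE_AWS_ROLE_ARN = '{role_arn}' "
        f"STORAGE_ALLOWED_LOCATIONS = ('s3://{bucket}/')"
    )

def create_integration(conn_params, name, role_arn, bucket):
    """Create the integration, read its IAM properties, and create a stage."""
    import snowflake.connector  # pip install snowflake-connector-python
    con = snowflake.connector.connect(**conn_params)  # account, user, password
    try:
        cur = con.cursor()
        cur.execute(build_integration_sql(name, role_arn, bucket))
        # DESC INTEGRATION returns rows of (property, type, value, default);
        # STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID from this
        # output go into the IAM role's trust policy.
        cur.execute(f"DESC INTEGRATION {name}")
        props = {row[0]: row[2] for row in cur.fetchall()}
        cur.execute(
            f"CREATE STAGE {name}_stage URL = 's3://{bucket}/' "
            f"STORAGE_INTEGRATION = {name}"
        )
        return props
    finally:
        con.close()
```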
This solution provisions the following infrastructure in these AWS accounts:
- Shared services account
- Provisions the Snowflake Connector for Python as an AWS Lambda layer.
- AWS Control Tower management account
- Provisions an Amazon CloudWatch Events rule that is triggered by AWS Control Tower lifecycle events.
- Provisions a Lambda function as the target of the CloudWatch Events rule.
- AWS Control Tower managed account (newly added account)
- Provisions an AWS Systems Manager Automation runbook that creates Snowflake storage integrations with Amazon S3 in the newly added account.
- This runbook can be launched as many times as needed in the new account to create new integrations between Snowflake and additional S3 buckets in the account.
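The rule in the management account matches Control Tower lifecycle events, which are delivered via CloudTrail. A minimal sketch of that wiring follows; the rule name and target ID are assumptions for the example:

```python
import json

# Event pattern matching AWS Control Tower lifecycle events (via CloudTrail).
LIFECYCLE_PATTERN = {
    "source": ["aws.controltower"],
    "detail-type": ["AWS Service Event via CloudTrail"],
    "detail": {"eventName": ["CreateManagedAccount"]},
}

def create_lifecycle_rule(rule_name, lambda_arn):
    """Create the rule and point it at the Lambda function (names illustrative)."""
    import boto3
    events = boto3.client("events")
    events.put_rule(Name=rule_name, EventPattern=json.dumps(LIFECYCLE_PATTERN))
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "snowflake-lifecycle-target", "Arn": lambda_arn}],
    )
```

In the actual solution this rule is created by the CloudFormation template rather than by API calls, but the event pattern is the key piece either way.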
You must complete the following prerequisites before launching the solution:
Step 1: Subscribe to Snowflake
- Subscribe to Snowflake in AWS Marketplace. In the upper right, choose Continue to Subscribe. After successfully subscribing to Snowflake, choose Accept Terms. To begin using the software, you'll be redirected to the Snowflake website. Choose Set Up Your Account to sign up with Snowflake and complete your registration.
- Once your account is set up, navigate to your profile in the Snowflake console and note your Snowflake account identifier. Also note the Snowflake username and password that you used to set up the account.
- If you’re new to Snowflake, complete the Snowflake in 20 minutes tutorial after setting up your account. By the end of the tutorial, you’ll know how to create the required Snowflake objects (warehouses, databases, and tables) for storing and querying data by loading a small amount of sample data from CSV files into a Snowflake table and querying it. Note the Snowflake database and Snowflake schema from this tutorial; you must enter these values during your initial setup in Step 3.
- If you’re already familiar with Snowflake, then once you’ve set up your Snowflake account, create a Snowflake database with a Snowflake schema. Note the names of your Snowflake database and schema, as you will need them later in the installation (Step 3: Install solution).
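If you prefer to script this step, the database and schema can be created with the Snowflake Connector for Python. This is a sketch with placeholder names; substitute your own database, schema, and connection parameters:

```python
def bootstrap_sql(database, schema):
    """Idempotent DDL for the database and schema you will enter in Step 3."""
    return [
        f"CREATE DATABASE IF NOT EXISTS {database}",
        f"CREATE SCHEMA IF NOT EXISTS {database}.{schema}",
    ]

def bootstrap(conn_params, database, schema):
    """Run the DDL against your Snowflake account."""
    import snowflake.connector  # pip install snowflake-connector-python
    con = snowflake.connector.connect(**conn_params)
    try:
        cur = con.cursor()
        for stmt in bootstrap_sql(database, schema):
            cur.execute(stmt)
    finally:
        con.close()
```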
Step 2: AWS setup – Shared services account
- Create an S3 bucket named s3-snowflakeintegration-accountId-region. Replace accountId and region with the AWS account ID and Region of your shared services AWS account.
- Create a folder in your S3 bucket called SnowflakeIntegration_Lambda_SSM and upload the SnowflakeIntegration_Lambda_SSM.zip file to it. This Lambda function uses the Snowflake Connector for Python to query and update Snowflake.
- Upload the snowflakelayer.zip file to the root folder of this S3 bucket. This zip file packages the Snowflake Connector for Python as an AWS Lambda layer.
- Provide organization-level read access to this S3 bucket (s3-snowflakeintegration-accountId-region).
- Download the s3bucketpolicy.json permissions policy file. Replace accountId and region with the AWS account ID and Region of your shared services AWS account.
- Sign in to your AWS Control Tower organization’s management account. Navigate to the AWS Organizations console and choose Settings. Note the organization ID of your Control Tower organization and replace the organizationid in the s3bucketpolicy.json file with this identifier.
- Follow the instructions outlined here to add a bucket policy using the Amazon S3 console and add the s3bucketpolicy.json as a bucket policy to your s3-snowflakeintegration-accountId-region bucket.
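The organization-wide read access in the bucket policy hinges on the aws:PrincipalOrgID condition key. The helper below generates an equivalent policy for illustration; it is an assumption about the shape of s3bucketpolicy.json, and the actual downloaded file may grant a different set of actions:

```python
import json

def org_read_policy(bucket, org_id):
    """S3 bucket policy granting read access to every account in the organization."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowOrgRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            # Restricts access to principals inside your AWS organization.
            "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
        }],
    }

# Example with placeholder account ID, Region, and organization ID.
policy_json = json.dumps(
    org_read_policy("s3-snowflakeintegration-111122223333-us-east-1", "o-exampleorgid"),
    indent=2,
)
```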
Step 3: Install solution
To install the solution, do the following:
- In the AWS Control Tower management account, launch the aws-snowflake-controltower.yaml template. Enter the following parameters:
- SourceBucket – This is the name of the S3 bucket that you created in Step 2: AWS setup – Shared services account.
- snowaccount – This is the Snowflake account identifier from Step 1: Subscribe to Snowflake.
- snowpass – This is the Snowflake account password from Step 1: Subscribe to Snowflake.
- snowdb – This is the Snowflake database that you created in Step 1: Subscribe to Snowflake.
- snowschema – This is the Snowflake schema from Step 1: Subscribe to Snowflake.
- Navigate to the CloudFormation console and make sure that the CloudFormation StackSet is successfully created.
- Navigate to the Amazon CloudWatch console and make sure that an Amazon CloudWatch Events rule is successfully created with a Lambda target to handle Control Tower lifecycle events.
Step 4: Test and run
Add a new managed account
To add a new managed account, from the AWS Control Tower management account:
- Use Account Factory to create a new managed account in AWS Control Tower.
- It can take up to 30 minutes for the account to be created and for the AWS Control Tower lifecycle event to trigger.
Create a Snowflake storage integration with Amazon S3
To create a Snowflake storage integration with Amazon S3, from the AWS Control Tower managed account:
- Navigate to the AWS Systems Manager console in your AWS account. In the navigation pane, choose Documents. Choose Owned by me and search for the Custom-Snowflakestorageintegration document. Follow the instructions here to launch your Systems Manager document from the console.
- Provide Snowflake connection details and an S3 bucket name as parameters.
- Navigate to the IAM console of your AWS account and check that a new IAM role has been provisioned whose name ends with the S3INTxxxxx suffix. This suffix is also the name of your new Snowflake integration object.
- Log in to your Snowflake web interface and make sure that the URL contains your account identifier or use snowsql as outlined here.
- Validate that a new Snowflake integration object has been created (DESC INTEGRATION 'integrationobjectname').
- Obtain the AWS_IAM_USER_ARN and AWS_EXTERNAL_ID parameters from the output of DESC INTEGRATION 'integrationobjectname'. Navigate to the IAM console of your AWS account and check that the IAM role that was provisioned earlier (the one ending with the S3INTxxxxx suffix) uses those parameters in its trust policy as the principal and external ID, respectively.
- Validate that a new stage object has been created in Snowflake that references the S3 bucket and uses the integration object (SHOW STAGES IN ACCOUNT).
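The trust-policy check can also be scripted. A minimal sketch, assuming the snowflake-connector-python package and your own connection parameters; the helper names are hypothetical:

```python
def integration_trust_values(rows):
    """Extract the IAM principal and external ID from DESC INTEGRATION rows.

    Each row has the shape (property, property_type, property_value, property_default).
    """
    props = {r[0]: r[2] for r in rows}
    return props["STORAGE_AWS_IAM_USER_ARN"], props["STORAGE_AWS_EXTERNAL_ID"]

def fetch_trust_values(conn_params, integration_name):
    """Run DESC INTEGRATION and return the values to compare with the IAM role."""
    import snowflake.connector  # pip install snowflake-connector-python
    con = snowflake.connector.connect(**conn_params)
    try:
        cur = con.cursor()
        cur.execute(f"DESC INTEGRATION {integration_name}")
        return integration_trust_values(cur.fetchall())
    finally:
        con.close()
```

The returned ARN should match the Principal in the IAM role's trust policy, and the external ID should match the sts:ExternalId condition.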
In this post, I described how AWS Marketplace customers can integrate AWS Control Tower with Snowflake so that Snowflake storage integrations with Amazon S3 are automatically available for all newly added AWS accounts in an AWS Control Tower environment. With this solution, S3 objects can be read as Snowflake tables, and Snowflake can read and write data to Amazon S3 in the newly added AWS Control Tower account.
About the author
Kanishk Mahajan is a Principal, Solutions Architect at AWS. He leads cloud transformation and solution architecture for ISV partners and mutual customers. Kanishk specializes in management and governance, migrations and modernizations, and security and compliance. He is a Technical Field Community (TFC) member in each of those domains at AWS.