AWS Big Data Blog
Harness Zero Copy data sharing from Salesforce Data Cloud to Amazon Redshift for Unified Analytics – Part 1
This post is co-authored by Rajkumar Irudayaraj, Sr. Director of Product, Salesforce Data Cloud.
In today’s ever-evolving business landscape, organizations must harness and act on data to fuel analytics, generate insights, and make informed decisions to deliver exceptional customer experiences. Salesforce and Amazon have collaborated to help customers unlock value from unified data and accelerate time to insights with bidirectional Zero Copy data sharing between Salesforce Data Cloud and Amazon Redshift.
In a previous post, we showed how Zero Copy data federation empowers businesses to access Amazon Redshift data within the Salesforce Data Cloud to enrich customer 360 data with operational data. This two-part series explores how analytics teams can access customer 360 data from Salesforce Data Cloud within Amazon Redshift to generate insights on unified data without the overhead of extract, transform, and load (ETL) pipelines. In this post, we cover data sharing between Salesforce Data Cloud and customers’ AWS accounts in the same AWS Region. Part 2 covers cross-Region data sharing between Salesforce Data Cloud and customers’ AWS accounts.
What is Salesforce Data Cloud?
Salesforce Data Cloud is a data platform that unifies all of your company’s data into Salesforce’s Einstein 1 Platform, giving every team a 360-degree view of the customer to drive automation, create analytics, personalize engagement, and power trusted artificial intelligence (AI). Salesforce Data Cloud creates a holistic customer view by turning volumes of disconnected data into a unified customer profile that’s straightforward to access and understand. This unified view helps your sales, service, and marketing teams build personalized customer experiences, invoke data-driven actions and workflows, and safely drive AI across all Salesforce applications.
What is Amazon Redshift?
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data using your existing business intelligence (BI) tools. It’s optimized for datasets ranging from a few hundred gigabytes to petabytes and delivers better price-performance compared to other data warehousing solutions. With a fully managed, AI-powered, massively parallel processing (MPP) architecture, Amazon Redshift makes business decision-making quick and cost-effective. Amazon Redshift Spectrum enables querying structured and semi-structured data in Amazon Simple Storage Service (Amazon S3) without having to load the data into Redshift tables. Redshift Spectrum integration with AWS Lake Formation enables querying auto-mounted AWS Glue Data Catalog tables with AWS Identity and Access Management (IAM) credentials and using Lake Formation for permission grants and access control policies on Data Catalog views. Salesforce Data Cloud data sharing with Amazon Redshift uses AWS Glue Data Catalog support for multi-engine views and Redshift Spectrum integration with Lake Formation.
What is Zero Copy data sharing?
Zero Copy data sharing enables Amazon Redshift customers to query customer 360 data stored in Salesforce Data Cloud without the need for traditional ETL to move or copy the data. Instead, you connect and use the data in place, with on-demand access to the most recent data. Data sharing is supported with both Amazon Redshift Serverless and provisioned RA3 clusters. Data can be shared with Redshift Serverless or a provisioned cluster in the same Region, or with Redshift Serverless in a different Region. For an overview of Salesforce Zero Copy integration with Amazon Redshift, refer to this Salesforce blog post.
Solution overview
Salesforce Data Cloud provides a point-and-click experience to share data with a customer’s AWS account. On the Lake Formation console, you can accept the data share, create the resource link, mount Salesforce Data Cloud objects as Data Catalog views, and grant permissions to query the live and unified data in Amazon Redshift.
The following diagram depicts the end-to-end process involved for sharing Salesforce Data Cloud data with Amazon Redshift in the same Region using a Zero Copy architecture. This architecture follows the pattern documented in Cross-account data sharing best practices and considerations.
The data share setup consists of the following high-level steps:
- The Salesforce Data Cloud admin creates the data share target with the target account for the data share.
- The Salesforce Data Cloud admin selects the Data Cloud objects to be shared with Amazon Redshift and creates a data share.
- The Salesforce Data Cloud admin links the data share to the data share target, which invokes the following operations to create a cross-account resource share:
  - Create a Data Catalog view for the Salesforce Data Cloud Apache Iceberg tables by invoking the Catalog API.
  - Use Lake Formation sharing to create a cross-account Data Catalog share.
- In the customer AWS account, the Lake Formation admin logs in to the Lake Formation console to accept the resource share, create a resource link, and grant access permissions to the Redshift role.
- The data analyst launches the Amazon Redshift Query Editor with the appropriate role to query the data share and join with native Redshift tables.
Prerequisites
The following are the prerequisites to enable data sharing:
- A Salesforce Data Cloud account.
- An AWS account with AWS Glue and Lake Formation enabled.
- Either Redshift Serverless or a Redshift provisioned cluster with RA3 instance types (ra3.16xlarge, ra3.4xlarge, ra3.xlplus). Data sharing is not supported for other provisioned instance types such as DC2 or DS2. The Redshift Serverless namespace or RA3 cluster must be set up before you access the data share. If you don’t have an existing provisioned Redshift RA3 cluster, we recommend using a Redshift Serverless namespace for ease of operations and maintenance.
- The Amazon Redshift service must be running in the same Region where the Salesforce Data Cloud is running.
- AWS admin roles for Lake Formation and Amazon Redshift:
  - Lake Formation – A data lake admin for accepting the share and providing access to users. For more details, see Lake Formation personas and IAM permissions reference.
  - Amazon Redshift – A Redshift database owner, admin, or superuser who creates the database and provides access to developers or analysts. For more details, see Default database user permissions.
Create the data share target
Complete the following steps to create the data share target:
- In Salesforce Data Cloud, choose App Launcher and choose Data Share Targets.
- Choose New and choose Amazon Redshift, then choose Next.
- Enter the details for Label, API Name, and Account for the data share target.
- Choose Save.
After you save these settings, the S3 Tenant Folder value is populated.
- Choose the S3 Tenant Folder link and copy the verification token.
If you’re not signed in to the AWS Management Console, you’ll be redirected to the login page.
- Enter the verification token and choose Save.
The data share target turns to active status.
Create a data share
Complete the following steps to create a data share:
- Navigate to the Data Share tab in your Salesforce org.
- Choose App Launcher and choose Data Shares.
Alternatively, you can navigate to the Data Share tab from your org’s home page.
- Choose New, then choose Next.
- Provide a label, name, data space, and description, then choose Next.
- Select the objects to be included in the share and choose Save.
Link the data share target to the data share
To link the data share target to the data share, complete the following steps:
- On the data share record home page, choose Link/Unlink Data Share Target.
- Select the data share target you want to link to the data share and choose Save.
The data share must be active before you can accept the resource share on the Lake Formation console.
Accept the data share in Lake Formation
This section describes how to accept the data share invitation and configure the share so that it can be queried from Amazon Redshift.
- After the data share is successfully linked to the data share target, navigate to the Lake Formation console.
The data share invitation banner is displayed.
- Choose Accept and create.
The Accept and create page shows a resource link and provides the option to set up IAM permissions.
- In the Principals section, choose the IAM users and roles to grant the default permissions (Describe and Select) for the data share resource link.
- Choose Create.
The resource link created in the previous step appears next to the AWS Glue database resource share on the Lake Formation console.
Query the data share from Redshift Serverless
Launch the query editor for Redshift Serverless and log in as a federated user with the role that has Describe and Select permissions for the resource link.
The data share tables are auto-mounted, appear under awsdatacatalog, and can be queried as shown in the following screenshot.
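As a rough sketch of what such a query might look like, the auto-mounted catalog can be addressed with three-part notation (database.schema.table), where the schema is the resource link created on the Lake Formation console. The names datacloud_share_link and unified_individual__dlm are hypothetical placeholders, not names from this post; substitute your own resource link and shared object.

```sql
-- List the AWS Glue databases (including resource links) visible to the current role.
SHOW SCHEMAS FROM DATABASE awsdatacatalog;

-- List the shared objects exposed through the resource link.
-- datacloud_share_link is a hypothetical resource link name.
SHOW TABLES FROM SCHEMA awsdatacatalog.datacloud_share_link;

-- Preview a shared object in place; no data is copied into Redshift.
-- unified_individual__dlm is a hypothetical shared object name.
SELECT *
FROM awsdatacatalog.datacloud_share_link.unified_individual__dlm
LIMIT 10;
```

Starting with SELECT * and a small LIMIT avoids guessing at column names before you have inspected the shared objects.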
Query the data share from the Redshift provisioned cluster
To query the data share from the Redshift provisioned cluster, log in to the provisioned cluster as the superuser.
On an editor tab, run the following SQL statement to grant an IAM user access to the Data Catalog:
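A statement of the following form, taken from the Amazon Redshift documentation on querying the AWS Glue Data Catalog, grants this access. The principal names are placeholders for your own IAM user or role.

```sql
-- Run as a superuser on the provisioned cluster.
-- Grant an IAM user usage on the auto-mounted Data Catalog database.
GRANT USAGE ON DATABASE awsdatacatalog TO "IAM:myIAMUser";

-- Alternative for an IAM role (note the IAMR: prefix).
-- GRANT USAGE ON DATABASE awsdatacatalog TO "IAMR:myIAMRole";
```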
IAM:myIAMUser is an IAM user that you want to grant usage privilege to the Data Catalog. Alternatively, you can grant usage privilege to IAMR:myIAMRole for an IAM role. For more details, refer to Querying the AWS Glue Data Catalog.
Log in as the user with the role from the previous step using temporary credentials.
You should be able to expand awsdatacatalog and query the data share tables as shown in the following screenshot.
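Once the shared tables are visible, they can be joined with native Redshift tables, as noted in the solution overview. The following is a minimal sketch under assumed names: dev.public.orders is a hypothetical native table, datacloud_share_link a hypothetical resource link, unified_individual__dlm a hypothetical shared object, and every column name is illustrative only.

```sql
-- Join shared Salesforce Data Cloud data (queried in place) with a native Redshift table.
-- All object and column names here are hypothetical placeholders.
SELECT
    c.individual_id,
    COUNT(o.order_id)  AS order_count,
    SUM(o.order_total) AS lifetime_value
FROM awsdatacatalog.datacloud_share_link.unified_individual__dlm AS c
JOIN dev.public.orders AS o
    ON o.customer_id = c.individual_id
GROUP BY c.individual_id
ORDER BY lifetime_value DESC
LIMIT 20;
```

Both sides of the join use three-part notation, so the query reads the shared data through the Data Catalog and the native table from the local database in a single statement.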
Conclusion
Zero Copy data sharing between Salesforce Data Cloud and Amazon Redshift changes how organizations can use their customer 360 data. By eliminating the need for data movement, this approach gives analytics teams on-demand access to the most recent data, removes the overhead of building and maintaining ETL pipelines, and keeps access governed through Lake Formation permissions. As businesses continue to prioritize data-driven decision-making, Zero Copy data sharing can help them get more value from customer data across platforms.
This integration empowers organizations to break down data silos, accelerate analytics, and drive more agile customer-centric strategies. To learn more, refer to the following resources:
- AWS Glue Data Catalog supports multi engine views with AWS Analytics Engines
- Data sharing in AWS Lake Formation
- Salesforce Zero Copy Integration
- Apache Iceberg
- Transform Your Data Strategy with the Power of Salesforce Data Cloud’s Zero Copy Integration to Amazon Redshift
About the Authors
Rajkumar Irudayaraj is a Senior Product Director at Salesforce with over 20 years of experience in data platforms and services, with a passion for delivering data-powered experiences to customers.
Jason Berkowitz is a Senior Product Manager with AWS Lake Formation. He comes from a background in machine learning and data lake architectures. He helps customers become data-driven.
Ravi Bhattiprolu is a Senior Partner Solutions Architect at AWS. Ravi works with strategic ISV partners, Salesforce and Tableau, to deliver innovative and well-architected products & solutions that help joint customers achieve their business and technical objectives.
Avijit Goswami is a Principal Solutions Architect at AWS specialized in data and analytics. He supports AWS strategic customers in building high-performing, secure, and scalable data lake solutions on AWS using AWS managed services and open source solutions. Outside of his work, Avijit likes to travel, hike, watch sports, and listen to music.
Ife Stewart is a Principal Solutions Architect in the Strategic ISV segment at AWS. She has been engaged with Salesforce Data Cloud over the last 2 years to help build integrated customer experiences across Salesforce and AWS. Ife has over 10 years of experience in technology. She is an advocate for diversity and inclusion in the technology field.
Michael Chess is a Technical Product Manager at AWS Lake Formation. He focuses on improving data permissions across the data lake. He is passionate about ensuring customers can build and optimize their data lakes to meet stringent security requirements.
Mike Patterson is a Senior Customer Solutions Manager in the Strategic ISV segment at AWS. He has partnered with Salesforce Data Cloud to align business objectives with innovative AWS solutions to achieve impactful customer experiences. In his spare time, he enjoys spending time with his family, sports, and outdoor activities.