AWS Big Data Blog
Cross-account integration between SaaS platforms using Amazon AppFlow
Implementing an effective data sharing strategy that satisfies compliance and regulatory requirements is complex. Customers often need to share data between disparate software as a service (SaaS) platforms within their organization or across organizations. On many occasions, they need to apply business logic to the data received from the source SaaS platform before pushing it to the target SaaS platform.
Let’s take an example. AnyCompany’s marketing team hosted an event at the Anaheim Convention Center, CA, and created leads from the event in Adobe Marketo. An automated process downloads the leads from Marketo into the marketing AWS account and pushes them to the sales AWS account. A business process picks up those leads, filters them based on a “Do Not Call” criterion, and creates entries in the Salesforce system. The sales team can then pursue those leads and continue to track the opportunities in Salesforce.
In this post, we show how to share your data across SaaS platforms in a cross-account structure using fully managed, low-code AWS services such as Amazon AppFlow, Amazon EventBridge, AWS Step Functions, and AWS Glue.
Solution overview
Considering our example of AnyCompany, let’s look at the data flow. AnyCompany’s Marketo instance is integrated with the producer AWS account. As the leads from Marketo land in the producer AWS account, they’re pushed to the consumer AWS account, which is integrated to Salesforce. Business logic is applied to the leads data in the consumer AWS account, and then the curated data is loaded into Salesforce.
We have used a serverless architecture to implement this use case. The following AWS services are used for data ingestion, processing, and load:
- Amazon AppFlow is a fully managed integration service that enables you to securely transfer data between SaaS applications like Salesforce, SAP, Marketo, Slack, and ServiceNow, and AWS services like Amazon S3 and Amazon Redshift, in just a few clicks. With AppFlow, you can run data flows at nearly any scale at the frequency you choose—on a schedule, in response to a business event, or on demand. You can configure data transformation capabilities like filtering and validation to generate rich, ready-to-use data as part of the flow itself, without additional steps. Amazon AppFlow is used to download leads data from Marketo and upload the curated leads data into Salesforce.
- Amazon EventBridge is a serverless event bus that lets you receive, filter, transform, route, and deliver events. EventBridge is used to track events, such as the arrival of leads data in the producer or consumer AWS accounts, and then trigger a workflow.
- AWS Step Functions is a visual workflow service that helps developers use AWS services to build distributed applications, automate processes, orchestrate microservices, and create data and machine learning (ML) pipelines. Step Functions is used to orchestrate the data processing.
- AWS Glue is a serverless data preparation service that makes it easy to run extract, transform, and load (ETL) jobs. An AWS Glue job encapsulates a script that reads, processes, and then writes data to a new schema. This solution uses Python 3.6 AWS Glue jobs for data filtration and processing.
- Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. Amazon S3 is used to store the leads data.
Let’s review the architecture in detail. The following diagram shows a visual representation of how this integration works.
The following steps outline the process for transferring and processing leads data using Amazon AppFlow, Amazon S3, EventBridge, Step Functions, AWS Glue, and Salesforce:
- Amazon AppFlow runs on a daily schedule and retrieves any new leads created within the last 24 hours (incremental changes) from Marketo.
- The leads are saved as Parquet format files in an S3 bucket in the producer account.
- When the daily flow is complete, Amazon AppFlow emits events to EventBridge.
- EventBridge triggers Step Functions.
- Step Functions copies the Parquet format files containing the leads from the producer account’s S3 bucket to the consumer account’s S3 bucket.
- Upon a successful file transfer, Step Functions publishes an event in the consumer account’s EventBridge.
- An EventBridge rule intercepts this event and triggers Step Functions in the consumer account.
- Step Functions calls an AWS Glue crawler, which scans the leads Parquet files and creates a table in the AWS Glue Data Catalog.
- The AWS Glue job is called, which selects records with the Do Not Call field set to false from the leads files, and creates a new set of curated Parquet files (a minimal sketch of this filtering step follows this list). We have used an AWS Glue job for the ETL pipeline to showcase how you can use a purpose-built analytics service for complex ETL needs. However, for simple filtering requirements like Do Not Call, you can use the existing filtering feature of Amazon AppFlow.
- Step Functions then calls Amazon AppFlow.
- Finally, Amazon AppFlow populates the Salesforce leads based on the data in the curated Parquet files.
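For illustration, here is a minimal PySpark sketch of the Do Not Call filtering step. The donotcall column name is an assumption, and the S3 paths mirror the bucket prefixes shown later in this post; the actual logic ships in the glue-job.py script provided with the solution artifacts.

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

# Minimal sketch: read the raw leads, keep only callable leads, write curated files.
# The "donotcall" column name and the S3 paths are assumptions for illustration.
spark = GlueContext(SparkContext.getOrCreate()).spark_session

leads = spark.read.parquet("s3://consumer-databucket-<account-id>/marketo-leads-source/")
curated = leads.filter(leads["donotcall"] == False)  # keep leads that may be called
curated.write.mode("overwrite").parquet(
    "s3://consumer-databucket-<account-id>/marketo-leads-curated/"
)
```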
We have provided artifacts in this post to deploy the AWS services in your account and try out the solution.
Prerequisites
To follow the deployment walkthrough, you need two AWS accounts: one for the producer and one for the consumer. Use us-east-1 or us-west-2 as your AWS Region.
Consumer account setup
Stage the data
To prepare the data, complete the following steps:
- Download the zipped archive file to use for this solution and unzip the files locally.
The AWS Glue job uses the glue-job.py script to perform ETL and populates the curated table in the Data Catalog.
- Create an S3 bucket called consumer-configbucket-<ACCOUNT_ID> via the Amazon S3 console in the consumer account, where ACCOUNT_ID is your AWS account ID.
- Upload the glue-job.py script to this location.
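If you prefer to script this step, the following boto3 sketch creates the bucket and uploads the script. It assumes us-east-1 (other Regions require a CreateBucketConfiguration with a LocationConstraint) and that glue-job.py is in your working directory.

```python
import boto3

# Derive the account ID so the bucket name matches consumer-configbucket-<ACCOUNT_ID>.
account_id = boto3.client("sts").get_caller_identity()["Account"]
bucket = f"consumer-configbucket-{account_id}"

s3 = boto3.client("s3", region_name="us-east-1")
s3.create_bucket(Bucket=bucket)  # us-east-1 needs no LocationConstraint
s3.upload_file("glue-job.py", bucket, "glue-job.py")
```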
Create a connection to Salesforce in the consumer account
Follow the connection setup steps outlined in Create Opportunity Object Flow. Make a note of the Salesforce connector name.
Set up resources with AWS CloudFormation
We provided two AWS CloudFormation templates to create resources: one for the producer account, and one for the consumer account.
Amazon S3 now applies server-side encryption with Amazon S3 managed keys (SSE-S3) as the base level of encryption for every bucket in Amazon S3. Starting January 5, 2023, all new object uploads to Amazon S3 are automatically encrypted at no additional cost and with no impact on performance. We use this default encryption for both producer and consumer S3 buckets. If you choose to bring your own keys with AWS Key Management Service (AWS KMS), we recommend referring to Replicating objects created with server-side encryption (SSE-C, SSE-S3, SSE-KMS) for cross-account replication.
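You can confirm the default encryption setting on a bucket with a quick boto3 check; a small sketch, assuming the consumer data bucket name:

```python
import boto3

s3 = boto3.client("s3")
# Substitute your account ID in the bucket name before running.
enc = s3.get_bucket_encryption(Bucket="consumer-databucket-<consumer account id>")
rule = enc["ServerSideEncryptionConfiguration"]["Rules"][0]
# New buckets report "AES256" (SSE-S3) unless you configured SSE-KMS.
print(rule["ApplyServerSideEncryptionByDefault"]["SSEAlgorithm"])
```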
Launch the CloudFormation stack in the consumer account
Let’s start with creating resources in the consumer account. There are a few dependencies on the consumer account resources from the producer account. To launch the CloudFormation stack in the consumer account, complete the following steps:
- Sign in to the consumer account’s AWS CloudFormation console in the target Region.
- Choose Launch Stack.
- Choose Next.
- For Stack name, enter a stack name, such as stack-appflow-consumer.
- Enter the parameters for the connector name, object, and producer (source) account ID.
- Choose Next.
- On the next page, choose Next.
- Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
- Choose Create stack.
Stack creation takes approximately 5 minutes to complete. It will create the following resources. You can find them on the Outputs tab of the CloudFormation stack.
- ConsumerS3Bucket – consumer-databucket-<consumer account id>
- Consumer S3 Target Folder – marketo-leads-source
- ConsumerEventBusArn – arn:aws:events:<region>:<consumer account id>:event-bus/consumer-custom-event-bus
- ConsumerEventRuleArn – arn:aws:events:<region>:<consumer account id>:rule/consumer-custom-event-bus/consumer-custom-event-bus-rule
- ConsumerStepFunction – arn:aws:states:<region>:<consumer account id>:stateMachine:consumer-state-machine
- ConsumerGlueCrawler – consumer-glue-crawler
- ConsumerGlueJob – consumer-glue-job
- ConsumerGlueDatabase – consumer-glue-database
- ConsumerAppFlow – arn:aws:appflow:<region>:<consumer account id>:flow/consumer-appflow
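Rather than copying values from the console, you can also read these outputs programmatically; a small sketch using the stack name suggested above:

```python
import boto3

cfn = boto3.client("cloudformation")
# Fetch the stack and print each output key/value pair.
stack = cfn.describe_stacks(StackName="stack-appflow-consumer")["Stacks"][0]
for output in stack["Outputs"]:
    print(f'{output["OutputKey"]}: {output["OutputValue"]}')
```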
Producer account setup
Create a connection to Marketo
Follow the connection setup steps outlined here. Make a note of the Marketo connector name.
Launch the CloudFormation stack in the producer account
Now let’s create resources in the producer account. Complete the following steps:
- Sign in to the producer account’s AWS CloudFormation console in the source Region.
- Choose Launch Stack.
- Choose Next.
- For Stack name, enter a stack name, such as stack-appflow-producer.
- Enter the following parameters and leave the rest as default:
- AppFlowMarketoConnectorName: the name of the Marketo connector created earlier
- ConsumerAccountBucket: consumer-databucket-<consumer account id>
- ConsumerAccountBucketTargetFolder: marketo-leads-source
- ConsumerAccountEventBusArn: arn:aws:events:<region>:<consumer account id>:event-bus/consumer-custom-event-bus
- DefaultEventBusArn: arn:aws:events:<region>:<producer account id>:event-bus/default
- Choose Next.
- On the next page, choose Next.
- Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
- Choose Create stack.
Stack creation takes approximately 5 minutes to complete. It will create the following resources. You can find them on the Outputs tab of the CloudFormation stack.
- Producer AppFlow – producer-flow
- Producer Bucket – arn:aws:s3:::producer-bucket.<region>.<producer account id>
- Producer Flow Completion Rule – arn:aws:events:<region>:<producer account id>:rule/producer-appflow-completion-event
- Producer Step Function – arn:aws:states:<region>:<producer account id>:stateMachine:ProducerStateMachine-xxxx
- Producer Step Function Role – arn:aws:iam::<producer account id>:role/service-role/producer-stepfunction-role
- After successful creation of the resources, go to the consumer account S3 bucket, consumer-databucket-<consumer account id>, and update the bucket policy to allow the producer account’s Step Functions role to write to the bucket.
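As an illustration, the following sketch grants the producer account’s Step Functions role (from the producer stack outputs) the list and write access that the cross-account copy step needs. Treat the exact actions as assumptions and tighten them to your own requirements.

```python
import json

import boto3

# Substitute your account IDs in both values before running.
consumer_bucket = "consumer-databucket-<consumer account id>"
producer_role = (
    "arn:aws:iam::<producer account id>:role/service-role/producer-stepfunction-role"
)

# Illustrative policy: the producer state machine lists the bucket and writes
# the copied lead files. Adjust the actions to match your requirements.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": producer_role},
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{consumer_bucket}",
        },
        {
            "Effect": "Allow",
            "Principal": {"AWS": producer_role},
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": f"arn:aws:s3:::{consumer_bucket}/*",
        },
    ],
}
boto3.client("s3").put_bucket_policy(Bucket=consumer_bucket, Policy=json.dumps(policy))
```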
Validate the workflow
Let’s walk through the flow:
- Review the Marketo and Salesforce connection setup in the producer and consumer accounts, respectively.
In the architecture section, we suggested scheduling the flow (producer-flow) in the producer account. However, for quick testing, we demonstrate how to run the flow manually on demand.
- Go to the flow (producer-flow) in the producer account. On the Filters tab of the flow, choose Edit filters.
- Choose the Created At date range for which you have data.
- Save the range and choose Run flow.
- Review the producer S3 bucket.
AppFlow generates the files in the producer-flow prefix within this bucket. The files are temporarily located in the producer S3 bucket under s3://<producer-bucket>.<region>.<account-id>/producer-flow.
- Review the EventBridge rule and Step Functions state machine in the producer account.
The Amazon AppFlow job completion triggers an EventBridge rule (arn:aws:events:<region>:<producer account id>:rule/producer-appflow-completion-event, as noted on the Outputs tab of the CloudFormation stack in the producer account), which triggers the Step Functions state machine (arn:aws:states:<region>:<producer account id>:stateMachine:ProducerStateMachine-xxxx) in the producer account. The state machine copies the files from the producer-flow prefix in the producer S3 bucket to the consumer S3 bucket. When the file copy is complete, the state machine moves the files from the producer-flow prefix to the archive prefix in the producer S3 bucket. You can find the files in s3://<producer-bucket>.<region>.<account-id>/archive.
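Under the hood, the completion rule matches the end-of-flow event that Amazon AppFlow emits to the default event bus. The sketch below shows what such an event pattern can look like; the detail-type and detail field names are assumptions based on AppFlow’s flow run report events, and the real rule is created for you by the CloudFormation template.

```python
import json

import boto3

# Assumed pattern for successful end-of-run events from the producer flow;
# verify the detail-type and field names against your actual event payloads.
pattern = {
    "source": ["aws.appflow"],
    "detail-type": ["AppFlow End Flow Run Report"],
    "detail": {"flow-name": ["producer-flow"], "status": ["Execution Successful"]},
}

events = boto3.client("events")
events.put_rule(
    Name="producer-appflow-completion-event",
    EventPattern=json.dumps(pattern),
    State="ENABLED",
)
```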
- Review the consumer S3 bucket.
The Step Functions state machine in the producer account copies the files to the consumer S3 bucket and sends an event to EventBridge in the consumer account. The files are located in the consumer S3 bucket under s3://consumer-databucket-<account-id>/marketo-leads-source/.
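A quick way to verify the copy is to list the landed files with boto3:

```python
import boto3

s3 = boto3.client("s3")
# Substitute your account ID in the bucket name before running.
response = s3.list_objects_v2(
    Bucket="consumer-databucket-<account-id>",
    Prefix="marketo-leads-source/",
)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```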
- Review the EventBridge rule (arn:aws:events:<region>:<consumer account id>:rule/consumer-custom-event-bus/consumer-custom-event-bus-rule) in the consumer account, which should have triggered the Step Functions workflow (arn:aws:states:<region>:<consumer account id>:stateMachine:consumer-state-machine).
The AWS Glue crawler (consumer-glue-crawler) runs to update the metadata, followed by the AWS Glue job (consumer-glue-job), which curates the data by applying the Do Not Call filter. The curated files are placed in s3://consumer-databucket-<account-id>/marketo-leads-curated/. After data curation, the Amazon AppFlow flow is started as part of the state machine.
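If you need to rerun just the load into Salesforce, this final step is equivalent to calling the AppFlow StartFlow API directly; a minimal sketch using the flow name from the consumer stack outputs:

```python
import boto3

appflow = boto3.client("appflow")
# Start the consumer flow on demand, mirroring the state machine's final step.
response = appflow.start_flow(flowName="consumer-appflow")
print(response)
```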
- Review the Amazon AppFlow job (arn:aws:appflow:<region>:<consumer account id>:flow/consumer-appflow) run status in the consumer account.
Upon a successful run of the Amazon AppFlow job, the curated data files are moved to the s3://consumer-databucket-<account-id>/marketo-leads-processed/ folder and Salesforce is updated with the leads. Additionally, all the original source files are moved from s3://consumer-databucket-<account-id>/marketo-leads-source/ to s3://consumer-databucket-<account-id>/marketo-leads-archive/.
- Review the updated data in Salesforce.
You will see the new or updated leads that Amazon AppFlow created.
Clean up
To clean up the resources created as part of this post, complete the following steps:
- Delete the resources in the producer account:
- Delete the producer S3 bucket content.
- Delete the CloudFormation stack.
- Delete the resources in the consumer account:
- Delete the consumer S3 bucket content.
- Delete the CloudFormation stack.
Summary
In this post, we showed how you can support a cross-account model to exchange data between different partners with different SaaS integrations using Amazon AppFlow. You can expand this idea to support multiple target accounts.
For more information, refer to Simplifying cross-account access with Amazon EventBridge resource policies. To learn more about Amazon AppFlow, visit Amazon AppFlow.
About the authors
Ramakant Joshi is an AWS Solutions Architect, specializing in the analytics and serverless domain. He has a background in software development and hybrid architectures, and is passionate about helping customers modernize their cloud architecture.
Debaprasun Chakraborty is an AWS Solutions Architect, specializing in the analytics domain. He has around 20 years of software development and architecture experience. He is passionate about helping customers in cloud adoption, migration and strategy.
Suraj Subramani Vineet is a Senior Cloud Architect at Amazon Web Services (AWS) Professional Services in Sydney, Australia. He specializes in designing and building scalable and cost-effective data platforms and AI/ML solutions in the cloud. Outside of work, he enjoys playing soccer on weekends.