This Guidance introduces data ingestion patterns for connecting advertising and marketing data to AWS services. Data can come from a variety of data stores, and once activated, can be used for setting up a customer 360 profile, an AWS Clean Rooms collaboration, artificial intelligence and machine learning (AI/ML) training, and analytics applications. This Guidance includes an overview architecture diagram demonstrating the data pipeline in addition to six architectural patterns that show different approaches to provision data for your analytical workloads.
Architecture Diagram

-
Overview
-
API Pull Pattern with AWS Lambda
-
API Pull Pattern with Amazon AppFlow
-
Push Pattern with Amazon S3
-
Batch Pull and Change Data Capture Pattern
-
Managed File Transfer Pattern
-
File Replication Pattern
-
Overview
-
This architecture diagram shows an overview of how to connect data stored in a variety of data sources to AWS. To review the architectural patterns, open the other tabs.
Step 1
Sources of data needed for advertising and marketing analytics belong to one of three categories: software as a service (SaaS) applications, relational databases, or file storage. -
API Pull Pattern with AWS Lambda
-
This architecture diagram shows an API pull pattern with AWS Lambda for Amazon Ads and Amazon Selling Partner APIs. To review the other architectural patterns, open the other tabs.
Step 1
Amazon EventBridge schedules a job that starts an AWS Step Functions state machine. The state machine processes a series of AWS Lambda functions to facilitate report creation. -
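The scheduled workflow above can be sketched as an Amazon States Language (ASL) definition in which the state machine chains Lambda functions that request a report, poll until it is ready, and download it. This is a minimal sketch: the state names and function ARNs are hypothetical placeholders, not values prescribed by the Guidance.

```python
import json

# Hypothetical Lambda ARNs -- replace with the functions deployed in your account.
REQUEST_FN = "arn:aws:lambda:us-east-1:123456789012:function:RequestReport"
POLL_FN = "arn:aws:lambda:us-east-1:123456789012:function:PollReportStatus"
DOWNLOAD_FN = "arn:aws:lambda:us-east-1:123456789012:function:DownloadReport"

def build_report_state_machine() -> dict:
    """Build an ASL definition that chains the report-creation Lambda
    functions: request -> wait -> poll until ready -> download."""
    return {
        "Comment": "API pull pattern: create and fetch an ads report",
        "StartAt": "RequestReport",
        "States": {
            "RequestReport": {"Type": "Task", "Resource": REQUEST_FN, "Next": "Wait"},
            "Wait": {"Type": "Wait", "Seconds": 60, "Next": "PollStatus"},
            "PollStatus": {"Type": "Task", "Resource": POLL_FN, "Next": "IsReady"},
            "IsReady": {
                "Type": "Choice",
                "Choices": [{"Variable": "$.status", "StringEquals": "COMPLETED",
                             "Next": "DownloadReport"}],
                "Default": "Wait",  # report not ready yet: wait and poll again
            },
            "DownloadReport": {"Type": "Task", "Resource": DOWNLOAD_FN, "End": True},
        },
    }

definition = json.dumps(build_report_state_machine())
# Pass `definition` to stepfunctions.create_state_machine(...) via boto3, and
# target the state machine from an EventBridge schedule rule.
```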
API Pull Pattern with Amazon AppFlow
-
This architecture diagram shows an API pull pattern with Amazon AppFlow for SaaS application data. To review the other architectural patterns, open the other tabs.
Step 1
EventBridge schedules a job that starts the Step Functions state machine, which launches an Amazon AppFlow flow. -
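One way the state machine can launch AppFlow is through the Step Functions AWS SDK service integration, which calls the AppFlow StartFlow API directly without an intermediate Lambda function. The sketch below builds such a task state; the flow name is a hypothetical example.

```python
def build_appflow_task(flow_name: str) -> dict:
    """ASL task state that starts an Amazon AppFlow flow from Step Functions
    via the AWS SDK service integration (appflow:startFlow)."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::aws-sdk:appflow:startFlow",
        "Parameters": {"FlowName": flow_name},
        "End": True,
    }

# Hypothetical flow name for a daily SaaS extract configured in AppFlow.
state = build_appflow_task("saas-daily-extract")
```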
Push Pattern with Amazon S3
-
This architecture diagram shows a push pattern with Amazon S3 for SaaS application data. To review the other architectural patterns, open the other tabs.
Step 1
External data sources push raw data files (such as CSV) into a daily partitioned Landing Zone S3 bucket. Refer to external documentation to set up push job inputs like S3 bucket location, access key, and schedule frequency. -
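A daily-partitioned landing zone typically encodes the partition in the object key. The helper below shows one illustrative key layout; the `source=.../year=.../month=.../day=...` convention is an assumption for the example, not mandated by the Guidance.

```python
from datetime import date

def landing_zone_key(source: str, file_name: str, day: date) -> str:
    """Build a daily-partitioned object key for the Landing Zone S3 bucket,
    e.g. source=crm/year=2024/month=06/day=01/export.csv."""
    return (f"source={source}/year={day.year:04d}/"
            f"month={day.month:02d}/day={day.day:02d}/{file_name}")

key = landing_zone_key("crm", "export.csv", date(2024, 6, 1))
# The external system would upload to s3://<landing-zone-bucket>/<key>
# using the bucket location, access key, and schedule configured per its
# own documentation.
```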
Batch Pull and Change Data Capture Pattern
-
This architecture diagram shows a batch pull and change data capture pattern for RDBMS sources. To review the other architectural patterns, open the other tabs.
Step 1a
Use AWS Glue and pre-built or AWS Marketplace connectors to extract the data needed for advertising and marketing analytical use cases from the relational database management system (RDBMS) in batch mode. AWS Glue retrieves the data from the source data stores and loads it into an S3 bucket. Amazon S3 is configured as the target for storing the remote database files in Parquet format. -
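The batch extract described above can be configured through AWS Glue connection options: a JDBC source read (for example, `glueContext.create_dynamic_frame.from_options`) and an S3 Parquet sink. The sketch below only builds those option dictionaries; the database, table, secret, and bucket names are hypothetical.

```python
def jdbc_source_options(db: str, table: str, secret_id: str) -> dict:
    """Connection options for reading an RDBMS table in batch mode with an
    AWS Glue job, using a pre-defined Glue connection and a Secrets Manager
    secret for the database credentials."""
    return {
        "useConnectionProperties": "true",
        "connectionName": f"{db}-jdbc-connection",  # hypothetical Glue connection name
        "dbtable": table,
        "secretId": secret_id,  # Secrets Manager secret holding DB credentials
    }

def s3_parquet_sink_options(bucket: str, table: str) -> dict:
    """Target options for writing the extracted rows to Amazon S3; pair with
    format="parquet" when calling write_dynamic_frame.from_options."""
    return {"path": f"s3://{bucket}/raw/{table}/"}

src = jdbc_source_options("adsdb", "campaign_spend", "prod/adsdb/credentials")
sink = s3_parquet_sink_options("analytics-landing-zone", "campaign_spend")
```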
Managed File Transfer Pattern
-
This architecture diagram shows a managed file transfer pattern for SFTP data sources. To review the other architectural patterns, open the other tabs.
Step 1
AWS Transfer Family securely migrates data stored in file systems (on premises or in another cloud) that is needed for advertising and marketing analytical use cases. -
File Replication Pattern
-
This architecture diagram shows a file replication pattern for object storage data sources. To review the other architectural patterns, open the other tabs.
Step 1
Install and configure the AWS DataSync agent on a virtual machine in the public cloud where the source object storage is hosted.
Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
-
Operational Excellence
The services in this Guidance are serverless, which eliminates the need for users to manage (virtual or bare metal) servers. For example, Step Functions is a serverless managed service for building workflows and reduces undifferentiated heavy lifting associated with building and managing a workflow solution. AWS Glue is a serverless managed service for data processing tasks.
Similarly, the following services eliminate the need for capacity management: Amazon SNS for notifications, AWS KMS for key management, Secrets Manager for secrets, EventBridge for event-driven architectures, DynamoDB for low-latency NoSQL databases, AppFlow for integrating with third-party applications, Transfer Family for file transfer protocols, DataSync for discovery and synchronization of remote data sources (on premises or in other clouds), and AWS DMS for managed data migration between supported databases.
-
Security
IAM manages least-privilege access to specific resources and operations. AWS KMS provides encryption for data at rest, and Pretty Good Privacy (PGP) encryption protects data files in transit. Secrets Manager stores secrets for remote system access and hashing keys for personally identifiable information (PII) data. CloudWatch monitors logs and metrics across all services used in this Guidance. As managed services, these not only support a strong security posture, but also free up time for you to focus on data and application logic.
-
Reliability
Use of Lambda in the pipeline is limited to file-level processing, such as decryption. This prevents the pipeline from hitting the 15-minute Lambda run time limit. For all row-level processing, the AWS Glue Spark engine scales to handle large volumes of data. Additionally, you can use Step Functions to configure retries, back-off rates, maximum attempts, intervals, and timeouts for any failed AWS Glue job.
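The retry behavior described above maps to the `Retry` block of an ASL task state. The sketch below shows a Glue job run synchronously (`.sync`) with exponential back-off; the job name and the specific thresholds are illustrative assumptions.

```python
def glue_task_with_retry(job_name: str) -> dict:
    """ASL task that runs an AWS Glue job synchronously and retries failures
    with exponential back-off, bounding attempts and overall run time."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::glue:startJobRun.sync",
        "Parameters": {"JobName": job_name},
        "TimeoutSeconds": 3600,            # fail the state if the job runs too long
        "Retry": [{
            "ErrorEquals": ["States.ALL"],  # retry any error raised by the job
            "IntervalSeconds": 30,          # wait before the first retry
            "BackoffRate": 2.0,             # double the interval on each attempt
            "MaxAttempts": 3,
        }],
        "End": True,
    }

task = glue_task_with_retry("transform-ads-data")  # hypothetical job name
```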
-
Performance Efficiency
The serverless services in this Guidance (including Step Functions, AWS Glue, Lambda, EventBridge, and Amazon S3) reduce the amount of underlying infrastructure you need to manage, allowing you to focus on solving your business needs. You can use automated deployments to quickly deploy the architectural components into any AWS Region while also addressing data residency and low latency requirements.
-
Cost Optimization
When AWS Glue performs data transformations, you only pay for infrastructure during the time the processing is occurring. For Data Catalog, you pay a simple monthly fee for storing and accessing the metadata. With EventBridge Free Tier, you can schedule rules to initiate a data processing workflow. With a Step Functions workflow, you are charged based on the number of state transitions. In addition, through a tenant isolation model and resource tagging, you can automate cost usage alerts to help you measure costs specific to each tenant, application module, and service.
-
Sustainability
Serverless services used in this Guidance (such as AWS Glue, Lambda, and Amazon S3) automatically optimize resource utilization in response to demand. You can extend this Guidance by using Amazon S3 lifecycle configuration to define policies that move objects to different storage classes based on access patterns.
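An S3 lifecycle configuration of the kind mentioned above can be expressed as a rule set that tiers ageing objects into cheaper storage classes and eventually expires them. The prefix, day thresholds, and storage classes below are illustrative choices, not values prescribed by the Guidance.

```python
def lifecycle_configuration(prefix: str) -> dict:
    """S3 lifecycle rules that transition ageing objects to cheaper storage
    classes based on access patterns and expire them after a year."""
    return {
        "Rules": [{
            "ID": "tier-and-expire-raw-data",
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access
                {"Days": 90, "StorageClass": "GLACIER"},      # archival
            ],
            "Expiration": {"Days": 365},
        }]
    }

config = lifecycle_configuration("raw/")
# Apply with s3.put_bucket_lifecycle_configuration(
#     Bucket="<landing-zone-bucket>", LifecycleConfiguration=config)
```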
Implementation Resources

A detailed implementation guide is provided for you to experiment with and use in your AWS account. It walks through each stage of working with the Guidance, including deployment, usage, and cleanup.
The sample code is a starting point: it is industry validated and prescriptive, but not definitive, offering a peek under the hood to help you begin.
Related Content

Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.