This Guidance introduces data ingestion patterns for connecting advertising and marketing data to AWS services. Data can come from a variety of data stores, and once activated, can be used for setting up a customer 360 profile, an AWS Clean Rooms collaboration, artificial intelligence and machine learning (AI/ML) training, and analytics applications. This Guidance includes an overview architecture diagram demonstrating the data pipeline in addition to six architectural patterns that show different approaches to provision data for your analytical workloads.

Please note: [Disclaimer]

Architecture Diagram

Download the architecture diagram PDF 
  • Overview
  • This architecture diagram shows an overview of how to connect data sources stored in a variety of data sources to AWS. To review the architectural patterns, open the other tabs.

  • API Pull Pattern with AWS Lambda
  • This architecture diagram shows an API pull pattern with AWS Lambda for Amazon Ads and Amazon Selling Partner APIs. To review the other architectural patterns, open the other tabs.

  • API Pull Pattern with Amazon AppFlow
  • This architecture diagram shows an API pull pattern with Amazon AppFlow for SaaS application data. To review the other architectural patterns, open the other tabs.

  • Push Pattern with Amazon S3
  • This architecture diagram shows a push pattern with Amazon S3 for SaaS application data. To review the other architectural patterns, open the other tabs.

  • Batch Pull and Change Data Capture Pattern
  • This architecture diagram shows a push pattern with Amazon S3 for SaaS application data. To review the other architectural patterns, open the other tabs.

  • Managed File Transfer Pattern
  • This architecture diagram shows a batch pull and change data capture pattern for RDBMS sources. To review the other architectural patterns, open the other tabs.

  • File Replication Pattern
  • This architecture diagram shows a managed file transfer pattern for SFTP data sources. To review the other architectural patterns, open the other tabs.

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • The services in this Guidance are serverless, which eliminates the need for users to manage (virtual or bare metal) servers. For example, Step Functions is a serverless managed service for building workflows and reduces undifferentiated heavy lifting associated with building and managing a workflow solution. AWS Glue is a serverless managed service for data processing tasks.

    Similarly, the following services eliminate the need for capacity management: Amazon SNS for notifications, AWS KMS for key management, Secrets Manager for secrets, EventBridge for event driven architectures, DynamoDB for low-latency NoSQL databases, AppFlow for integrating with third-party applications, Transfer Family for file transfer protocols, DataSync for discovery and sync of remote data sources (on-premises or other clouds), and AWS DMS for a managed data migration service that simplifies migration between supported databases.

    Read the Operational Excellence whitepaper 
  • IAM manages least privilege access to specific resources and operations. AWS KMS provides encryption for data at rest and data in transit using Pretty Good Privacy (PGP) encryption of data files. Secrets Manager provides secrets for remote system access and hashing keys for personally identifiable information (PII) data. CloudWatch monitors logs and metrics across all services used in this Guidance. As managed services, these services not only support a security strong posture, but help free up time for you to focus efforts on data and application logic for fortified security.

    Read the Security whitepaper 
  • Use of Lambda in the pipeline is limited to file-level processing, such as decryption. This avoids the pipeline from hitting the 15-minute run time limit. For all row-level processing, AWS Glue Spark engine scales to handle large volume of data processing. Additionally, you can use Step Functions to set up retries, back-off rates, max attempts, intervals, and timeouts for any failed AWS Glue job.

    Read the Reliability whitepaper 
  • The serverless services in this Guidance (including Step Functions, AWS Glue, Lambda, EventBridge, and Amazon S3) reduce the amount of underlying infrastructure you need to manage, allowing you to focus on solving your business needs. You can use automated deployments to quickly deploy the architectural components into any AWS Region while also addressing data residency and low latency requirements.

    Read the Performance Efficiency whitepaper 
  • When AWS Glue performs data transformations, you only pay for infrastructure during the time the processing is occurring. For Data Catalog, you pay a simple monthly fee for storing and accessing the metadata. With EventBridge Free Tier, you can schedule rules to initiate a data processing workflow. With a Step Functions workflow, you are charged based on the number of state transitions. In addition, through a tenant isolation model and resource tagging, you can automate cost usage alerts to help you measure costs specific to each tenant, application module, and service.

    Read the Cost Optimization whitepaper 
  • Serverless services used in this Guidance (such as AWS Glue, Lambda, and Amazon S3) automatically optimize resource utilization in response to demand. You can extend this Guidance by using Amazon S3 lifecycle configuration to define policies that move objects to different storage classes based on access patterns.

    Read the Sustainability whitepaper 

Implementation Resources

A detailed guide is provided to experiment and use within your AWS account. Each stage of building the Guidance, including deployment, usage, and cleanup, is examined to prepare it for deployment.

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

[Subject]
[Content Type]

[Title]

[Subtitle]
This [blog post/e-book/Guidance/sample code] demonstrates how [insert short description].

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.

Was this page helpful?