Guidance for Incremental Data Exports on AWS

This Guidance demonstrates a robust approach to incrementally export and maintain a centralized data repository reflecting ongoing changes in a distributed database. It shows how to establish a data pipeline that seamlessly captures and integrates incremental updates into an existing data store. The incremental process minimizes redundancy, enhances consistency, and improves access to accurate, up-to-date information. With more accurate data, you can make more informed, data-driven decisions.

Please note: [Disclaimer]

Architecture Diagram

[Architecture diagram description]

Download the architecture diagram PDF

Guidance Architecture Diagram for Incremental Data Exports on AWS

Step 1
Application traffic consistently adds, updates, and deletes items in an Amazon DynamoDB table.

Step 2
Perform a full export of your DynamoDB table. This will write the exported data into Amazon Simple Storage Service (Amazon S3) in JSON format.

Step 3
Create, prepare, and use Amazon EMR Serverless to read the full export of the DynamoDB table from Amazon S3. EMR Serverless will dynamically identify Iceberg table schema with the full set of columns that will map to all the unique attributes from your full DynamoDB exported dataset.

Step 4
Create AWS Glue Data Catalog to persist the Iceberg table meta store and query the table from Amazon Athena (or any Hive Meta store compatible query engine) using the same Glue Catalog.

Step 5
Use EMR Serverless to build the Iceberg table based on the full export of the DynamoDB table, and use the Iceberg table generated schema.

Step 6
Analysts can use an Athena query to verify that the Iceberg table is accessible and readable. This involves running a SELECT query on the Iceberg table through Athena to confirm that data can be retrieved successfully and accurately.

Step 7
Perform an incremental export of your DynamoDB table in JSON format. This will only export the changed data from the DynamoDB table since the last full or incremental export.

Step 8
Use EMR Serverless to update a previously created Iceberg table with the incremental export of the DynamoDB table data.

Step 9
Analysts will use the same Athena query to verify that the Iceberg table shows changed records. For example, if you’ve added or deleted items in the DynamoDB table after full export, the count should reflect this.

Get Started

Deploy this Guidance

Sample code

Use sample code to deploy this Guidance in your AWS account

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Operational Excellence

By offloading storage management to Amazon S3, you can concentrate on application development and analytics workflows without infrastructure overhead. Amazon S3 handles thousands of transactions per second, resulting in seamless upload and retrieval of data. This operational efficiency allows teams to optimize their core competencies while AWS manages the underlying storage infrastructure.

Read the Operational Excellence whitepaper
Security

DynamoDB, Amazon S3, and AWS Glue provide encryption, access controls, and audit logging, enabling users to meet security and compliance requirements. Athena and EMR Serverless inherit robust security features from their underlying services, helping to ensure data privacy and compliance. As AWS fully manages these services, security best practices are consistently implemented, reducing the burden of managing security measures.

Read the Security whitepaper
Reliability

DynamoDB and Amazon S3 prioritize high availability through cross-Availability Zone replication and data redundancy within a Region, helping to maintain accessibility during failures or disruptions. Amazon S3 offers 99.999999999% durability, preventing data loss over time, while DynamoDB provides 99.999% availability and durability for tables. These fault-tolerant services employ redundant storage and distributed architectures, minimizing the impact of failures on data integrity and service availability.

Read the Reliability whitepaper
Performance Efficiency

AWS Glue streamlines data discovery through its centralized metadata repository, reducing time spent locating and accessing datasets. It automatically scales resources to match extract, transform, load (ETL) job demands for optimal performance without manual intervention. AWS Glue's automated resource provisioning and efficient data cataloging contribute to high performance efficiency, allowing for seamless data processing and analytics workflows.

Read the Performance Efficiency whitepaper
Cost Optimization

EMR Serverless automatically provisions and releases resources based on job requirements, eliminating costs when jobs are not running. This consumption-based model is ideal for workloads with intermittent, short-duration processing followed by long idle periods. EMR Serverless optimizes costs by dynamically scaling resources to match demand, so that you only incur charges for the resources consumed and avoid unnecessary expenses during inactive periods.

Read the Cost Optimization whitepaper
Sustainability

The serverless and scalable nature of AWS services like Amazon S3 and EMR Serverless optimizes compute and backend resource usage, effectively minimizing the environmental impact of your workloads.

Read the Sustainability whitepaper

[SEO Subhead]

Architecture Diagram

Get Started

Deploy this Guidance

Sample code

Well-Architected Pillars

Related Content

Use Amazon DynamoDB incremental export to update Apache Iceberg tables

Disclaimer

Was this page helpful?

Guidance for Incremental Data Exports on AWS

[SEO Subhead]

Architecture Diagram

Get Started

Deploy this Guidance

Sample code

Well-Architected Pillars

Related Content

Use Amazon DynamoDB incremental export to update Apache Iceberg tables

Disclaimer

Was this page helpful?

Ending Support for Internet Explorer