This Guidance demonstrates a robust approach to incrementally export and maintain a centralized data repository reflecting ongoing changes in a distributed database. It shows how to establish a data pipeline that seamlessly captures and integrates incremental updates into an existing data store. The incremental process minimizes redundancy, enhances consistency, and improves access to accurate, up-to-date information. With more accurate data, you can make more informed, data-driven decisions.
Architecture Diagram
[Architecture diagram description]
Step 1
Application traffic consistently adds, updates, and deletes items in an Amazon DynamoDB table.
Step 2
Perform a full export of your DynamoDB table, which requires point-in-time recovery (PITR) to be enabled. This writes the exported data to Amazon Simple Storage Service (Amazon S3) in JSON format.
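As a sketch, the full export can be started with the AWS SDK for Python (Boto3). The table ARN, bucket, and prefix below are placeholders, and the DynamoDB client is passed in as a parameter so the request shape can be exercised without live AWS credentials:

```python
def start_full_export(dynamodb_client, table_arn, bucket, prefix):
    """Kick off a full table export to S3 in DynamoDB JSON format.

    The source table must have point-in-time recovery (PITR) enabled,
    or the ExportTableToPointInTime call will fail.
    """
    response = dynamodb_client.export_table_to_point_in_time(
        TableArn=table_arn,
        S3Bucket=bucket,
        S3Prefix=prefix,
        ExportFormat="DYNAMODB_JSON",
        ExportType="FULL_EXPORT",
    )
    # The export runs asynchronously; callers poll DescribeExport
    # with this ARN until the export completes.
    return response["ExportDescription"]["ExportArn"]
```

The export is asynchronous, so a production job would poll `describe_export` (or wait on an event) before starting the EMR Serverless step that reads the files.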
Step 3
Create, prepare, and use Amazon EMR Serverless to read the full export of the DynamoDB table from Amazon S3. EMR Serverless dynamically derives the Iceberg table schema, producing a column for each unique attribute found across the full exported dataset.
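Conceptually, this schema discovery amounts to unmarshalling the DynamoDB-JSON export lines and taking the union of attribute names. The following standalone sketch illustrates the idea in plain Python (the actual Guidance runs this logic as a Spark job on EMR Serverless):

```python
import json

def unmarshal(av):
    """Convert one DynamoDB-JSON attribute value (e.g. {"S": "x"}) to a plain value."""
    (tag, val), = av.items()
    if tag == "S":    return val
    if tag == "N":    return float(val) if "." in val else int(val)
    if tag == "BOOL": return val
    if tag == "NULL": return None
    if tag == "L":    return [unmarshal(v) for v in val]
    if tag == "M":    return {k: unmarshal(v) for k, v in val.items()}
    if tag in ("SS", "NS"): return list(val)
    raise ValueError(f"unsupported type tag: {tag}")

def scan_export_lines(lines):
    """Yield plain dicts and accumulate the union of attribute names --
    the column set a query engine would infer for the Iceberg table."""
    columns, rows = set(), []
    for line in lines:
        item = json.loads(line)["Item"]
        row = {k: unmarshal(v) for k, v in item.items()}
        columns |= row.keys()
        rows.append(row)
    return rows, sorted(columns)
```

Because DynamoDB is schemaless, different items can carry different attributes; the union of attribute names is what becomes the full Iceberg column set.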
Step 4
Create an AWS Glue Data Catalog database to persist the Iceberg table metadata, then query the table from Amazon Athena (or any Hive metastore-compatible query engine) using the same Data Catalog.
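For reference, registering the Glue Data Catalog as an Iceberg catalog in Spark is typically done through catalog properties like the following. The catalog name and warehouse path are placeholders; your EMR Serverless job configuration may differ:

```python
# Spark properties that expose the AWS Glue Data Catalog as an Iceberg
# catalog named "glue_catalog". The warehouse location is a placeholder
# S3 path where Iceberg data and metadata files are written.
ICEBERG_GLUE_CONF = {
    "spark.sql.extensions":
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.glue_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.glue_catalog.catalog-impl":
        "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.glue_catalog.io-impl":
        "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.glue_catalog.warehouse":
        "s3://YOUR_BUCKET/iceberg-warehouse/",
}
```

Because Athena reads Iceberg tables through the same Glue Data Catalog, a table created by the Spark job is immediately queryable from Athena without a separate registration step.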
Step 5
Use EMR Serverless to build the Iceberg table from the full export of the DynamoDB table, using the schema derived in Step 3.
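A minimal sketch of the table-creation step, rendered as a Spark SQL statement assembled from the discovered columns (the catalog, database, table, and column names below are illustrative, not the Guidance's actual identifiers):

```python
def build_create_table_sql(catalog, database, table, columns):
    """Render a CREATE TABLE statement for an Iceberg table whose columns
    mirror the attributes discovered in the full export.

    `columns` is a list of (name, spark_sql_type) pairs.
    """
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {catalog}.{database}.{table} (\n"
        f"  {cols}\n"
        f") USING iceberg"
    )
```

In the Spark job this string would be passed to `spark.sql(...)`; building it separately keeps the schema-to-DDL mapping easy to inspect and test.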
Step 6
Analysts can use an Athena query to verify that the Iceberg table is accessible and readable. This involves running a SELECT query on the Iceberg table through Athena to confirm that data can be retrieved successfully and accurately.
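The verification query can also be submitted programmatically through the Athena API. In this hedged sketch the Athena client is injected so the request shape can be tested without AWS access; the database, table, and workgroup names are placeholders:

```python
def verify_table_readable(athena_client, database, table, workgroup="primary"):
    """Submit a smoke-test SELECT against the Iceberg table.

    Returns the query execution id; callers poll get_query_execution
    until the query reaches a terminal state, then fetch the results.
    """
    response = athena_client.start_query_execution(
        QueryString=f'SELECT COUNT(*) FROM "{database}"."{table}"',
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,
    )
    return response["QueryExecutionId"]
```

A successful `COUNT(*)` confirms both that the Glue Data Catalog entry resolves and that the Iceberg data files in S3 are readable.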
Step 7
Perform an incremental export of your DynamoDB table in JSON format. This exports only the data that has changed in the DynamoDB table since the last full or incremental export.
Step 8
Use EMR Serverless to update a previously created Iceberg table with the incremental export of the DynamoDB table data.
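Applying the incremental changes is naturally expressed as an Iceberg MERGE in Spark SQL. This sketch renders such a statement; the staging table and its `is_deleted` flag (derived from an incremental-export record that lacks a NewImage) are hypothetical names, not identifiers from the Guidance:

```python
def build_merge_sql(target, staging, key_col):
    """Render a Spark SQL MERGE that upserts changed rows into the Iceberg
    table and deletes rows the incremental export marked as removed."""
    return f"""
MERGE INTO {target} t
USING {staging} s
ON t.{key_col} = s.{key_col}
WHEN MATCHED AND s.is_deleted THEN DELETE
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED AND NOT s.is_deleted THEN INSERT *
""".strip()
```

Because each MERGE commits a new Iceberg snapshot atomically, Athena readers never observe a half-applied batch of changes.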
Step 9
Analysts can use the same Athena query to verify that the Iceberg table reflects the changed records. For example, if you added or deleted items in the DynamoDB table after the full export, the row count should reflect this.
Get Started
Deploy this Guidance
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
-
Operational Excellence
By offloading storage management to Amazon S3, you can concentrate on application development and analytics workflows without infrastructure overhead. Amazon S3 scales to thousands of requests per second, providing seamless upload and retrieval of data. This operational efficiency allows teams to focus on their core competencies while AWS manages the underlying storage infrastructure.
-
Security
DynamoDB, Amazon S3, and AWS Glue provide encryption, access controls, and audit logging, enabling users to meet security and compliance requirements. Athena and EMR Serverless inherit robust security features from their underlying services, helping to ensure data privacy and compliance. As AWS fully manages these services, security best practices are consistently implemented, reducing the burden of managing security measures.
-
Reliability
DynamoDB and Amazon S3 prioritize high availability through cross-Availability Zone replication and data redundancy within a Region, helping to maintain accessibility during failures or disruptions. Amazon S3 offers 99.999999999% durability, preventing data loss over time, while DynamoDB offers up to 99.999% availability for global tables. These fault-tolerant services employ redundant storage and distributed architectures, minimizing the impact of failures on data integrity and service availability.
-
Performance Efficiency
AWS Glue streamlines data discovery through its centralized metadata repository, reducing time spent locating and accessing datasets. It automatically scales resources to match extract, transform, load (ETL) job demands for optimal performance without manual intervention. AWS Glue's automated resource provisioning and efficient data cataloging contribute to high performance efficiency, allowing for seamless data processing and analytics workflows.
-
Cost Optimization
EMR Serverless automatically provisions and releases resources based on job requirements, eliminating costs when jobs are not running. This consumption-based model is ideal for workloads with intermittent, short-duration processing followed by long idle periods. EMR Serverless optimizes costs by dynamically scaling resources to match demand, so that you only incur charges for the resources consumed and avoid unnecessary expenses during inactive periods.
-
Sustainability
The serverless and scalable nature of AWS services like Amazon S3 and EMR Serverless optimizes compute and backend resource usage, effectively minimizing the environmental impact of your workloads.
Related Content
Use Amazon DynamoDB incremental export to update Apache Iceberg tables
Disclaimer
The sample code, software libraries, command line tools, proofs of concept, templates, or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.