AWS Big Data Blog

Dipal Mahajan

Author: Dipal Mahajan

Accelerate your data workflows with Amazon Redshift Data API persistent sessions

In this post, we’ll walk through an example ETL process that uses session reuse to efficiently create, populate, and query temporary staging tables across the full data transformation workflow, all within the same persistent Amazon Redshift database session. You’ll learn best practices for optimizing ETL orchestration code, reducing job runtimes by eliminating connection overhead, and simplifying pipeline complexity.

Automate large-scale data validation using Amazon EMR and Apache Griffin

Many enterprises are migrating their on-premises data stores to the AWS Cloud. During data migration, a key requirement is to validate all the data that has been moved from source to target. This data validation is a critical step, and if not done correctly, may result in the failure of the entire project. However, developing […]