Guidance for SQL-Based ETL with Apache Spark on Amazon EKS
Unlock efficient data workflows and faster insights with a scalable, enterprise-grade extract, transform, and load (ETL) solution
Overview
This Guidance helps address the gap between the data that teams need to consume and the low-level processing work that common ETL practices require. For organizations built on SQL-based data management systems, adopting modern data engineering practices can slow the path from raw data to insight. This Guidance uses the open-source data framework Arc to provide a quality-aware, user-centered ETL approach: business logic stays in SQL, the level of abstraction rises, and batch and streaming ETL activities are unified in a single, simpler workflow.
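Arc itself declares jobs in configuration files rather than hand-written code, so the following is only a minimal PySpark sketch of the same extract, SQL transform, and load pattern that the framework raises to a declarative level. The table name, columns, and output path are illustrative, not part of this Guidance.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-etl-sketch").getOrCreate()

# Extract: a real job would read from Amazon S3 or JDBC; inline rows keep
# the sketch self-contained and runnable.
orders = spark.createDataFrame(
    [(1, "2024-01-05", 120.0), (2, "2024-01-06", 75.5), (3, "2024-01-06", 30.0)],
    ["order_id", "order_date", "amount"],
)
orders.createOrReplaceTempView("orders")

# Transform: the business logic is expressed entirely in SQL, the
# abstraction this Guidance standardizes on.
daily_totals = spark.sql("""
    SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
""")

# Load: write the curated result; Parquet on Amazon S3 would be typical
# when running on Amazon EKS.
daily_totals.write.mode("overwrite").parquet("/tmp/daily_totals")
```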
We also describe design options that use efficient compute, such as AWS Graviton processors, to optimize the performance and cost of running ETL jobs at scale on Amazon EKS.
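As a hedged sketch of that option, the Spark-on-Kubernetes properties below can steer executor pods onto arm64 (Graviton) nodes. The node-selector and container-image settings are standard Spark properties; the API endpoint, ECR image URI, and executor count are placeholders, not values from this Guidance.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("graviton-sql-etl")
    # Placeholder for the Amazon EKS cluster API endpoint.
    .master("k8s://https://EKS_API_SERVER_ENDPOINT:443")
    # Schedule executor pods only onto arm64 (Graviton) nodes, using the
    # well-known Kubernetes node label.
    .config("spark.kubernetes.node.selector.kubernetes.io/arch", "arm64")
    # The container image must be built for arm64; this URI is hypothetical.
    .config("spark.kubernetes.container.image",
            "123456789012.dkr.ecr.us-east-1.amazonaws.com/spark-arc:arm64")
    .config("spark.executor.instances", "4")
    .getOrCreate()
)
```

On Amazon EKS, a managed node group or Karpenter provisioner supplies the matching Graviton instances (for example, the m7g or r7g families) for these pods to land on.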
How it works
This architecture diagram shows how the Guidance accelerates data processing with Apache Spark on Amazon EKS.
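Because transforms stay in SQL, the same query shape applies to streaming data, which is how one SQL-centric job definition covers both the batch and streaming paths in this architecture. The sketch below uses Spark Structured Streaming's built-in rate source so it runs without external systems; in practice the source would typically be Amazon Kinesis or Apache Kafka and the sink Amazon S3 with checkpointing.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-sql-sketch").getOrCreate()

# Extract: a streaming DataFrame from the built-in rate test source, which
# emits (timestamp, value) rows continuously.
events = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
events.createOrReplaceTempView("events")

# Transform: the identical SQL abstraction as the batch path, here a
# 10-second windowed count.
counts = spark.sql("""
    SELECT window(timestamp, '10 seconds') AS win, COUNT(*) AS event_count
    FROM events
    GROUP BY window(timestamp, '10 seconds')
""")

# Load: a console sink keeps the sketch self-contained; run for 30 seconds.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination(30)
```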
Well-Architected Pillars
The architecture diagram above is an example of a solution created with Well-Architected best practices in mind. To be fully Well-Architected, follow as many Well-Architected best practices as possible.
Implementation resources
Disclaimer