AWS Big Data Blog

Tag: Apache Parquet

Load data incrementally and optimized Parquet writer with AWS Glue

October 2022: This post was reviewed for accuracy.

AWS Glue provides a serverless environment to prepare (extract and transform) and load large datasets from a variety of sources for analytics and data processing with Apache Spark ETL jobs. The first post of the series, Best practices to scale Apache Spark jobs and partition […]