AWS Storage Blog
Connect Snowflake to S3 Tables using the SageMaker Lakehouse Iceberg REST endpoint
Organizations today seek data analytics solutions that provide maximum flexibility and accessibility. Customers need their data to be readily available to their preferred query engines, and they want to break down barriers across different computing environments. At the same time, they want a single copy of data to be used across these solutions, to track lineage, be cost […]
Build a managed transactional data lake with Amazon S3 Tables
UPDATE (12/19/2024): Added guidance for Amazon EMR setup. Customers commonly use Apache Iceberg today to manage ever-growing volumes of data. Apache Iceberg’s relational database transaction capabilities (ACID transactions) help customers deal with frequent updates, deletions, and the need for transactional consistency across datasets. However, getting the most out of Apache Iceberg tables and running it […]
Optimizing performance of Apache Spark workloads on Amazon S3
This blog post covers performance metrics, optimizations, and configuration tuning specific to OSS Spark running on Amazon EKS. For customers using or considering Amazon EMR on EKS, refer to the service documentation to get started and this blog post for the latest performance benchmark. Performance is top of mind for customers running streaming, extract, transform, load […]
