AWS Big Data Blog

Tag: Data Lake

Optimize Amazon S3 for High Concurrency in Distributed Workloads

In today’s blog post, I will discuss how to optimize Amazon S3 for an architecture commonly used to enable genomic data analyses. This optimization is important to my work in genomics because, as genome sequencing continues to drop in price, the rate at which data becomes available is accelerating.

Using Spark SQL for ETL

Ben Snively is a Solutions Architect with AWS. With big data, you deal with many different formats and large volumes of data. SQL-style queries have been around for nearly four decades. Many systems support SQL-style syntax on top of the data layers, and the Hadoop/Spark ecosystem is no exception. This allows companies to try new […]
