AWS Big Data Blog
Tag: Data Lake
Optimize Amazon S3 for High Concurrency in Distributed Workloads
In today’s blog post, I will discuss how to optimize Amazon S3 for an architecture commonly used to enable genomic data analyses. This optimization is important to my work in genomics because, as genome sequencing continues to drop in price, the rate at which data becomes available is accelerating.
Using Spark SQL for ETL
Ben Snively is a Solutions Architect with AWS With big data, you deal with many different formats and large volumes of data. SQL-style queries have been around for nearly four decades. Many systems support SQL-style syntax on top of the data layers, and the Hadoop/Spark ecosystem is no exception. This allows companies to try new […]
