Data Lake Ingestion: Automatically Partition Hive External Tables with AWS

Songzhi Liu is a Professional Services Consultant with AWS. The data lake concept has become increasingly popular among enterprise customers because it collects data from different sources and stores it where it can be easily combined, governed, and accessed. On the AWS Cloud, Amazon S3 is a good candidate for a data lake […]

Using Spark SQL for ETL

Ben Snively is a Solutions Architect with AWS. With big data, you deal with many different formats and large volumes of data. SQL-style queries have been around for nearly four decades. Many systems support SQL-style syntax on top of the data layers, and the Hadoop/Spark ecosystem is no exception. This allows companies to try new […]

