AWS Big Data Blog

Tag: Hive

Turbocharge your Apache Hive Queries on Amazon EMR using LLAP

Apache Hive is one of the most popular tools for analyzing large datasets stored in a Hadoop cluster using SQL. Data analysts and scientists use Hive to query, summarize, explore, and analyze big data. With the introduction of Hive LLAP (Low Latency Analytical Processing), the notion of Hive being just a batch processing tool has […]


Data Lake Ingestion: Automatically Partition Hive External Tables with AWS

In this post, I introduce a simple data ingestion and preparation framework based on AWS Lambda, Amazon DynamoDB, and Apache Hive on EMR for data from different sources landing in S3. Hive by itself cannot detect new partitions as data lands, so this solution lets Hive pick them up as data is loaded into S3.
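
The framework itself is not shown in this excerpt, so the following is only a minimal sketch of the general idea: turning the key of a newly landed S3 object into a Hive `ALTER TABLE ... ADD PARTITION` statement. It assumes the data uses Hive-style `name=value` folders in the key; the function names, table name, and bucket name are hypothetical and not taken from the post.

```python
from urllib.parse import unquote_plus


def partition_spec_from_key(key: str) -> dict:
    """Collect Hive-style name=value folders from an S3 object key.

    Assumption: partitions are encoded as name=value path segments.
    """
    spec = {}
    # Keys in S3 event notifications are URL-encoded, so decode first.
    folders = unquote_plus(key).split("/")[:-1]  # drop the file name
    for folder in folders:
        if "=" in folder:
            name, value = folder.split("=", 1)
            spec[name] = value
    return spec


def add_partition_statement(table: str, bucket: str, key: str) -> str:
    """Build a Hive DDL statement that registers the partition for this key."""
    spec = partition_spec_from_key(key)
    if not spec:
        raise ValueError(f"no name=value partition folders in key: {key}")
    prefix = unquote_plus(key).rsplit("/", 1)[0]  # folder holding the file
    clause = ", ".join(f"{name}='{value}'" for name, value in spec.items())
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({clause}) "
        f"LOCATION 's3://{bucket}/{prefix}/'"
    )


if __name__ == "__main__":
    # Hypothetical table, bucket, and key, for illustration only.
    print(add_partition_statement(
        "sales", "my-bucket", "sales/dt=2017-01-01/region=us/part-0000.csv"
    ))
```

In a setup like the one described, a function along these lines could run inside a Lambda handler triggered by S3 object-created events, with the resulting statement submitted to Hive on EMR. How the post actually wires these pieces together, and what it stores in DynamoDB, is not stated here.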
