Author: Srikanth Kodali

Srikanth Kodali is a Senior IoT data analytics architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance on building IoT data and analytics solutions, helping them improve the value of their solutions when using AWS.

Optimize downstream data processing with Amazon Data Firehose and Amazon EMR running Apache Spark

This blog post shows how to use Amazon Kinesis Data Firehose to merge many small messages into larger messages for delivery to Amazon S3, which results in faster processing with Amazon EMR running Spark. It also shows how to use Apache Spark to read compressed files in Amazon S3 that lack a proper file name extension, and how to write the data back to Amazon S3 in Parquet format.
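Spark normally chooses a decompression codec from the file name extension, so compressed objects without one tend to be read as raw bytes. The following is a minimal sketch of one possible workaround, not necessarily the approach the post itself takes. It assumes the objects are gzip-compressed and contain newline-delimited JSON records; the function names `decode_lines` and `convert_to_parquet` and both path arguments are hypothetical.

```python
import gzip
from typing import List

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def decode_lines(raw: bytes) -> List[str]:
    """Return the non-empty text lines in raw, gunzipping first if needed."""
    # Detect compression from the content, not from the file name.
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return [line for line in raw.decode("utf-8").splitlines() if line.strip()]


def convert_to_parquet(spark, source_path: str, target_path: str) -> None:
    """Read extension-less objects and rewrite them as Parquet.

    Assumption: spark is an existing SparkSession and every record is one
    JSON document per line. Both paths are placeholders.
    """
    # binaryFiles yields (path, content) pairs, one per object, and loads
    # each object fully into memory.
    lines = spark.sparkContext.binaryFiles(source_path).flatMap(
        lambda path_and_bytes: decode_lines(path_and_bytes[1])
    )
    # spark.read.json accepts an RDD of JSON strings and infers the schema.
    spark.read.json(lines).write.mode("overwrite").parquet(target_path)
```

Because each object is read whole before it is decoded, this sketch fits the larger merged objects that Firehose delivers better than it fits very large single files. `decode_lines` can be checked locally without a cluster by passing it the output of `gzip.compress`.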

AWS Big Data Blog
