AWS Big Data Blog

Big Data Analytics Options on AWS: Updated White Paper

by Erik Swensson

Erik Swensson is an Enterprise Solutions Architect Manager for AWS.

The big data ecosystem is growing quickly. Many AWS services have recently been added, such as AWS Lambda, Amazon Elasticsearch Service, Amazon Kinesis Firehose, and Amazon Machine Learning. We’ve also made significant enhancements to existing analytics offerings, such as supporting JSON documents in Amazon DynamoDB and adding Spark and Presto on Amazon EMR.

With so many tools, it can be hard to know which are best for your workloads and requirements!

To address this, we’ve updated the Big Data Analytics Options white paper (first published December 2014). This white paper introduces you to the many big data analytics options on the AWS platform and helps you determine when to choose one solution over another. It covers ideal usage patterns, cost model, performance, durability and availability, scalability and elasticity, interfaces, and anti-patterns for the following services:

Amazon Redshift
Amazon Kinesis
Amazon EMR
Amazon DynamoDB
Amazon Machine Learning
AWS Lambda
Amazon Elasticsearch Service
Amazon EC2 (big data analytics software on EC2 instances)

The white paper also walks you through three typical big data analytics workload scenarios, discussing the requirements, the data flow, and the services used in each. The scenarios are Enterprise Data Warehouse, Capturing and Analyzing Sensor Data, and Sentiment Analysis with Social Media. For example, the white paper explores sentiment analysis of social media using Amazon Machine Learning, illustrated in the following diagram.

You can use this document as a reference to design new big data applications or re-architect existing ones. In addition, we recently delivered a related webinar that highlights some of the key aspects of the paper. You can view the slides or watch the recording, but most importantly, make sure to grab your copy of the white paper.