AWS Big Data Blog

Month in Review: November 2016

Another month of big data solutions on the Big Data Blog.

Take a look at our summaries below and learn, comment, and share. Thank you for reading!

Use Apache Flink on Amazon EMR
Running Flink on AWS is now easier because Amazon EMR 5.1.0 supports it natively. EMR runs Flink on YARN, so you can create either a long-running cluster that accepts multiple jobs or a short-lived Flink session in a transient cluster, which reduces costs because you pay only for the time that you use.
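
As a rough illustration of the two cluster styles, the sketch below builds the arguments for the EMR RunJobFlow operation as exposed by boto3. The function names, the instance type, and the node count are assumptions made for this example, and the default role names are assumed to already exist in the account; none of this is taken from the post itself.

```python
def flink_cluster_request(name, transient=True, core_nodes=2):
    """Build keyword arguments for the boto3 EMR run_job_flow call.

    transient=True asks EMR to terminate the cluster once its steps finish;
    transient=False keeps it alive to accept more Flink jobs.
    """
    if core_nodes < 1:
        raise ValueError("core_nodes must be at least 1")
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.1.0",          # first release with native Flink support
        "Applications": [{"Name": "Flink"}],
        "Instances": {
            "MasterInstanceType": "m3.xlarge",  # assumed instance type
            "SlaveInstanceType": "m3.xlarge",   # assumed instance type
            "InstanceCount": core_nodes + 1,    # core nodes plus one master
            "KeepJobFlowAliveWhenNoSteps": not transient,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",   # assumed default roles
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request, client=None):
    """Submit the request; boto3 is imported only when a real call is made."""
    if client is None:
        import boto3
        client = boto3.client("emr")
    return client.run_job_flow(**request)
```

A transient cluster only does useful work if the request also carries Steps that submit the Flink job; those are left out here to keep the sketch short.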

Scale Your Amazon Kinesis Stream Capacity with UpdateShardCount
With the new Amazon Kinesis Streams UpdateShardCount API operation, you can automatically scale your stream shard capacity by using Amazon CloudWatch alarms, Amazon SNS, and AWS Lambda. In this post, walk through an example of how you can automatically scale your shards using a few lines of code.
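
The scaling step itself can be sketched in a few lines. The example below is a minimal, hypothetical version of the logic an AWS Lambda function might run when an alarm fires: it picks a new shard count from a utilization figure and calls UpdateShardCount through boto3. The function names, the utilization input, and the 70% target are assumptions for illustration. The result is clamped to between half and double the current count, which, as far as the Kinesis documentation describes it, is the range a single call accepts.

```python
import math


def target_shard_count(current_shards, utilization, target_utilization=0.7):
    """Return a shard count that moves utilization toward the target.

    utilization is the fraction of stream capacity in use (1.0 = saturated).
    The result is clamped to [ceil(current / 2), current * 2].
    """
    if current_shards < 1:
        raise ValueError("current_shards must be at least 1")
    desired = math.ceil(current_shards * utilization / target_utilization)
    lower = math.ceil(current_shards / 2)
    upper = current_shards * 2
    return max(1, min(max(desired, lower), upper))


def scale_stream(stream_name, current_shards, utilization, client=None):
    """Call UpdateShardCount when the target differs from the current count."""
    if client is None:
        import boto3  # imported lazily so the sizing logic can be tested offline
        client = boto3.client("kinesis")
    target = target_shard_count(current_shards, utilization)
    if target != current_shards:
        client.update_shard_count(
            StreamName=stream_name,
            TargetShardCount=target,
            ScalingType="UNIFORM_SCALING",
        )
    return target
```

Wiring the CloudWatch alarm to Amazon SNS and the Lambda trigger is not shown; a handler would read the stream name from the notification and pass it to `scale_stream`.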

Build a Community of Analysts with Amazon QuickSight
In this post, learn how Amazon QuickSight can be used to share dashboards, analyses, and stories. CoffeeCo, the fictitious company in the example, benefits like many real companies from distributing information to people who understand its context and can act on the insights that it contains.

Dynamically Scale Applications on Amazon EMR with Auto Scaling
With new support for Auto Scaling in Amazon EMR releases 4.x and 5.x, customers can now add (scale out) and remove (scale in) nodes on a cluster more easily. Scaling actions are triggered automatically by Amazon CloudWatch metrics provided by EMR at 5-minute intervals, including several YARN metrics related to memory utilization, applications pending, and HDFS utilization.
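
To make the idea of metric-triggered scaling concrete, here is a minimal sketch of an automatic scaling policy built as a Python dictionary and attached with the boto3 EMR client. The rule names, thresholds, and cooldown are illustrative assumptions, not values from the post; the 300-second period mirrors the 5-minute metric interval mentioned above.

```python
def _rule(name, comparison, threshold, adjustment):
    """One scaling rule keyed on the YARNMemoryAvailablePercentage metric."""
    return {
        "Name": name,
        "Action": {
            "SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": adjustment,  # nodes to add (+) or remove (-)
                "CoolDown": 300,
            }
        },
        "Trigger": {
            "CloudWatchAlarmDefinition": {
                "ComparisonOperator": comparison,
                "EvaluationPeriods": 1,
                "MetricName": "YARNMemoryAvailablePercentage",
                "Namespace": "AWS/ElasticMapReduce",
                "Period": 300,  # seconds, matching the 5-minute interval
                "Statistic": "AVERAGE",
                "Threshold": threshold,
                "Unit": "PERCENT",
                "Dimensions": [{"Key": "JobFlowId", "Value": "${emr.clusterId}"}],
            }
        },
    }


def build_scaling_policy(min_nodes, max_nodes, low_memory=15.0, high_memory=75.0):
    """Scale out when free YARN memory is low, scale in when it is plentiful."""
    if not 0 < min_nodes <= max_nodes:
        raise ValueError("expected 0 < min_nodes <= max_nodes")
    return {
        "Constraints": {"MinCapacity": min_nodes, "MaxCapacity": max_nodes},
        "Rules": [
            _rule("ScaleOutOnLowMemory", "LESS_THAN", low_memory, 1),
            _rule("ScaleInOnHighMemory", "GREATER_THAN", high_memory, -1),
        ],
    }


def apply_policy(cluster_id, instance_group_id, policy, client=None):
    """Attach the policy to an instance group; boto3 is imported on demand."""
    if client is None:
        import boto3
        client = boto3.client("emr")
    return client.put_auto_scaling_policy(
        ClusterId=cluster_id,
        InstanceGroupId=instance_group_id,
        AutoScalingPolicy=policy,
    )
```

The constraints bound how far the rules can move the instance group, so a misbehaving metric cannot grow or shrink the cluster without limit.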

Low-Latency Access on Trillions of Records: FINRA’s Architecture Using Apache HBase on Amazon EMR with Amazon S3
By migrating to HBase on EMR using S3 for storage, FINRA has lowered its costs by 60%, decreased operational complexity, increased durability and availability, and created a more scalable architecture.

Introducing the Data Lake Solution on AWS
Learn why a data lake on AWS can increase the flexibility and agility of your analytics.

Analyzing Data in S3 using Amazon Athena
Learn how to use Athena on logs from Elastic Load Balancers, generated as text files in a pre-defined format. We show you how to create a table, partition the data in a format used by Athena, convert it to Parquet, and compare query performance.
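
Partitioning is easiest to picture as one DDL statement per day of logs. The sketch below generates ALTER TABLE ... ADD PARTITION statements of the kind that could be run in Athena. The table name, bucket, and the year/month/day folder layout are hypothetical and chosen only for illustration; the real log location depends on how the load balancer is configured.

```python
from datetime import date, timedelta


def add_partition_statements(table, s3_prefix, start, days):
    """Return one ADD PARTITION statement per day, starting at `start`.

    Assumes objects are laid out as <s3_prefix>/YYYY/MM/DD/ and that the
    table is partitioned by string columns year, month, and day.
    """
    prefix = s3_prefix.rstrip("/")
    statements = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        year, month, dom = day.strftime("%Y"), day.strftime("%m"), day.strftime("%d")
        location = f"{prefix}/{year}/{month}/{dom}/"
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION "
            f"(year='{year}', month='{month}', day='{dom}') "
            f"LOCATION '{location}'"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical table and bucket names.
    for stmt in add_partition_statements(
        "elb_logs", "s3://example-bucket/elb/", date(2016, 11, 30), 2
    ):
        print(stmt)
```

With partitions registered this way, a query that filters on year, month, and day reads only the matching folders instead of scanning every log file.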

Want to learn more about Big Data or Streaming Data? Check out our Big Data and Streaming Data educational pages.

Leave a comment below to let us know what big data topics you’d like to see next on the AWS Big Data Blog.