Month in Review: February 2016
Lots for big data enthusiasts in February on the AWS Big Data Blog. Take a look!
Learn how to set spark-submit flags to control the memory and compute resources available to applications submitted to Spark running on Amazon EMR, and when to use the maximizeResourceAllocation configuration option and dynamic allocation of executors.
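As a sketch of the configuration side, both settings can be supplied when creating an EMR cluster through a configurations JSON like the one below (values are illustrative; the right choice depends on your workload):

```json
[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  },
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.dynamicAllocation.enabled": "true"
    }
  }
]
```

maximizeResourceAllocation sizes executors to use the full resources of each node, while dynamic allocation lets Spark scale the executor count up and down with the workload.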
Discover the new Python UDFs that AWS has released as part of the initial AWS Labs Amazon Redshift UDF repository: column encryption, parsing, date functions, and more! (And be sure to check out our Introduction to Python UDFs in Amazon Redshift.)
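For a sense of the shape of these functions, here is a minimal scalar Python UDF (the function name and body are an illustrative sketch, not taken from the AWS Labs repository):

```sql
-- Illustrative scalar Python UDF: extract the hostname from a URL.
CREATE OR REPLACE FUNCTION f_hostname(url VARCHAR)
RETURNS VARCHAR
IMMUTABLE
AS $$
    # Redshift Python UDFs run Python 2.7, so urlparse is available.
    from urlparse import urlparse
    return urlparse(url).hostname
$$ LANGUAGE plpythonu;

-- Use it like any built-in scalar function:
-- SELECT f_hostname(referrer_url) FROM clicks;
```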
If you use Amazon Kinesis Streams and Apache Spark to quickly extract value from large volumes of data streams, understanding how these frameworks work together helps you optimize performance. This post explains some ways to tune Spark Streaming for the best performance and the right semantics.
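Two of the tuning levers come down to simple arithmetic, which the helper below sketches (a back-of-the-envelope illustration, not code from the post): a streaming job is stable only if each batch is processed faster than new batches arrive, and the number of tasks created per receiver is roughly the batch interval divided by Spark's block interval.

```python
def partitions_per_receiver(batch_interval_ms, block_interval_ms=200):
    """Approximate RDD partitions one receiver creates per batch.

    200 ms is the default value of spark.streaming.blockInterval;
    fewer, larger blocks mean fewer, larger tasks.
    """
    return batch_interval_ms // block_interval_ms


def is_stable(batch_interval_ms, avg_processing_time_ms):
    """A streaming job keeps up only while the time to process a batch
    stays below the batch interval; otherwise batches queue up and
    end-to-end latency grows without bound."""
    return avg_processing_time_ms < batch_interval_ms
```

For example, a 2-second batch interval with the default block interval yields about 10 partitions per receiver, and a job averaging 1.5 s of processing per batch still keeps up.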
We’ve made exciting changes to the Big Data Analytics Options white paper (first published December 2014). This white paper introduces you to the many big data analytics options on the AWS platform and helps you determine when to choose one solution over another. It covers ideal usage patterns, cost models, performance, durability and availability, scalability and elasticity, interfaces, and anti-patterns for many AWS services.
We are excited to announce that you can now easily process KPL-aggregated data using the Amazon Kinesis de-aggregation modules. These modules support Java, Python, and Node.js and allow you to extract user records from a KPL-aggregated stream in AWS Lambda or in your multi-lang KCL application.
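As a sketch of where the modules fit, here is a minimal Lambda-style handler for a Kinesis event (names are illustrative). It decodes the base64 payload each record carries; for a KPL-aggregated stream you would first run the event's records through the de-aggregation module for your language to recover the individual user records.

```python
import base64
import json


def handler(event, context):
    """Decode JSON payloads from a Kinesis event delivered to Lambda.

    Kinesis record data arrives base64-encoded in the Lambda event.
    For KPL-aggregated streams, pass event["Records"] through the
    Kinesis de-aggregation module first to extract user records.
    """
    payloads = []
    for record in event["Records"]:
        data = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(data))
    return payloads
```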
The guest speaker for this Meetup was Cory Dolphin from Twitter, who talked about Answers, Fabric’s real-time analytics product, which processes billions of events in real time, using Twitter’s new stream processing engine, Heron. Nathaniel Slater, Solutions Architect in AWS Big Data Services, spoke about Amazon DynamoDB.
You can now trigger activation of pipelines in AWS Data Pipeline using the new on-demand schedule type. On-demand schedules make it easy to integrate pipelines in AWS Data Pipeline with other AWS services and with on-premises orchestration engines.
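In a pipeline definition, the schedule type is set on the Default object; a minimal fragment might look like this (field values are illustrative):

```json
{
  "objects": [
    {
      "id": "Default",
      "name": "Default",
      "scheduleType": "ondemand"
    }
  ]
}
```

A pipeline defined this way runs only when activated, for example with `aws datapipeline activate-pipeline --pipeline-id <id>` from the AWS CLI.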
FROM THE ARCHIVE
Top 10 Performance Tuning Techniques for Amazon Redshift (December 2015)