AWS Big Data Blog

Optimize Spark-Streaming to Efficiently Process Amazon Kinesis Streams

Rahul Bhartia is a Solutions Architect with AWS. Martin Schade, a Solutions Architect with AWS, also contributed to this post. Do you use real-time analytics on AWS to quickly extract value from large volumes of data streams? For example, have you built a recommendation engine on clickstream data to personalize content suggestions in real time […]

Introducing On-Demand Pipeline Execution in AWS Data Pipeline

Marc Beitchman is a Software Development Engineer on the AWS Database Services team. You can now trigger activation of pipelines in AWS Data Pipeline using the new on-demand schedule type. This functionality is available through the existing AWS Data Pipeline activation API. On-demand schedules make it easy to integrate pipelines in AWS […]

Join us at the AWS Big Data Meetup on February 24th in Palo Alto

by Steve McPherson

Join and RSVP! Guest Speaker: Cory Dolphin from Twitter. Learn how Answers, Fabric’s real-time analytics product, processes billions of events in real time using Twitter’s new stream processing engine, Heron. Cory will explain some of the challenges the team faced while scaling Storm, and how Heron has helped them fly faster. Specifically, Cory will describe how Heron’s […]

Process Amazon Kinesis Aggregated Data with AWS Lambda

Ian Meyers is a Solutions Architecture Senior Manager with AWS. Last year, we introduced the Amazon Kinesis Producer Library (KPL) to simplify the development of applications that need to send data to Amazon Kinesis Streams. Many customers use aggregation, which allows you to combine multiple records into a single Amazon Kinesis Streams record. Although the […]

Big Data Analytics Options on AWS: Updated White Paper

by Erik Swensson

Erik Swensson is an Enterprise Solutions Architect Manager for AWS. The big data ecosystem is growing quickly. Many AWS services have recently been added, such as AWS Lambda, Amazon Elasticsearch Service, Amazon Kinesis Firehose, and Amazon Machine Learning. We’ve also made significant enhancements to existing analytics offerings, such as supporting JSON documents in Amazon DynamoDB […]

Amazon Redshift UDF repository on AWSLabs

Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services. Zach Christopherson, an Amazon Redshift Database Engineer, contributed to this post. Have you ever needed complex string parsing in Amazon Redshift and wished you could simply add f_parse_url_query_string(url) to your SQL query? Have you ever tried to weigh which would be less […]

Submitting User Applications with spark-submit

Francisco Oliveira is a consultant with AWS Professional Services. Customers starting their big data journey often ask for guidelines on how to submit user applications to Spark running on Amazon EMR. For example, customers ask for guidelines on how to size memory and compute resources available to their applications and the best resource allocation model […]

Month in Review (January 2016)

by Andy Werth

Lots for big data enthusiasts in January on the AWS Big Data Blog. Take a look! Running an External Zeppelin Instance using S3 Backed Notebooks with Spark on Amazon EMR: Learn how to set up Zeppelin running “off-cluster” on a separate EC2 instance. You’ll be able to submit Spark jobs to an EMR cluster directly […]

Turning Amazon EMR into a Massive Amazon S3 Processing Engine with Campanile

Michael Wallman is a senior consultant with AWS ProServ. Have you ever had to copy a huge Amazon S3 bucket to another account or region? Or create a list based on object name or size? How about mapping a function over millions of objects? Amazon EMR to the rescue! EMR allows you to deploy large […]

Agile Analytics with Amazon Redshift

Nick Corbett is a Big Data Consultant for AWS Professional Services. What makes outstanding business intelligence (BI)? It needs to be accurate and up-to-date, but this alone won’t differentiate a solution. Perhaps a better measure is to consider the reaction you get when your latest report or metric is released to the business. Good BI […]
