AWS Big Data Blog

Month in Review (January 2016)

Lots for big data enthusiasts in January on the AWS Big Data Blog. Take a look! Running an External Zeppelin Instance using S3 Backed Notebooks with Spark on Amazon EMR: Learn how to set up Zeppelin running “off-cluster” on a separate EC2 instance. You’ll be able to submit Spark jobs to an EMR cluster directly […]


Turning Amazon EMR into a Massive Amazon S3 Processing Engine with Campanile

Michael Wallman is a senior consultant with AWS ProServ. Have you ever had to copy a huge Amazon S3 bucket to another account or region? Or create a list based on object name or size? How about mapping a function over millions of objects? Amazon EMR to the rescue! EMR allows you to deploy large […]


Agile Analytics with Amazon Redshift

Nick Corbett is a Big Data Consultant for AWS Professional Services. What makes outstanding business intelligence (BI)? It needs to be accurate and up-to-date, but this alone won’t differentiate a solution. Perhaps a better measure is to consider the reaction you get when your latest report or metric is released to the business. Good BI […]


Querying Amazon Kinesis Streams Directly with SQL and Spark Streaming

Amo Abeyaratne is a Big Data consultant with AWS Professional Services. What if you could use your SQL knowledge to discover patterns directly from an incoming stream of data? Streaming analytics is a very popular topic of conversation around big data use cases. These use cases can vary from just accumulating simple web transaction […]


Running an External Zeppelin Instance using S3 Backed Notebooks with Spark on Amazon EMR

Dominic Murphy is an Enterprise Solution Architect with Amazon Web Services. Apache Zeppelin is an open-source GUI that creates interactive and collaborative notebooks for data exploration using Spark. You can use Scala, Python, SQL (using Spark SQL), or HiveQL to manipulate data and quickly visualize results. Zeppelin notebooks can be shared among several users, […]


Month in Review: December 2015

Lots for big data enthusiasts in December on the AWS Big Data Blog. Take a look! Top 10 Performance Tuning Techniques for Amazon Redshift: “This post takes you through the most common issues that customers find as they adopt Amazon Redshift, and gives you concrete guidance on how to address each.” Migrating Metadata when Encrypting […]


Query Routing and Rewrite: Introducing pgbouncer-rr for Amazon Redshift and PostgreSQL

Bob Strahan is a senior consultant with AWS Professional Services. Have you ever wanted to split your database load across multiple servers or clusters without impacting the configuration or code of your client applications? Or perhaps you have wished for a way to intercept and modify application queries, so that you can make them use […]
