AWS Big Data Blog

Join us at the AWS Big Data Meetup on January 13th in San Francisco

by Steve McPherson

The AWS Big Data Meetup brings Big Data developers and enthusiasts together to discuss Big Data solutions with each other and AWS team members. At the event you will hear speakers from AWS and the wider community who are pushing the boundaries of Big Data. We are committed to maintaining a technical focus, and invite […]

Running an External Zeppelin Instance using S3 Backed Notebooks with Spark on Amazon EMR

Dominic Murphy is an Enterprise Solution Architect with Amazon Web Services. Apache Zeppelin is an open source GUI that creates interactive and collaborative notebooks for data exploration using Spark. You can use Scala, Python, SQL (using Spark SQL), or HiveQL to manipulate data and quickly visualize results. Zeppelin notebooks can be shared among several users, […]

Month in Review: December 2015

by Andy Werth

Lots for big data enthusiasts in December on the AWS Big Data Blog. Take a look! Top 10 Performance Tuning Techniques for Amazon Redshift: “This post takes you through the most common issues that customers find as they adopt Amazon Redshift, and gives you concrete guidance on how to address each.” Migrating Metadata when Encrypting […]

Query Routing and Rewrite: Introducing pgbouncer-rr for Amazon Redshift and PostgreSQL

Bob Strahan is a senior consultant with AWS Professional Services. Have you ever wanted to split your database load across multiple servers or clusters without impacting the configuration or code of your client applications? Or perhaps you have wished for a way to intercept and modify application queries, so that you can make them use […]

Securely Access Web Interfaces on Amazon EMR Launched in a Private Subnet

Ben Snively is a Solutions Architect with AWS. Private subnets allow you to limit access to deployed components, and to control security and routing of the system. You can also use a private subnet to connect an on-premises local network to AWS through a VPN or AWS Direct Connect. Amazon EMR allows customers to launch […]

Performance Tuning Your Titan Graph Database on AWS

Nick Corbett is a Big Data Consultant for AWS Professional Services. Graph databases can outperform an RDBMS and offer much simpler query syntax for many use cases. In my last post, Building a Graph Database on AWS Using Amazon DynamoDB and Titan, I showed how a network of relationships can be stored and queried using […]

Top 10 Performance Tuning Techniques for Amazon Redshift

Ian Meyers is a Solutions Architecture Senior Manager with AWS. Zach Christopherson, an Amazon Redshift Database Engineer, contributed to this post. Amazon Redshift is a fully managed, petabyte-scale, massively parallel data warehouse that offers simple operations and high performance. Customers use Amazon Redshift for everything from accelerating existing database environments that are struggling to […]

Migrating Metadata when Encrypting an Amazon Redshift Cluster

John Loughlin is a Solutions Architect with Amazon Web Services. A customer came to us asking for help expanding and modifying their Amazon Redshift cluster. In the course of responding to their request, we made use of several tools available in the AWSLabs GitHub repository. What follows is an account of how you can use […]

Big Data AWS Training Course Gets Big Update

by Michael Stroh

Michael Stroh is Communications Manager for AWS Training & Certification. AWS offers a number of in-depth technical training courses, which we’re regularly updating in response to student feedback and changes to the AWS platform. Today I want to tell you about some exciting changes to Big Data on AWS, our most comprehensive training course on […]

Building a Near Real-Time Discovery Platform with AWS

Assaf Mentzer is a Senior Consultant for AWS Professional Services. In the spirit of the 2016 U.S. presidential election, in this post I use Twitter public streams to analyze the candidates’ performance, both Republican and Democrat, in near real time. I show you how to integrate AWS managed services: Amazon Kinesis Firehose, AWS Lambda […]