AWS Big Data Blog

Month in Review: December 2015

December brought plenty for big data enthusiasts on the AWS Big Data Blog. Take a look!

Top 10 Performance Tuning Techniques for Amazon Redshift

“This post takes you through the most common issues that customers find as they adopt Amazon Redshift, and gives you concrete guidance on how to address each.”

Migrating Metadata when Encrypting an Amazon Redshift Cluster

“The customer is acquiring a manufacturing company that is slightly smaller than they are. Each has a BI infrastructure and they believe consolidating platforms would lower expenses and simplify operations. They want to move the acquired organization’s warehouse into the existing Amazon Redshift cluster, but have a contractual obligation to encrypt data.”

Performance Tuning Your Titan Graph Database on AWS

“Graph databases can outperform an RDBMS and give much simpler query syntax for many use cases. In my last post, Building a Graph Database on AWS Using Amazon DynamoDB and Titan, I showed how a network of relationships can be stored and queried using a graph database. In this post, I show you how to tune the performance of your Titan database running on Amazon DynamoDB in AWS.”

Securely Access Web Interfaces on Amazon EMR Launched in a Private Subnet

“In this post, I outline two possible solutions to securely access web UIs on an EMR cluster running in a private subnet. These options cover scenarios such as connecting through a local network to your VPC or connecting through the Internet if your private subnet is not directly accessible.”

Query Routing and Rewrite: Introducing pgbouncer-rr for Amazon Redshift and PostgreSQL

“Have you ever wanted to split your database load across multiple servers or clusters without impacting the configuration or code of your client applications? Or perhaps you have wished for a way to intercept and modify application queries, so that you can make them use optimized tables (sorted, pre-joined, pre-aggregated, etc.), add security filters, or hide changes you have made in the schema?”




(April 9, 2015)

Nasdaq’s Architecture using Amazon EMR and Amazon S3 for Ad Hoc Access to a Massive Data Set

Nate Sammons, a Principal Architect for Nasdaq, describes Nasdaq’s new data warehouse initiative: “Because we can now use Amazon S3 client-side encryption with EMRFS, we can meet our security requirements for data at rest in Amazon S3 and enjoy the scalability and ecosystem of applications in Amazon EMR.”