AWS Big Data Blog

Using Attunity CloudBeam at UMUC to Replicate Data to Amazon RDS and Amazon Redshift

Matt Yanchyshyn is a Principal Solutions Architect at AWS. Brad Helicher, Director of Cloud Business at Attunity, also contributed to this post. Attunity is an APN Big Data Competency Partner. Introduction: University of Maryland University College’s mission is to provide a quality education at an affordable cost to busy professionals, mainly adults who are juggling […]

Statistical Analysis with Open-Source R and RStudio on Amazon EMR

Markus Schmidberger is a Senior Big Data Consultant for AWS Professional Services. Big Data is on every CIO’s mind. It is synonymous with technologies like Hadoop and the ‘NoSQL’ class of databases. Another technology shaking things up in Big Data is R. This blog post describes how to set up R, RHadoop packages, and RStudio […]

Using Amazon EMR and Tableau to Analyze and Visualize Data

Rahul Bhartia is an AWS Solutions Architect. Introduction: Hadoop provides a great ecosystem of tools for extracting value from data in various formats and sizes. Originally focused on large-batch processing with tools like MapReduce, Pig, and Hive, Hadoop now provides many tools for running interactive queries on your data, such as Impala, Drill, and Presto. […]

Using Amazon Redshift to Analyze Your Elastic Load Balancer Traffic Logs

Biff Gaut is a Solutions Architect with AWS. Introduction: With the introduction of Elastic Load Balancing (ELB) access logs, administrators have a tremendous amount of data describing all traffic through their ELB. While Amazon Elastic MapReduce (Amazon EMR) and some partner tools are excellent solutions for ongoing, extensive analysis of this traffic, they can require […]

Getting Started with Amazon EMR Bootstrap Actions

Steve McPherson is a Senior Manager for Amazon Elastic MapReduce. Note: This post was updated 2/8/16. The Presto bootstrap action documented in the original post has been deprecated because EMR now offers a Presto-Sandbox as a full-fledged EMR application. For details, see the EMR sandbox. Amazon Elastic MapReduce (EMR) is a fully managed Hadoop-as-a-service platform […]

Using AWS for Multi-instance, Multi-part Uploads

James Saull is a Principal Solutions Architect with AWS. There are many advantages to using multi-part, multi-instance uploads for large files. First, the throughput is improved because you can upload parts in parallel. Amazon Simple Storage Service (Amazon S3) can store files up to 5 TB, yet a single machine with a 1 Gbps interface would take […]

Moving Big Data Into the Cloud using Signiant Flight

Matt Yanchyshyn is a Principal Solutions Architect with Amazon Web Services. Introduction: In the first two parts of this series we discussed two popular products (out of many possible solutions) for moving big data into the cloud: Tsunami UDP and Data Expedition’s ExpeDat S3 Gateway. Today we’ll look at another option that takes a different approach: Signiant […]