AWS Big Data Blog

Getting Started with Amazon EMR Bootstrap Actions

Steve McPherson is a Senior Manager for Amazon Elastic MapReduce. Note: This post was updated 2/8/16. The Presto bootstrap action documented in the original post has been deprecated because EMR now offers a Presto-Sandbox as a full-fledged EMR application. For details, see the EMR sandbox. Amazon Elastic MapReduce (EMR) is a fully managed Hadoop-as-a-service platform […]

Using AWS for Multi-instance, Multi-part Uploads

James Saull is a Principal Solutions Architect with AWS. There are many advantages to using multi-part, multi-instance uploads for large files. First, the throughput is improved because you can upload parts in parallel. Amazon Simple Storage Service (Amazon S3) can store files up to 5 TB, yet a single machine with a 1 Gbps interface would take […]

Moving Big Data Into the Cloud using Signiant Flight

Matt Yanchyshyn is a Principal Solutions Architect with Amazon Web Services. Introduction: In the first two parts of this series we discussed two popular products, out of many possible solutions, for moving big data into the cloud: Tsunami UDP and Data Expedition’s ExpeDat S3 Gateway. Today we’ll look at another option that takes a different approach: Signiant […]

Moving Big Data into the Cloud with Tsunami UDP

Matt Yanchyshyn is a Principal Solutions Architect with Amazon Web Services. AWS Solutions Architect Leo Zhadanovsky also contributed to this post. Introduction: One of the biggest challenges facing companies that want to leverage the scale and elasticity of AWS for analytics is how to move their data into the cloud. It’s increasingly common to have […]

Hosting Amazon Kinesis Applications on AWS Elastic Beanstalk

Ian Meyers is a Solutions Architecture Senior Manager with AWS. Amazon Kinesis provides a scalable and highly available platform for ingesting data from thousands of clients. Once data is available on a Kinesis stream, you can build applications to process the data using the Kinesis Client Library (KCL). KCL provides a framework for managing many […]

Best Practices for Micro-Batch Loading on Amazon Redshift

Ian Meyers is a Solutions Architecture Senior Manager with AWS. Data analysts always want the newest data in their data warehouse. Historically, when transaction-optimized databases were used for warehousing, analysts would “trickle load” (replicate data from production systems into the data warehouse) at the expense of read throughput. Analytics data warehouses are traditionally loaded nightly […]

Powering Gaming Applications with Amazon DynamoDB

Nate Wiger is a Principal Gaming Solutions Architect for AWS. Dave Lang, Senior Product Manager for Amazon DynamoDB, also contributed to this article. Amazon DynamoDB is rapidly becoming the go-to database for many of the fastest-growing games in the world. Games like Fruit Ninja (from Halfbrick Studios) and Battle Camp (from PennyPop) have leveraged Amazon DynamoDB’s […]

Building a Recommender with Apache Mahout on Amazon Elastic MapReduce (EMR)

This is a guest post by Andrew Musselman, who as chief data scientist leads the global big data practice from the technical side at Accenture. He is a PMC member on the Apache Mahout project and is writing a book on data science for O’Reilly. Accenture is an APN Big Data Competency Partner. This post […]