AWS Big Data Blog

Dispatches from re:Invent – Day 4

Matt Yanchyshyn is a Principal Solutions Architect at AWS. I now have a collection of napkins from customer dinners with various AWS technology solutions sketched on them. This particular napkin is an Amazon DynamoDB schema design for a customer interested in using the new JSON document support to import a bunch of JSON files into […]

Dispatches from re:Invent – Day 3

Matt Yanchyshyn is a Principal Solutions Architect at AWS. During the keynote on Wednesday, we announced Amazon RDS for Aurora, a new high-performance and cost-effective relational database. I heard from multiple AWS re:Invent attendees that they’re really excited about how AWS is innovating in the data storage space, and from a big data perspective it […]

Dispatches from re:Invent – Day 2

Matt Yanchyshyn is a Principal Solutions Architect at AWS. Today, hundreds of AWS customers participated in bootcamps at re:Invent, including three sessions in the big data space: "Store, Manage, and Analyze Big Data in the Cloud," "Real Time Data Processing and Analysis with Amazon Redshift and Amazon Kinesis," and "Building High-Performance Applications on DynamoDB." Chris […]

Dispatches from re:Invent – Day 1

Matt Yanchyshyn is a Principal Solutions Architect at AWS. The breakout sessions at AWS re:Invent start on Wednesday, but plenty of people have already arrived and the product announcements have started. In the big data world, the Amazon DynamoDB team announced a sneak preview of an exciting new feature: DynamoDB Streams (https://aws.amazon.com/blogs/aws/dynamodb-streams-preview/). This will allow […]

Is the Big Data Track at re:Invent on Your Schedule?

AWS re:Invent is just a few days away! Big data is being used to transform businesses, increase efficiency, and drive innovation. The AWS Cloud has a comprehensive portfolio of big data and HPC services to accelerate and automate how you put data to work in your organization. In the Big Data & HPC track at […]

Implement a Real-time, Sliding-Window Application Using Amazon Kinesis and Apache Storm

Rahul Bhartia is an AWS Solutions Architect. Streams of data are becoming ubiquitous today: clickstreams, log streams, event streams, and more. The need for real-time processing of high-volume data streams is pushing the limits of traditional data processing infrastructures. Building a clickstream monitoring system, for example, where data is in the form of a continuous clickstream rather […]

Building Multi-AZ or Multi-Region Amazon Redshift Clusters

Erik Swensson is an AWS Solutions Architect. AWS Solutions Architect Patrick Shumate also contributed to this post. This post explores customer options for building multi-region or multi-Availability Zone (AZ) clusters. By default, Amazon Redshift has excellent tools to back up your cluster via snapshot to Amazon Simple Storage Service (Amazon S3). These snapshots can be […]

AWS Ad Tech Event Friday Oct 24 in San Francisco

The advertising space is going through a rapid transformation, and many of the companies driving this change are using AWS services like Amazon EMR, Amazon Redshift, Amazon DynamoDB, Amazon Kinesis, and Amazon CloudFront. If you work for an ad tech company in the San Francisco area, you should consider attending a free, one-day event for […]

Installing Apache Spark on an Amazon EMR Cluster

Jonathan Fritz is a Senior Product Manager for Amazon EMR. Please note: Amazon EMR now officially supports Spark. For more information about Spark on EMR, visit the Spark on Amazon EMR page or read Intent Media’s guest post on the AWS Big Data Blog about Spark on EMR. Over the last five […]

Deploying Cloudera’s Enterprise Data Hub on AWS

Karthik Krishnan is an AWS Solutions Architect. UPDATE April 6, 2015: The newest Quick Start reference guide supports Cloudera Director 1.1.0. To manage your cluster with Cloudera Director 1.1.0, refer to the updated reference guide. Apache Hadoop is an open-source software framework to store and process large-scale datasets. In this post, we discuss the deployment of […]