AWS Big Data Blog

Category: Analytics

Nasdaq’s Architecture using Amazon EMR and Amazon S3 for Ad Hoc Access to a Massive Data Set

This is a guest post by Nate Sammons, a Principal Architect for Nasdaq. The Nasdaq group of companies operates financial exchanges around the world and processes large volumes of data every day. We run a wide variety of analytic and surveillance systems, all of which require access to essentially the same data sets. The Nasdaq […]

Read More

Processing Amazon Kinesis Stream Data Using Amazon KCL for Node.js

Manan Gosalia is an SDE for Amazon Kinesis. This blog post shows you how to get started with the Amazon Kinesis Client Library (KCL) for Node.js. The Node.js framework uses an event-driven, non-blocking I/O model that makes it lightweight, efficient, and perfect for data-intensive, real-time applications that run across distributed devices. JavaScript is also simple […]

Read More

Streaming Analytics with DataTorrent RTS and Amazon EMR

Nick Durkin is a Senior Solution Engineer for DataTorrent. DataTorrent is an AWS Technology Partner. In this blog post, we introduce fast big data and provide context about the DataTorrent RTS streaming analytics platform. In addition, we show you how to implement a real-time, streaming analytics application for capturing social media trends from Twitter using […]

Read More

Launching and Running an Amazon EMR Cluster inside a VPC

NOTE: This article contains information and instructions pertinent only to older EMR releases (emr-4.6.0 and earlier) and may no longer be applicable. For the latest information, please refer to the current user guide. Daniel Garrison is a Big Data Support Engineer for Amazon Web Services. Introduction: With Amazon EC2 now firmly in the VPC-by-default model, it’s […]

Read More

Using Amazon EMR and Hunk for Rapid Response Log Analysis and Review

Patrick Shumate is a Solutions Architect for AWS. Introduction: It is fairly common to collect access and application logs but never interactively review them. Monitoring dashboards, coupled with well-instrumented applications, allow operators to manage day-to-day operations without ever digging into the flood of logs silently stored in Amazon S3. That works until the monitoring dashboard […]

Read More

Using IPython Notebook to Analyze Data with Amazon EMR

Manjeet Chayel is a Solutions Architect with AWS. IPython Notebook is a web-based interactive environment that lets you combine code, code execution, mathematical functions, rich documentation, plots, and other elements into a single document. In the background, IPython Notebook stores this information as a JSON document. The main advantage of a notebook when compared to […]

Read More

Snakes in the Stream – Feeding and Eating Amazon Kinesis Streams with Python

Markus Schmidberger is a Senior Consultant for AWS Professional Services. The Internet of Things (IoT) is becoming increasingly popular, and it’s easy to see why: it generates new business value for your company by connecting all available machines and devices. The big challenge is real-time data processing and analysis. Cloud computing is an excellent way […]

Read More

Running Apache Accumulo on Amazon EMR

Manjeet Chayel is a Solutions Architect with Amazon Web Services. This post was co-authored by Matt Yanchyshyn, a Principal Solutions Architect with Amazon Web Services. Apache Accumulo is a sorted, distributed key-value store that is built on top of Apache Hadoop, ZooKeeper, and Thrift. Accumulo was originally modeled after Google’s Bigtable and can scale to […]

Read More