AWS Big Data Blog

Category: Analytics

A Zero-Administration Amazon Redshift Database Loader

Ian Meyers is a Solutions Architecture Senior Manager with AWS. With this new AWS Lambda function, it’s never been easier to get file data into Amazon Redshift. You simply push files into a variety of locations on Amazon S3 and have them automatically loaded into your Amazon Redshift clusters. Using AWS Lambda with Amazon Redshift […]

Using IPython Notebook to Analyze Data with Amazon EMR

Manjeet Chayel is a Solutions Architect with AWS. IPython Notebook is a web-based interactive environment that lets you combine code, code execution, mathematical functions, rich documentation, plots, and other elements into a single document. In the background, IPython Notebook stores this information as a JSON document. The main advantage of a notebook when compared to […]

Snakes in the Stream – Feeding and Eating Amazon Kinesis Streams with Python

Markus Schmidberger is a Senior Consultant for AWS Professional Services. The Internet of Things (IoT) is becoming increasingly popular, and it’s easy to see why: it generates new business value for your company by connecting all available machines and devices. The big challenge is real-time data processing and analysis. Cloud computing is an excellent way […]

Running Apache Accumulo on Amazon EMR

Manjeet Chayel is a Solutions Architect with Amazon Web Services. This post was co-authored by Matt Yanchyshyn, a Principal Solutions Architect with Amazon Web Services. Apache Accumulo is a sorted, distributed key-value store that is built on top of Apache Hadoop, ZooKeeper, and Thrift. Accumulo was originally modeled after Google’s BigTable and can scale to […]

Strategies for Reducing Your Amazon EMR Costs

UPDATE, MAY 2019: We have updated the Amazon EC2 Spot pricing model as of November 2017. The new pricing model simplifies purchasing without bidding and with fewer interruptions. Learn more about the updated pricing model. This is a guest post by Prateek Gupta, a lead engineer at BloomReach. BloomReach has built a personalized discovery […]

Node.js Streaming MapReduce with Amazon EMR

Ian Meyers is a Solutions Architecture Senior Manager with AWS. Node.js is a JavaScript framework for running high performance server-side applications based upon non-blocking I/O and an asynchronous, event-driven processing model. When customers need to process large volumes of complex data, Node.js offers a runtime that natively supports the JSON data structure. Languages such […]

Getting HBase Running on Amazon EMR and Connecting it to Amazon Kinesis

Wangechi Doble is an AWS Solutions Architect. Apache HBase is an open-source, column-oriented, distributed NoSQL database that runs on the Apache Hadoop framework. In the AWS Cloud, you can choose to deploy Apache HBase on Amazon Elastic Compute Cloud (Amazon EC2) and manage it yourself or leverage Apache HBase as a managed service on […]