AWS Big Data Blog

Strategies for Reducing Your Amazon EMR Costs

UPDATE, MAY 2019: We have updated the Amazon EC2 Spot pricing model as of November, 2017. The new pricing model simplifies purchasing without bidding and with fewer interruptions. Learn more about the updated pricing model. This is a guest post by Prateek Gupta, a lead engineer at BloomReach. BloomReach has built a personalized discovery […]

Node.js Streaming MapReduce with Amazon EMR

Ian Meyers is a Solutions Architecture Senior Manager with AWS. Introduction: Node.js is a JavaScript framework for running high performance server-side applications based upon non-blocking I/O and an asynchronous, event-driven processing model. When customers need to process large volumes of complex data, Node.js offers a runtime that natively supports the JSON data structure. Languages such […]

Getting HBase Running on Amazon EMR and Connecting it to Amazon Kinesis

Wangechi Doble is an AWS Solutions Architect. Introduction: Apache HBase is an open-source, column-oriented, distributed NoSQL database that runs on the Apache Hadoop framework. In the AWS Cloud, you can choose to deploy Apache HBase on Amazon Elastic Compute Cloud (Amazon EC2) and manage it yourself or leverage Apache HBase as a managed service on […]

Visualizing Real-time, Geotagged Data with Amazon Kinesis

Nick Corbett is a Big Data Consultant for AWS Professional Services. Amazon Kinesis is a fully managed service for processing real-time data at massive scale. Whether you are building a system that collects data from remote sensors, aggregating log files from multiple servers, or creating the latest Internet of Things (IoT) solution, Amazon Kinesis lets […]

Dispatches from re:Invent – Day 4

Matt Yanchyshyn is a Principal Solutions Architect at AWS. I now have a collection of napkins from customer dinners with various AWS technology solutions sketched on them. This particular napkin is an Amazon DynamoDB schema design for a customer interested in using the new JSON document support to import a bunch of JSON files into […]

Dispatches from re:Invent – Day 3

Matt Yanchyshyn is a Principal Solutions Architect at AWS. During the keynote on Wednesday we announced Amazon RDS for Aurora, a new high-performance and cost-effective relational database. I heard from multiple AWS re:Invent attendees that they’re really excited about how AWS is innovating in the data storage space, and from a big data perspective it […]

Dispatches from re:Invent – Day 2

Matt Yanchyshyn is a Principal Solutions Architect at AWS. Today hundreds of AWS customers participated in bootcamps at re:Invent, including three sessions in the big data space: Store, Manage, and Analyze Big Data in the Cloud; Real Time Data Processing and Analysis with Amazon Redshift and Amazon Kinesis; and Building High-Performance Applications on DynamoDB. Chris […]