AWS Big Data Blog
Category: Analytics
Getting Started with Elasticsearch and Kibana on Amazon EMR
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details. Hernan Vivani is a Big Data Support Engineer for Amazon Web Services. This post shows you how to install Elasticsearch and Kibana on an Amazon EMR cluster and provides a few simple ways to confirm it is working. (Please also […]
Strategies for Reducing Your Amazon EMR Costs
UPDATE, MAY 2019: We have updated the Amazon EC2 Spot pricing model as of November 2017. The new pricing model simplifies purchasing without bidding and with fewer interruptions. Learn more about the updated pricing model. This is a guest post by Prateek Gupta, a lead engineer at BloomReach. BloomReach has built a personalized discovery […]
Node.js Streaming MapReduce with Amazon EMR
Ian Meyers is a Solutions Architecture Senior Manager with AWS. Node.js is a JavaScript framework for running high-performance server-side applications based on non-blocking I/O and an asynchronous, event-driven processing model. When customers need to process large volumes of complex data, Node.js offers a runtime that natively supports the JSON data structure. Languages such […]
Getting HBase Running on Amazon EMR and Connecting it to Amazon Kinesis
Wangechi Doble is an AWS Solutions Architect. Apache HBase is an open-source, column-oriented, distributed NoSQL database that runs on the Apache Hadoop framework. In the AWS Cloud, you can choose to deploy Apache HBase on Amazon Elastic Compute Cloud (Amazon EC2) and manage it yourself or leverage Apache HBase as a managed service on […]
The Impact of Using Latest-Generation Instances for Your Amazon EMR Job
Nick Corbett is a Big Data Consultant for AWS Professional Services. Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to process large amounts of data efficiently. Amazon EMR uses the popular open source framework Apache Hadoop combined with several other AWS products to do such tasks as web indexing, data […]
ETL Processing Using AWS Data Pipeline and Amazon Elastic MapReduce
Manjeet Chayel is an AWS Solutions Architect. This blog post shows you how to build an ETL workflow that uses AWS Data Pipeline to schedule an Amazon Elastic MapReduce (Amazon EMR) cluster to clean and process web server logs stored in an Amazon Simple Storage Service (Amazon S3) bucket. AWS Data Pipeline is an ETL […]
Visualizing Real-time, Geotagged Data with Amazon Kinesis
Nick Corbett is a Big Data Consultant for AWS Professional Services. Amazon Kinesis is a fully managed service for processing real-time data at massive scale. Whether you are building a system that collects data from remote sensors, aggregating log files from multiple servers, or creating the latest Internet of Things (IoT) solution, Amazon Kinesis lets […]
Implement a Real-time, Sliding-Window Application Using Amazon Kinesis and Apache Storm
Rahul Bhartia is an AWS Solutions Architect. Streams of data are becoming ubiquitous today: clickstreams, log streams, event streams, and more. The need for real-time processing of high-volume data streams is pushing the limits of traditional data processing infrastructures. Building a clickstream monitoring system, for example, where data is in the form of a continuous clickstream rather […]
Building Multi-AZ or Multi-Region Amazon Redshift Clusters
Erik Swensson is an AWS Solutions Architect. AWS Solutions Architect Patrick Shumate also contributed to this post. This post explores customer options for building multi-region or multi-Availability Zone (AZ) clusters. By default, Amazon Redshift has excellent tools to back up your cluster via snapshot to Amazon Simple Storage Service (Amazon S3). These snapshots can be […]
Installing Apache Spark on an Amazon EMR Cluster
Jonathan Fritz is a Senior Product Manager for Amazon EMR. Please note: Amazon EMR now officially supports Spark. For more information about Spark on EMR, visit the Spark on Amazon EMR page or read Intent Media’s guest post on the AWS Big Data Blog about Spark on EMR. Over the last five […]