AWS Big Data Blog

Building and Running a Recommendation Engine at Any Scale

by Mortar Data

This is a guest post by K Young, co-founder and CEO of Mortar Data. Mortar Data is an AWS Advanced Technology Partner. UPDATE: Mortar Data has transitioned into Datadog and has wound down the public Mortar service. The tutorial below no longer works. To learn more about building a recommendation engine on AWS, see Building a […]

Getting HBase Running on Amazon EMR and Connecting it to Amazon Kinesis

Wangechi Doble is an AWS Solutions Architect. Introduction: Apache HBase is an open-source, column-oriented, distributed NoSQL database that runs on the Apache Hadoop framework. In the AWS Cloud, you can choose to deploy Apache HBase on Amazon Elastic Compute Cloud (Amazon EC2) and manage it yourself or leverage Apache HBase as a managed service on […]

The Impact of Using Latest-Generation Instances for Your Amazon EMR Job

Nick Corbett is a Big Data Consultant for AWS Professional Services. Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to process large amounts of data efficiently. Amazon EMR uses the popular open source framework Apache Hadoop combined with several other AWS products to do such tasks as web indexing, data […]

ETL Processing Using AWS Data Pipeline and Amazon Elastic MapReduce

Manjeet Chayel is an AWS Solutions Architect. This blog post shows you how to build an ETL workflow that uses AWS Data Pipeline to schedule an Amazon Elastic MapReduce (Amazon EMR) cluster to clean and process web server logs stored in an Amazon Simple Storage Service (Amazon S3) bucket. AWS Data Pipeline is an ETL […]

Visualizing Real-time, Geotagged Data with Amazon Kinesis

Nick Corbett is a Big Data Consultant for AWS Professional Services. Amazon Kinesis is a fully managed service for processing real-time data at massive scale. Whether you are building a system that collects data from remote sensors, aggregating log files from multiple servers, or creating the latest Internet of Things (IoT) solution, Amazon Kinesis lets […]

Dispatches from re:Invent – Day 4

by Matt Yanchyshyn

Matt Yanchyshyn is a Principal Solutions Architect at AWS. I now have a collection of napkins from customer dinners with various AWS technology solutions sketched on them. This particular napkin is an Amazon DynamoDB schema design for a customer interested in using the new JSON document support to import a bunch of JSON files into […]

Dispatches from re:Invent – Day 3

by Matt Yanchyshyn

Matt Yanchyshyn is a Principal Solutions Architect at AWS. During the keynote on Wednesday we announced Amazon RDS for Aurora, a new high-performance and cost-effective relational database. I heard from multiple AWS re:Invent attendees that they’re really excited about how AWS is innovating in the data storage space, and from a big data perspective it […]

Dispatches from re:Invent – Day 2

by Matt Yanchyshyn

Matt Yanchyshyn is a Principal Solutions Architect at AWS. Today hundreds of AWS customers participated in bootcamps at re:Invent, including three sessions in the big data space: Store, Manage, and Analyze Big Data in the Cloud; Real Time Data Processing and Analysis with Amazon Redshift and Amazon Kinesis; and Building High-Performance Applications on DynamoDB. Chris […]

Dispatches from re:Invent – Day 1

by Matt Yanchyshyn

Matt Yanchyshyn is a Principal Solutions Architect at AWS. The breakout sessions at AWS re:Invent start on Wednesday, but plenty of people have already arrived and the product announcements have started. In the big data world, the Amazon DynamoDB team announced a sneak preview of an exciting new feature: DynamoDB Streams (https://aws.amazon.com/blogs/aws/dynamodb-streams-preview/). This will allow […]

Is the Big Data Track at re:Invent on Your Schedule?

by Andy Werth

AWS re:Invent is just a few days away! Big data is being used to transform businesses, increase efficiency, and drive innovation. The AWS Cloud has a comprehensive portfolio of big data and HPC services to accelerate and automate how you put data to work in your organization. In the Big Data & HPC track at […]
