AWS Big Data Blog

Category: Amazon EMR*

Securely Access Web Interfaces on Amazon EMR Launched in a Private Subnet

Ben Snively is a Solutions Architect with AWS Private subnets allow you to limit access to deployed components, and to control security and routing of the system. You can also use a private subnet to connect an on-premises local network to AWS through a VPN or AWS Direct Connect.  Amazon EMR allows customers to launch […]

Read More

Building a Recommendation Engine with Spark ML on Amazon EMR using Zeppelin

Guy Ernest is a Solutions Architect with AWS Many developers want to implement the famous Amazon model that was used to power the “People who bought this also bought these items” feature on Amazon.com. This model is based on a method called Collaborative Filtering. It takes items such as movies, books, and products that were […]

Read More

Analyze Data with Presto and Airpal on Amazon EMR

Songzhi Liu is a Professional Services Consultant with AWS You can now launch Presto version 0.119 on Amazon EMR, allowing you to easily spin up a managed EMR cluster with the Presto query engine and run interactive analysis on data stored in Amazon S3. You can integrate with Spot instances, publish logs to an S3 […]

Read More

Using BlueTalon with Amazon EMR

This is a guest post by Pratik Verma, Founder and Chief Product Officer at BlueTalon. Leonid Fedotov, Senior Solution Architect at BlueTalon, also contributed to this post. Amazon Elastic MapReduce (Amazon EMR) makes it easy to quickly and cost-effectively process vast amounts of data in the cloud. EMR gets used for log, financial, fraud, and […]

Read More

Integrating Amazon Kinesis, Amazon S3 and Amazon Redshift with Cascading on Amazon EMR

This is a guest post by Ryan Desmond, Solutions Architect at Concurrent. Concurrent is an AWS Advanced Technology Partner. With Amazon Kinesis developers can quickly store, collate and access large, distributed data streams such as access logs, click streams and IoT data in real-time. The question then becomes, how can we access and leverage this […]

Read More

How Expedia Implemented Near Real-time Analysis of Interdependent Datasets

This is a guest post by Stephen Verstraete, a manager at Pariveda Solutions. Pariveda Solutions is an AWS Premier Consulting Partner. Common patterns exist for batch processing and real-time processing of Big Data. However, we haven’t seen patterns that allow us to process batches of dependent data in real-time. Expedia’s marketing group needed to analyze […]

Read More

Large-Scale Machine Learning with Spark on Amazon EMR

This is a guest post by Jeff Smith, Data Engineer at Intent Media. Intent Media, in their own words: “Intent Media operates a platform for advertising on commerce sites.  We help online travel companies optimize revenue on their websites and apps through sophisticated data science capabilities. On the data team at Intent Media, we are […]

Read More

Test drive two big data scenarios from the ‘Building a Big Data Platform on AWS’ bootcamp

Matt Yanchyshyn is a Sr. Manager for AWS Solutions Architecture AWS offers a number of events during the year such as our annual AWS re:Invent conference, the AWS Summit series, the AWS Pop-up Loft, and a variety of roadshows such as the upcoming AWS Big Data Solutions Days. All of these provide opportunities for AWS […]

Read More

Indexing Common Crawl Metadata on Amazon EMR Using Cascading and Elasticsearch

Hernan Vivani is a Big Data Support Engineer for Amazon Web Services A previous post showed you how to get started with Elasticsearch and Kibana on Amazon EMR. In that post, we installed Elasticsearch and Kibana on an Amazon EMR cluster using bootstrap actions. This post shows you how to build a simple application with […]

Read More

Launching and Running an Amazon EMR Cluster in your VPC – Part 2: Custom DNS

Daniel Garrison is a Big Data Support Engineer for Amazon Web Services In Part 1 you learned how Amazon EMR uses Amazon VPC DNS hostname and DHCP settings to satisfy the Hadoop requirements. Because it’s common to change the domain name setting in your DHCP options set to a custom internal domain name, this post […]

Read More