AWS Big Data Blog

AWS Big Data Meetup April 27 in Seattle: Explore the Power of Machine Learning in the Cloud

by Andy Werth

Join and RSVP! AWS Speaker: Guy Ernest, business development manager for machine learning services in AWS. “No Dr., or How I Learned to Stop Debugging and Love the Robot”: In this talk, Guy will discuss what developers must know to explore the power of machine learning services in the cloud. Using data to build machine […]

Using CombineInputFormat to Combat Hadoop’s Small Files Problem

James Norvell is a Big Data Cloud Support Engineer for AWS. Many Amazon EMR customers have architectures that track events and streams and store data in S3. This frequently leads to many small files. It’s now well known that Hadoop doesn’t deal well with small files. This issue can be amplified when migrating from Hadoop […]

AWS at Strata+Hadoop 2016: Building a Scalable Architecture on AWS to Process Streaming Data

by Andy Werth

Gone are the days when big data was confined to batch processing. To remain competitive, companies must be able to analyze real-time data streams in areas such as video streaming, real-time recommendation engines, preventive maintenance, and fraud detection applications. Last month, Siva Raghupathy and Manjeet Chayel presented “Building a scalable architecture for processing streaming data […]

Exploring Geospatial Intelligence using SparkR on Amazon EMR

Gopal Wunnava is a Senior Consultant with AWS Professional Services. The number of data sources that use location, such as smartphones and sensory devices used in IoT (Internet of Things), is expanding rapidly. This explosion has increased demand for analyzing spatial data. Geospatial intelligence (GEOINT) allows you to analyze data that has geographical or spatial […]

Month in Review: March 2016

by Andy Werth

March provided another full slate of big data solutions on the AWS Big Data Blog! Take a look at the summaries below for something that catches your interest and share with anyone who’s interested in big data. Will Spark Power the Data behind Precision Medicine? Spark is already known for being a major player in […]

Encrypt Your Amazon Redshift Loads with Amazon S3 and AWS KMS

Russell Nash is a Solutions Architect with AWS. Have you been looking for a straightforward way to encrypt your Amazon Redshift data loads? Have you wondered how to safely manage the keys and where to perform the encryption? In this post, I will walk through a solution that meets these requirements by showing you how […]

Will Spark Power the Data behind Precision Medicine?

Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services. This post was co-authored by Ujjwal Ratan, a Solutions Architect with Amazon Web Services. “And that’s the promise of precision medicine — delivering the right treatments, at the right time, every time to the right person.” (President Obama, 2015 State […]

Crunching Statistics at Scale with SparkR on Amazon EMR

Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services. This post is co-authored by Gopal Wunnava, a Senior Consultant with AWS Professional Services. SparkR is an R package that allows you to integrate complex statistical analysis with large datasets. In this blog post, we introduce you to running R with the […]

AWS Big Data Meetup March 31 in San Francisco: Intro to SparkR and breakout discussions

by Steve McPherson

Join and RSVP! Guest Speaker: Cory Dolphin from Twitter. Learn how Answers, Fabric’s realtime analytics product, processes billions of events in realtime using Twitter’s new stream processing engine, Heron. Cory will explain some of the challenges the team faced while scaling Storm, and how Heron has helped them fly faster. Specifically, Cory will describe how Heron’s […]

Anomaly Detection Using PySpark, Hive, and Hue on Amazon EMR

Veronika Megler, Ph.D., is a Senior Consultant with AWS Professional Services. We are surrounded by more and more sensors, some of which we’re not even consciously aware of. As sensors become cheaper and easier to connect, they create an increasing flood of data that’s getting cheaper and easier to store and process. However, sensor readings […]
