AWS Big Data Blog
Combine NoSQL and Massively Parallel Analytics Using Apache HBase and Apache Hive on Amazon EMR
Ben Snively is a Solutions Architect with AWS. Jon Fritz, a Senior Product Manager for Amazon EMR, co-authored this post. With today’s launch of Amazon EMR release 4.6, you can now quickly and easily provision a cluster with Apache HBase 1.2. Apache HBase is a massively scalable, distributed big data store in the Apache Hadoop ecosystem. It is […]
AWS Big Data Meetup April 27 in Seattle: Explore the Power of Machine Learning in the Cloud
Join and RSVP! AWS Speaker: Guy Ernest, business development manager for machine learning services in AWS. “No Dr., or How I Learned to Stop Debugging and Love the Robot.” In this talk, Guy will discuss what developers must know to explore the power of machine learning services in the cloud. Using data to build machine […]
Using CombineInputFormat to Combat Hadoop’s Small Files Problem
James Norvell is a Big Data Cloud Support Engineer for AWS. Many Amazon EMR customers have architectures that track events and streams and store data in S3. This frequently leads to many small files. It’s now well known that Hadoop doesn’t deal well with small files. This issue can be amplified when migrating from Hadoop […]
AWS at Strata+Hadoop 2016: Building a Scalable Architecture on AWS to Process Streaming Data
Gone are the days when big data was confined to batch processing. To remain competitive, companies must be able to analyze real-time data streams in areas such as video streaming, real-time recommendation engines, preventive maintenance, and fraud detection applications. Last month, Siva Raghupathy and Manjeet Chayel presented “Building a scalable architecture for processing streaming data […]
Exploring Geospatial Intelligence using SparkR on Amazon EMR
Gopal Wunnava is a Senior Consultant with AWS Professional Services. The number of data sources that use location, such as smartphones and sensory devices used in IoT (Internet of Things), is expanding rapidly. This explosion has increased demand for analyzing spatial data. Geospatial intelligence (GEOINT) allows you to analyze data that has geographical or spatial […]
Month in Review: March 2016
March provided another full slate of big data solutions on the AWS Big Data Blog! Take a look at the summaries below for something that catches your interest and share with anyone who’s interested in big data. Will Spark Power the Data behind Precision Medicine? Spark is already known for being a major player in […]
Encrypt Your Amazon Redshift Loads with Amazon S3 and AWS KMS
Russell Nash is a Solutions Architect with AWS. Have you been looking for a straightforward way to encrypt your Amazon Redshift data loads? Have you wondered how to safely manage the keys and where to perform the encryption? In this post, I will walk through a solution that meets these requirements by showing you how […]
Will Spark Power the Data behind Precision Medicine?
Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services. This post was co-authored by Ujjwal Ratan, a Solutions Architect with Amazon Web Services. “And that’s the promise of precision medicine: delivering the right treatments, at the right time, every time to the right person.” (President Obama, 2015 State […]
Crunching Statistics at Scale with SparkR on Amazon EMR
Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services. This post is co-authored by Gopal Wunnava, a Senior Consultant with AWS Professional Services. SparkR is an R package that allows you to integrate complex statistical analysis with large datasets. In this blog post, we introduce you to running R with the […]
AWS Big Data Meetup March 31 in San Francisco: Intro to SparkR and breakout discussions
Join and RSVP! Guest Speaker: Cory Dolphin from Twitter. Learn how Answers, Fabric’s realtime analytics product, processes billions of events in realtime using Twitter’s new stream processing engine, Heron. Cory will explain some of the challenges the team faced while scaling Storm, and how Heron has helped them fly faster. Specifically, Cory will describe how Heron’s […]
