AWS Big Data Blog
Category: AWS Big Data
AWS Big Data Meetup May 5 in Palo Alto: Explore the Power of Machine Learning in the Cloud
Join and RSVP! AWS speaker: Guy Ernest, business development manager for machine learning services at AWS. Talk: “No Dr., or How I Learned to Stop Debugging and Love the Robot.” In this talk, Guy will discuss what developers must know to explore the power of machine learning services in the cloud. Using data to build machine […]
Process Encrypted Data in Amazon EMR with Amazon S3 and AWS KMS
Russell Nash is a Solutions Architect with AWS. Amo Abeyaratne, a Big Data consultant with AWS, also contributed to this post. One of the most powerful features of Amazon EMR is the close integration with Amazon S3 through EMRFS. This allows you to take advantage of many S3 features, including support for S3 client-side and […]
Sharpen your Skill Set with Apache Spark on the AWS Big Data Blog
The AWS Big Data Blog has a large community of authors who are passionate about Apache Spark and who regularly publish content that helps customers use Spark to build real-world solutions. You’ll see content on a variety of topics, including deep-dives on Spark’s internals, building Spark Streaming applications, creating machine learning pipelines using MLlib, and ways […]
Combine NoSQL and Massively Parallel Analytics Using Apache HBase and Apache Hive on Amazon EMR
Ben Snively is a Solutions Architect with AWS. Jon Fritz, a Senior Product Manager for Amazon EMR, co-authored this post. With today’s launch of Amazon EMR release 4.6, you can now quickly and easily provision a cluster with Apache HBase 1.2. Apache HBase is a massively scalable, distributed big data store in the Apache Hadoop ecosystem. It is […]
AWS Big Data Meetup April 27 in Seattle: Explore the Power of Machine Learning in the Cloud
Join and RSVP! AWS speaker: Guy Ernest, business development manager for machine learning services at AWS. Talk: “No Dr., or How I Learned to Stop Debugging and Love the Robot.” In this talk, Guy will discuss what developers must know to explore the power of machine learning services in the cloud. Using data to build machine […]
Using CombineInputFormat to Combat Hadoop’s Small Files Problem
James Norvell is a Big Data Cloud Support Engineer for AWS. Many Amazon EMR customers have architectures that track events and streams and store data in S3. This frequently leads to many small files. It’s now well known that Hadoop doesn’t deal well with small files. This issue can be amplified when migrating from Hadoop […]
AWS at Strata+Hadoop 2016: Building a Scalable Architecture on AWS to Process Streaming Data
Gone are the days when big data was confined to batch processing. To remain competitive, companies must be able to analyze real-time data streams in areas such as video streaming, real-time recommendation engines, preventive maintenance, and fraud detection applications. Last month, Siva Raghupathy and Manjeet Chayel presented “Building a scalable architecture for processing streaming data […]
Exploring Geospatial Intelligence using SparkR on Amazon EMR
Gopal Wunnava is a Senior Consultant with AWS Professional Services. The number of data sources that use location, such as smartphones and sensory devices used in IoT (Internet of Things), is expanding rapidly. This explosion has increased demand for analyzing spatial data. Geospatial intelligence (GEOINT) allows you to analyze data that has geographical or spatial […]
Month in Review: March 2016
March provided another full slate of big data solutions on the AWS Big Data Blog! Take a look at the summaries below for something that catches your interest and share with anyone who’s interested in big data. Will Spark Power the Data behind Precision Medicine? Spark is already known for being a major player in […]
Encrypt Your Amazon Redshift Loads with Amazon S3 and AWS KMS
Russell Nash is a Solutions Architect with AWS. Have you been looking for a straightforward way to encrypt your Amazon Redshift data loads? Have you wondered how to safely manage the keys and where to perform the encryption? In this post, I will walk through a solution that meets these requirements by showing you how […]