AWS Big Data Blog

Use Apache Oozie Workflows to Automate Apache Spark Jobs (and more!) on Amazon EMR

Mike Grimes is an SDE with Amazon EMR. As a developer or data scientist, you rarely want to run a single serial job on an Apache Spark cluster. More often, to gain insight from your data you need to process it in multiple, possibly tiered steps, and then move the data into another format and […]

JOIN Amazon Redshift AND Amazon RDS PostgreSQL WITH dblink

Tony Gibbs is a Solutions Architect with AWS. (Update: This blog post has been translated into Japanese.) When it comes to choosing a SQL-based database in AWS, there are many options. Sometimes it can be difficult to know which one to choose. For example, when would you use Amazon Aurora instead of Amazon RDS PostgreSQL […]

Supercharge SQL on Your Data in Apache HBase with Apache Phoenix

With today’s launch of Amazon EMR release 4.7, you can now create clusters with Apache Phoenix 4.7.0 for low-latency SQL and OLTP workloads. Phoenix uses Apache HBase as its backing store (HBase 1.2.1 is included on Amazon EMR release 4.7.0), using HBase scan operations and coprocessors for fast performance. Additionally, you can map Phoenix tables […]

Using Spark SQL for ETL

Ben Snively is a Solutions Architect with AWS. With big data, you deal with many different formats and large volumes of data. SQL-style queries have been around for nearly four decades. Many systems support SQL-style syntax on top of the data layers, and the Hadoop/Spark ecosystem is no exception. This allows companies to try new […]

Using Python 3.4 on EMR Spark Applications

Bruno Faria is a Big Data Support Engineer for Amazon Web Services. Many data scientists choose Python when developing on Spark. With last month’s Amazon EMR release 4.6, we’ve made it even easier to use Python: Python 3.4 is installed on your EMR cluster by default. You’ll still find Python 2.6 and 2.7 on your […]

Real-time in-memory OLTP and Analytics with Apache Ignite on AWS

Babu Elumalai is a Solutions Architect with AWS. Organizations are generating tremendous amounts of data, and they increasingly need tools and systems that help them use this data to make decisions. The data has both immediate value (for example, trying to understand how a new promotion is performing in real time) and historic value (trying […]

From SQL to Microservices: Integrating AWS Lambda with Relational Databases

Bob Strahan is a Senior Consultant with AWS Professional Services. AWS Lambda has emerged as an excellent compute platform for modern microservices architecture, driving dramatic advancements in flexibility, resilience, scale, and cost effectiveness. Many customers can take advantage of this transformational technology from within their existing relational database applications. In this post, we explore how to […]

Month in Review: April 2016

by Andy Werth

Lots to see on the Big Data Blog in April! Please take a look at the summaries below for something that catches your interest. Exploring Geospatial Intelligence using SparkR on Amazon EMR: The number of data sources that use location, such as smartphones and sensory devices used in IoT (Internet of Things), is expanding rapidly. […]

AWS Big Data Meetup May 5 in Palo Alto: Explore the Power of Machine Learning in the Cloud

by Andy Werth

Join and RSVP! AWS Speaker: Guy Ernest, business development manager for machine learning services in AWS. “No Dr., or How I Learned to Stop Debugging and Love the Robot”: In this talk, Guy will discuss what developers must know to explore the power of machine learning services in the cloud. Using data to build machine […]

Process Encrypted Data in Amazon EMR with Amazon S3 and AWS KMS

Russell Nash is a Solutions Architect with AWS. Amo Abeyaratne, a Big Data consultant with AWS, also contributed to this post. One of the most powerful features of Amazon EMR is the close integration with Amazon S3 through EMRFS. This allows you to take advantage of many S3 features, including support for S3 client-side and […]
