AWS Big Data Blog

Supercharge SQL on Your Data in Apache HBase with Apache Phoenix

With today’s launch of Amazon EMR release 4.7, you can now create clusters with Apache Phoenix 4.7.0 for low-latency SQL and OLTP workloads. Phoenix uses Apache HBase as its backing store (HBase 1.2.1 is included on Amazon EMR release 4.7.0), using HBase scan operations and coprocessors for fast performance. Additionally, you can map Phoenix tables […]

Read More

Using Spark SQL for ETL

Ben Snively is a Solutions Architect with AWS. With big data, you deal with many different formats and large volumes of data. SQL-style queries have been around for nearly four decades. Many systems support SQL-style syntax on top of the data layers, and the Hadoop/Spark ecosystem is no exception. This allows companies to try new […]

Read More

Real-time in-memory OLTP and Analytics with Apache Ignite on AWS

Babu Elumalai is a Solutions Architect with AWS. Organizations are generating tremendous amounts of data, and they increasingly need tools and systems that help them use this data to make decisions. The data has both immediate value (for example, trying to understand how a new promotion is performing in real time) and historic value (trying […]

Read More

From SQL to Microservices: Integrating AWS Lambda with Relational Databases

Bob Strahan is a Senior Consultant with AWS Professional Services. AWS Lambda has emerged as an excellent compute platform for modern microservices architecture, driving dramatic advancements in flexibility, resilience, scale, and cost effectiveness. Many customers can take advantage of this transformational technology from within their existing relational database applications. In this post, we explore how to […]

Read More

Month in Review: April 2016

Lots to see on the Big Data Blog in April! Please take a look at the summaries below for something that catches your interest. Exploring Geospatial Intelligence using SparkR on Amazon EMR: The number of data sources that use location, such as smartphones and sensory devices used in IoT (Internet of Things), is expanding rapidly. […]

Read More

AWS Big Data Meetup May 5 in Palo Alto: Explore the Power of Machine Learning in the Cloud

Join and RSVP! AWS Speaker: Guy Ernest, business development manager for machine learning services in AWS. “No Dr., or How I Learned to Stop Debugging and Love the Robot”: In this talk, Guy will discuss what developers must know to explore the power of machine learning services in the cloud. Using data to build machine […]

Read More

Process Encrypted Data in Amazon EMR with Amazon S3 and AWS KMS

Russell Nash is a Solutions Architect with AWS. Amo Abeyaratne, a Big Data consultant with AWS, also contributed to this post. One of the most powerful features of Amazon EMR is the close integration with Amazon S3 through EMRFS. This allows you to take advantage of many S3 features, including support for S3 client-side and […]

Read More

Sharpen your Skill Set with Apache Spark on the AWS Big Data Blog

The AWS Big Data Blog has a large community of authors who are passionate about Apache Spark and who regularly publish content that helps customers use Spark to build real-world solutions. You’ll see content on a variety of topics, including deep-dives on Spark’s internals, building Spark Streaming applications, creating machine learning pipelines using MLlib, and ways […]

Read More

Combine NoSQL and Massively Parallel Analytics Using Apache HBase and Apache Hive on Amazon EMR

Ben Snively is a Solutions Architect with AWS. Jon Fritz, a Senior Product Manager for Amazon EMR, co-authored this post. With today’s launch of Amazon EMR release 4.6, you can now quickly and easily provision a cluster with Apache HBase 1.2. Apache HBase is a massively scalable, distributed big data store in the Apache Hadoop ecosystem. It is […]

Read More