AWS Big Data Blog

Month in Review: February 2017

Another month of big data solutions on the Big Data Blog!

Take a look at our summaries below and learn, comment, and share. Thank you for reading!


Implement Serverless Log Analytics Using Amazon Kinesis Analytics
In this post, learn how to implement a solution that analyzes streaming Apache access log data from an EC2 instance, aggregated over 5-minute windows.
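To make the 5-minute aggregation concrete, here is a minimal sketch of a tumbling-window count over Apache access log lines. It is plain Python for illustration only, not the Kinesis Analytics implementation from the post. The choice of metric (requests per HTTP status code) and the names `count_status_by_window`, `window_start`, and `SAMPLE_LINES` are assumptions made for this example.

```python
import re
from collections import Counter
from datetime import datetime

# Apache common/combined log format:
# host ident user [timestamp] "request" status size ...
LOG_PATTERN = re.compile(r'^\S+ \S+ \S+ \[([^\]]+)\] "[^"]*" (\d{3}) ')

WINDOW_SECONDS = 5 * 60  # 5-minute tumbling window


def window_start(ts):
    """Return the epoch second at which the window containing ts begins."""
    epoch = int(ts.timestamp())
    return epoch - epoch % WINDOW_SECONDS


def count_status_by_window(lines):
    """Count requests per (window start, HTTP status code)."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        counts[(window_start(ts), int(match.group(2)))] += 1
    return counts


# Hypothetical sample data, invented for this example.
SAMPLE_LINES = [
    '127.0.0.1 - - [10/Feb/2017:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Feb/2017:13:58:01 +0000] "GET /missing HTTP/1.1" 404 512',
    '127.0.0.1 - - [10/Feb/2017:14:01:12 +0000] "GET /index.html HTTP/1.1" 200 2326',
]

if __name__ == "__main__":
    for (start, status), n in sorted(count_status_by_window(SAMPLE_LINES).items()):
        print(datetime.utcfromtimestamp(start).isoformat(), status, n)
```

Each window is non-overlapping, so a request is counted exactly once. A streaming service would emit each window's totals as the window closes, rather than after reading all of the input as this sketch does.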

Migrate External Table Definitions from a Hive Metastore to Amazon Athena
For customers who use Hive external tables on Amazon EMR, or any flavor of Hadoop, a key challenge is how to effectively migrate an existing Hive metastore to Amazon Athena, an interactive query service that directly analyzes data stored in Amazon S3. In this post, learn an approach to migrate an existing Hive metastore to Athena, as well as how to use the Athena JDBC driver to run scripts.

AWS Big Data is Coming to HIMSS!
This year’s HIMSS conference was held at the Orange County Convention Center in Orlando, Florida from February 20 – 23. This blog post lists past AWS Big Data Blog posts to show how AWS technologies are being used to improve healthcare.

Create Tables in Amazon Athena from Nested JSON and Mappings Using JSONSerDe
In this post, you will use the tightly coupled integration of Amazon Kinesis Firehose for log delivery, Amazon S3 for log storage, and Amazon Athena with JSONSerDe to run SQL queries against these logs without the need for data transformation or insertion into a database.

Scheduled Refresh for SPICE Data Sets on Amazon QuickSight
QuickSight uses SPICE (Super-fast, Parallel, In-Memory Calculation Engine), a fully managed data store that enables fast visualizations and can ingest data from AWS, on-premises, and cloud sources. Data in SPICE can be refreshed at any time with the click of a button within QuickSight. This post announced the ability to schedule these refreshes.

Harmonize, Search, and Analyze Loosely Coupled Datasets on AWS
You have come up with an exciting hypothesis, and now you are keen to find and analyze as much data as possible to prove (or refute) it. There are many datasets that might be applicable, but they have been created at different times by different people and don’t conform to any common standard. This blog post describes a sample application that illustrates how to solve these problems. You can install the sample app, which harmonizes and indexes three disparate datasets to make them searchable; presents a data-driven, customizable UI for searching the datasets, doing preliminary analysis, and locating relevant datasets; and integrates with Amazon Athena and Amazon QuickSight for custom analysis and visualization.


Building Event-Driven Batch Analytics on AWS
Modern businesses typically collect data from internal and external sources at various frequencies throughout the day. In this post, you learn an elastic and modular approach to collecting, processing, and analyzing data for event-driven applications in AWS.

Want to learn more about big data or streaming data? Check out our Big Data and Streaming Data educational pages.

Leave a comment below to let us know what big data topics you’d like to see next on the AWS Big Data Blog.