AWS Big Data Blog

Category: Analytics

Genomic Analysis with Hail on Amazon EMR and Amazon Athena

For this task, we use Hail, an open source framework for exploring and analyzing genomic data that uses the Apache Spark framework. In this post, we use Amazon EMR to run Hail. We walk through the setup, configuration, and data processing. Finally, we generate an Apache Parquet–formatted variant dataset and explore it using Amazon Athena.

Read More

Collect Data Statistics Up to 5x Faster by Analyzing Only Predicate Columns with Amazon Redshift

After loading new data into an Amazon Redshift cluster, statistics need to be re-computed to guarantee performant query plans. By learning which column statistics are actually being used by the customer’s workload and collecting statistics only on those columns, Amazon Redshift is able to significantly reduce the amount of time needed for table maintenance during data loading workflows.

Read More

Using Amazon Redshift Spectrum, Amazon Athena, and AWS Glue with Node.js in Production

This is a guest post by Rafi Ton, founder and CEO of NUVIAD. The ability to provide fresh, up-to-the-minute data to our customers and partners was always a main goal with our platform. We saw other solutions provide data that was a few hours old, but this was not good enough for us. We insisted on providing the freshest data possible. For us, that meant loading Amazon Redshift in frequent micro batches and allowing our customers to query Amazon Redshift directly to get results in near real time. The benefits were immediately evident. Our customers could see how their campaigns performed faster than with other solutions, and react sooner to the ever-changing media supply pricing and availability. They were very happy.

Read More

Build a Data Lake Foundation with AWS Glue and Amazon S3

A data lake is an increasingly popular way to store and analyze data that addresses the challenges of dealing with massive volumes of heterogeneous data. A data lake allows organizations to store all their data—structured and unstructured—in one centralized repository. Because data can be stored as-is, there is no need to convert it to a predefined schema. This post walks you through the process of using AWS Glue to crawl your data on Amazon S3 and build a metadata store that can be used with other AWS offerings.

Read More

Amazon QuickSight Adds Support for Combo Charts and Row-Level Security

We are excited to announce support for two new features in Amazon QuickSight: 1) Combo charts, the first visual type in QuickSight to support dual-axis visualization, and 2) Row-Level Security, which allows access control over data at the row level based on the user who is accessing QuickSight. Together, these features enable you to present more engaging and personalized dashboards in Amazon QuickSight, while enforcing stricter controls over data.

Read More

Preprocessing Data in Amazon Kinesis Analytics with AWS Lambda

Kinesis Analytics now gives you the option to preprocess your data with AWS Lambda. This gives you a great deal of flexibility in defining what data gets analyzed by your Kinesis Analytics application. In this post, I discuss some common use cases for preprocessing, and walk you through an example to help highlight its applicability.

Read More