AWS Big Data Blog

Category: Analytics

Turbocharge your Apache Hive Queries on Amazon EMR using LLAP

Apache Hive is one of the most popular tools for analyzing large datasets stored in a Hadoop cluster using SQL. Data analysts and scientists use Hive to query, summarize, explore, and analyze big data. With the introduction of Hive LLAP (Low Latency Analytical Processing), the notion of Hive being just a batch processing tool has […]

Amazon QuickSight Now Supports Amazon Athena in EU (Ireland), Count Distinct, and Week Aggregation

Today, I’m excited to share a couple of new features in Amazon QuickSight. First, with this release, we expanded connectivity options by adding Amazon Athena support in the EU (Ireland) Region. Additionally, you can now use Count Distinct on your dimensions and metrics in the visualizations and aggregate date fields by week for SPICE data […]

AWS CloudFormation Supports Amazon Kinesis Analytics Applications

by Ryan Nienhuis, Chris Marshall, and Nehal Mehta | in Amazon Kinesis

You can now provision and manage resources for Amazon Kinesis Analytics applications using AWS CloudFormation.  Kinesis Analytics is the easiest way to process streaming data in real time with standard SQL, without having to learn new programming languages or processing frameworks. Kinesis Analytics enables you to query streaming data or build entire streaming applications using […]

Run Common Data Science Packages on Anaconda and Oozie with Amazon EMR

In the world of data science, users often have to spend considerable time setting up clusters to support complex usage scenarios. Amazon EMR allows data scientists to spin up complex cluster configurations easily and to be up and running with complex queries in a matter of minutes. Data scientists often use scheduling applications such as Oozie to […]

Setting up Read Replica Clusters with HBase on Amazon S3

Many customers have taken advantage of the numerous benefits of running Apache HBase on Amazon S3 for data storage, including lower costs, data durability, and easier scalability. Customers such as FINRA have lowered their costs by 60% by moving to an HBase on S3 architecture along with the numerous operational benefits that come with decoupling […]

Analyze OpenFDA Data in R with Amazon S3 and Amazon Athena

One of the great benefits of Amazon S3 is the ability to host, share, or consume public data sets. This provides transparency into data to which an external data scientist or developer might not normally have access. By exposing the data to the public, you can glean many insights that would have been difficult with […]

Perform Near Real-time Analytics on Streaming Data with Amazon Kinesis and Amazon Elasticsearch Service

Nowadays, streaming data is seen and used everywhere—from social networks, to mobile and web applications, IoT devices, instrumentation in data centers, and many other sources. As the speed and volume of this type of data increases, the need to perform data analysis in real time with machine learning algorithms and extract a deeper understanding from […]

Visualize Amazon S3 Analytics Data with Amazon QuickSight

When Amazon S3 analytics was released in November 2016, it gave you the ability to analyze storage access patterns and transition the right data to the right storage class. You could also manually export the data to an S3 bucket to analyze, using the business intelligence tool of your choice, and gather deeper insights on […]

Under the Hood of Server-Side Encryption for Amazon Kinesis Streams

Customers are using Amazon Kinesis Streams to ingest, process, and deliver data in real time from millions of devices or applications. Use cases for Kinesis Streams vary, but a few common ones include IoT data ingestion and analytics, log processing, clickstream analytics, and enterprise data bus architectures. Within milliseconds of data arrival, applications (KCL, Apache […]

Analysis of Top-N DynamoDB Objects using Amazon Athena and Amazon QuickSight

If you run an operation that continuously generates a large amount of data, you may want to know what kind of data is being inserted by your application. The ability to analyze data intake quickly can be very valuable for business units, such as operations and marketing. For many operations, it’s important to see what […]