AWS Big Data Blog

Meet the Amazon EMR Team this Friday at a Tech Talk & Networking Event in Mountain View

Want to change the world with Big Data and Analytics? Come join us on the Amazon EMR team in Amazon Web Services! Meet the Amazon EMR team this Friday, April 7th, from 5:00 – 7:30 PM at Michael’s at Shoreline in Mountain View. We’ll feature short tech talks by EMR leadership who will talk about the past, […]

Encrypt and Decrypt Amazon Kinesis Records Using AWS KMS

Customers with strict compliance or data security requirements often require data to be encrypted at all times, including at rest or in transit within the AWS cloud. This post shows you how to build a real-time streaming application using Kinesis in which your records are encrypted while at rest or in transit. Amazon Kinesis overview […]

Big Data Resources on the AWS Knowledge Center

The AWS Knowledge Center answers the questions we receive most frequently from AWS customers. It is a resource for you that is distinct from AWS Documentation, the AWS Discussion Forums, and the AWS Support Center. It covers questions from across every AWS service. Specific big data services covered on the Knowledge Center include Amazon EMR, Amazon Athena, […]

Top 10 Performance Tuning Tips for Amazon Athena

This blog post was last reviewed and updated in May 2022, with more details such as using EXPLAIN ANALYZE, updated compression, ORDER BY and JOIN tips, partition indexing, updated stats (with performance improvements), and added bonus tips. Amazon Athena is an interactive query service that makes it easy to analyze data stored in Amazon Simple Storage Service […]

Running R on Amazon Athena

This blog post has been translated into Japanese. Data scientists are often concerned about managing the infrastructure behind big data platforms while running SQL on R. Amazon Athena is an interactive query service that works directly with data stored in S3 and makes it easy to analyze data using standard SQL without the need to […]

Amazon Redshift Monitoring Now Supports End User Queries and Canaries

Ian Meyers is a Solutions Architecture Senior Manager with AWS. The serverless Amazon Redshift Monitoring utility lets you gather important performance metrics from your Redshift cluster’s system tables and persists the results in Amazon CloudWatch. This serverless solution leverages AWS Lambda to schedule custom SQL queries and process the results. With this utility, you can use […]

Month in Review: February 2017

Another month of big data solutions on the Big Data Blog! Take a look at our summaries below and learn, comment, and share. Thank you for reading! New posts: Implement Serverless Log Analytics Using Amazon Kinesis Analytics. In this post, learn how to implement a solution that analyzes streaming Apache access log data from an […]

Analyzing VPC Flow Logs using Amazon Athena and Amazon QuickSight

February 2nd 2022: Blog updated by Chaitanya Shah. Organizations of different sizes that migrate their applications to the cloud, or whose applications were born in the cloud, make use of various cloud services to innovate and provide better, cutting-edge services to their customers. While these applications provide business functionality to customers, they need to transfer data over the network […]
