AWS Big Data Blog

Category: Compute

Automating Analytic Workflows on AWS

Wangechi Doble is a Solutions Architect with AWS. Organizations are experiencing a proliferation of data. This data includes logs, sensor data, social media data, and transactional data, and resides in the cloud, on premises, or as high-volume, real-time data feeds. It is increasingly important to analyze this data: stakeholders want information that is timely, accurate, […]

Building and Maintaining an Amazon S3 Metadata Index without Servers

Mike Deck is a Solutions Architect with AWS. Amazon S3 is a simple key-based object store whose scalability and low cost make it ideal for storing large datasets. Its design enables S3 to provide excellent performance for storing and retrieving objects based on a known key. Finding objects based on other attributes, however, requires doing […]

Building Scalable and Responsive Big Data Interfaces with AWS Lambda

This is a guest post by Martin Holste, a co-founder of the Threat Analytics Platform at FireEye, where he is a senior researcher specializing in prototypes. At FireEye, Inc., we process billions of security events every day with our Threat Analytics Platform, running on AWS. In building our platform, one of the problems we […]

How Expedia Implemented Near Real-time Analysis of Interdependent Datasets

This is a guest post by Stephen Verstraete, a manager at Pariveda Solutions. Pariveda Solutions is an AWS Premier Consulting Partner. Common patterns exist for batch processing and real-time processing of Big Data. However, we haven’t seen patterns that allow us to process batches of dependent data in real time. Expedia’s marketing group needed to analyze […]

A Zero-Administration Amazon Redshift Database Loader

Ian Meyers is a Solutions Architecture Senior Manager with AWS. With this new AWS Lambda function, it’s never been easier to get file data into Amazon Redshift. You simply push files into a variety of locations on Amazon S3 and have them automatically loaded into your Amazon Redshift clusters. Using AWS Lambda with Amazon Redshift […]

Hosting Amazon Kinesis Applications on AWS Elastic Beanstalk

Ian Meyers is a Solutions Architecture Senior Manager with AWS. Amazon Kinesis provides a scalable and highly available platform for ingesting data from thousands of clients. Once data is available on a Kinesis stream, you can build applications to process the data using the Kinesis Client Library (KCL). KCL provides a framework for managing many […]