AWS Big Data Blog

Category: Amazon Simple Storage Service (S3)

Building and Maintaining an Amazon S3 Metadata Index without Servers

Mike Deck is a Solutions Architect with AWS. Amazon S3 is a simple key-based object store whose scalability and low cost make it ideal for storing large datasets. Its design enables S3 to provide excellent performance for storing and retrieving objects based on a known key. Finding objects based on other attributes, however, requires doing […]
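
To make the key-versus-attribute problem concrete, here is a minimal Python sketch of an attribute index kept in sync with S3 event notifications. It is not the post's implementation: an in-memory dictionary stands in for a persistent store, and `MetadataIndex`, `handle_s3_event`, and the `extract_attrs` callback are hypothetical names. The only real structure it relies on is the S3 event notification payload (`Records[].eventName`, `s3.bucket.name`, `s3.object.key`).

```python
from collections import defaultdict
from urllib.parse import unquote_plus


class MetadataIndex:
    """Hypothetical in-memory index: (attribute, value) -> set of (bucket, key)."""

    def __init__(self):
        self._by_attr = defaultdict(set)
        self._attrs_by_key = {}

    def put(self, bucket, key, attrs):
        # Drop any stale entries first so an overwritten object is re-indexed cleanly.
        self.delete(bucket, key)
        obj = (bucket, key)
        self._attrs_by_key[obj] = dict(attrs)
        for name, value in attrs.items():
            self._by_attr[(name, value)].add(obj)

    def delete(self, bucket, key):
        obj = (bucket, key)
        for name, value in self._attrs_by_key.pop(obj, {}).items():
            self._by_attr[(name, value)].discard(obj)

    def find(self, name, value):
        # Look up objects by an attribute other than their key.
        return sorted(self._by_attr.get((name, value), set()))


def handle_s3_event(event, index, extract_attrs):
    """Apply an S3 event notification payload to the index.

    extract_attrs(bucket, key, record) is an assumed callback that returns a
    dict of attributes, for example by parsing the key or reading metadata.
    """
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if record["eventName"].startswith("ObjectCreated:"):
            index.put(bucket, key, extract_attrs(bucket, key, record))
        elif record["eventName"].startswith("ObjectRemoved:"):
            index.delete(bucket, key)
```

With a hypothetical key convention such as `<customer>/<date>/<file>`, an `extract_attrs` that splits the key lets `index.find("customer", "acme")` return matching objects without listing the bucket.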

Building Scalable and Responsive Big Data Interfaces with AWS Lambda

This is a guest post by Martin Holste, a co-founder of the Threat Analytics Platform at FireEye, where he is a senior researcher specializing in prototypes.

Overview

At FireEye, Inc., we process billions of security events every day with our Threat Analytics Platform, running on AWS. In building our platform, one of the problems we […]

How Expedia Implemented Near Real-time Analysis of Interdependent Datasets

This is a guest post by Stephen Verstraete, a manager at Pariveda Solutions. Pariveda Solutions is an AWS Premier Consulting Partner. Common patterns exist for batch processing and real-time processing of Big Data. However, we haven’t seen patterns that allow us to process batches of dependent data in real time. Expedia’s marketing group needed to analyze […]

Applying Machine Learning to Text Mining with Amazon S3 and RapidMiner

Gopal Wunnava is a Senior Consultant with AWS Professional Services. By some estimates, 80% of an organization’s data is unstructured content. This content includes web pages, call center transcripts, surveys, feedback forms, legal documents, forums, social media, and blog articles. Therefore, organizations must analyze not just transactional information but also textual content to gain insight […]

Nasdaq’s Architecture using Amazon EMR and Amazon S3 for Ad Hoc Access to a Massive Data Set

This is a guest post by Nate Sammons, a Principal Architect for Nasdaq. The Nasdaq group of companies operates financial exchanges around the world and processes large volumes of data every day. We run a wide variety of analytic and surveillance systems, all of which require access to essentially the same data sets. The Nasdaq […]

Using AWS for Multi-instance, Multi-part Uploads

James Saull is a Principal Solutions Architect with AWS. There are many advantages to using multi-part, multi-instance uploads for large files. First, throughput improves because you can upload parts in parallel. Amazon Simple Storage Service (Amazon S3) can store files up to 5 TB, yet a single machine with a 1 Gbps interface would take […]
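
The parallel-parts idea can be sketched in Python. This is a hypothetical helper, not the post's code: `plan_parts` sizes parts within S3's documented multipart limits (5 MiB minimum for every part except the last, 5 GiB maximum per part, at most 10,000 parts), and `upload_parallel` fans the parts out to a thread pool through an assumed `upload_part` callback, which in a real client would wrap the S3 UploadPart operation. Because each part is independent, the same plan could be divided across several instances instead of threads.

```python
from concurrent.futures import ThreadPoolExecutor

MIN_PART = 5 * 1024 ** 2      # 5 MiB: minimum size for every part except the last
MAX_PART = 5 * 1024 ** 3      # 5 GiB: maximum size of a single part
MAX_PARTS = 10_000            # maximum number of parts in one upload
MAX_OBJECT = 5 * 1024 ** 4    # 5 TiB: maximum object size


def plan_parts(file_size, preferred_part=64 * 1024 ** 2):
    """Split file_size bytes into (part_number, offset, length) tuples."""
    if not 0 < file_size <= MAX_OBJECT:
        raise ValueError("file_size must be between 1 byte and 5 TiB")
    # Integer ceiling division: smallest part size that keeps us within MAX_PARTS.
    min_for_count = -(-file_size // MAX_PARTS)
    part_size = min(max(MIN_PART, preferred_part, min_for_count), MAX_PART)
    parts = []
    offset, number = 0, 1  # S3 part numbers start at 1
    while offset < file_size:
        length = min(part_size, file_size - offset)  # last part may be smaller
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


def upload_parallel(parts, upload_part, max_workers=8):
    """Run the hypothetical upload_part(number, offset, length) callback concurrently.

    Returns the callback results in part-number order, which is the order
    CompleteMultipartUpload expects its part list in.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so no re-sorting is needed.
        return list(pool.map(lambda part: upload_part(*part), parts))
```

For example, `plan_parts(100 * 1024 ** 2)` yields two parts of 64 MiB and 36 MiB with the default preferred size, while a 5 TiB file is automatically split into parts large enough to stay within 10,000.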

Moving Big Data into the Cloud with Tsunami UDP

Matt Yanchyshyn is a Principal Solutions Architect with Amazon Web Services. AWS Solutions Architect Leo Zhadanovsky also contributed to this post.

Introduction

One of the biggest challenges facing companies that want to leverage the scale and elasticity of AWS for analytics is how to move their data into the cloud. It’s increasingly common to have […]