AWS Big Data Blog

Tag: S3

How SmartNews Built a Lambda Architecture on AWS to Analyze Customer Behavior and Recommend Content

This is a guest post by Takumi Sakamoto, a software engineer at SmartNews. In SmartNews’s own words: “SmartNews is a machine learning-based news discovery app that delivers the very best stories on the Web for more than 18 million users worldwide.” Data processing is one of the key technologies for SmartNews. Every team’s workload […]

Encrypt Your Amazon Redshift Loads with Amazon S3 and AWS KMS

Russell Nash is a Solutions Architect with AWS. Have you been looking for a straightforward way to encrypt your Amazon Redshift data loads? Have you wondered how to safely manage the keys and where to perform the encryption? In this post, I will walk through a solution that meets these requirements by showing you how […]

Integrating Amazon Kinesis, Amazon S3 and Amazon Redshift with Cascading on Amazon EMR

This is a guest post by Ryan Desmond, Solutions Architect at Concurrent. Concurrent is an AWS Advanced Technology Partner. With Amazon Kinesis, developers can quickly store, collate and access large, distributed data streams, such as access logs, clickstreams and IoT data, in real time. The question then becomes, how can we access and leverage this […]

Building and Maintaining an Amazon S3 Metadata Index without Servers

Mike Deck is a Solutions Architect with AWS. Amazon S3 is a simple key-based object store whose scalability and low cost make it ideal for storing large datasets. Its design enables S3 to provide excellent performance for storing and retrieving objects based on a known key. Finding objects based on other attributes, however, requires doing […]

Building Scalable and Responsive Big Data Interfaces with AWS Lambda

This is a guest post by Martin Holste, a co-founder of the Threat Analytics Platform at FireEye, where he is a senior researcher specializing in prototypes. At FireEye, Inc., we process billions of security events every day with our Threat Analytics Platform, running on AWS. In building our platform, one of the problems we […]

How Expedia Implemented Near Real-time Analysis of Interdependent Datasets

This is a guest post by Stephen Verstraete, a manager at Pariveda Solutions. Pariveda Solutions is an AWS Premier Consulting Partner. Common patterns exist for batch processing and real-time processing of Big Data. However, we haven’t seen patterns that allow us to process batches of dependent data in real-time. Expedia’s marketing group needed to analyze […]

Applying Machine Learning to Text Mining with Amazon S3 and RapidMiner

Gopal Wunnava is a Senior Consultant with AWS Professional Services. By some estimates, 80% of an organization’s data is unstructured content. This content includes web pages, call center transcripts, surveys, feedback forms, legal documents, forums, social media, and blog articles. Therefore, organizations must analyze not just transactional information but also textual content to gain insight […]

Nasdaq’s Architecture using Amazon EMR and Amazon S3 for Ad Hoc Access to a Massive Data Set

This is a guest post by Nate Sammons, a Principal Architect for Nasdaq. The Nasdaq group of companies operates financial exchanges around the world and processes large volumes of data every day. We run a wide variety of analytic and surveillance systems, all of which require access to essentially the same data sets. The Nasdaq […]

Using AWS for Multi-instance, Multi-part Uploads

James Saull is a Principal Solutions Architect with AWS. There are many advantages to using multi-part, multi-instance uploads for large files. First, throughput improves because you can upload parts in parallel. Amazon Simple Storage Service (Amazon S3) can store objects of up to 5 TB, yet a single machine with a 1 Gbps interface would take […]
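The excerpt notes that splitting a large file into parts lets you upload those parts in parallel. The post's own approach is truncated here, so the Python below is only a minimal sketch under stated assumptions: it plans part byte ranges within S3's documented multipart limits (at least 5 MiB per part except the last, at most 5 GiB per part, at most 10,000 parts) and deals the parts round-robin across several instances. The names `plan_parts` and `assign_to_instances` are hypothetical, and no AWS calls are made.

```python
"""Hypothetical sketch: plan S3 multipart upload parts and spread them
across several uploader instances. No AWS calls are made here."""

from collections import defaultdict

MIN_PART_SIZE = 5 * 1024**2   # S3 minimum part size (5 MiB), except the last part
MAX_PART_SIZE = 5 * 1024**3   # S3 maximum part size (5 GiB)
MAX_PARTS = 10_000            # S3 maximum number of parts per upload


def plan_parts(file_size: int, part_size: int = MIN_PART_SIZE) -> list[tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples covering the whole file.

    The part size is raised if needed so the file fits in MAX_PARTS parts.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    # Ceiling division: smallest part size that keeps the count <= MAX_PARTS.
    min_needed = -(-file_size // MAX_PARTS)
    size = max(part_size, MIN_PART_SIZE, min_needed)
    if size > MAX_PART_SIZE:
        raise ValueError("file too large for a single multipart upload")

    parts = []
    offset = 0
    number = 1  # S3 part numbers start at 1
    while offset < file_size:
        length = min(size, file_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


def assign_to_instances(parts, n_instances: int) -> dict[int, list[tuple[int, int, int]]]:
    """Deal parts round-robin so each instance uploads roughly 1/n of the data."""
    if n_instances < 1:
        raise ValueError("need at least one instance")
    plan = defaultdict(list)
    for i, part in enumerate(parts):
        plan[i % n_instances].append(part)
    return dict(plan)


if __name__ == "__main__":
    parts = plan_parts(12 * 1024**2)  # a 12 MiB file
    print(parts)
    print(assign_to_instances(parts, 2))
```

In a real deployment, one process would start the upload with the S3 CreateMultipartUpload operation, each instance would send its assigned parts with UploadPart, and a final CompleteMultipartUpload call would list every part number with its ETag. Round-robin assignment is just one simple choice; contiguous ranges per instance would work equally well.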
