AWS Big Data Blog

Category: Amazon Simple Storage Service (S3)

Tips for Migrating to Apache HBase on Amazon S3 from HDFS

Starting with Amazon EMR 5.2.0, you have the option to run Apache HBase on Amazon S3. Running HBase on S3 gives you several added benefits, including lower costs, data durability, and easier scalability. HBase provides several options that you can use to migrate and back up HBase tables. The steps to migrate to HBase on […]

Securely Analyze Data from Another AWS Account with EMRFS

Sometimes, data to be analyzed is spread across buckets owned by different accounts. In order to ensure data security, appropriate credentials management needs to be in place. This is especially true for large enterprises storing data in different Amazon S3 buckets for different departments. For example, a customer service department may need access to data […]

Building an Event-Based Analytics Pipeline for Amazon Game Studios’ Breakaway

All software developers strive to build products that are functional, robust, and bug-free, but video game developers have an extra challenge: they must also create a product that entertains. When designing a game, developers must consider how the various elements—such as characters, story, environment, and mechanics—will fit together and, more importantly, how players will interact […]

Analyzing Data in S3 using Amazon Athena

Neil Mukerje is a Solution Architect for Amazon Web Services. Abhishek Sinha is a Senior Product Manager on Amazon Athena. Amazon Athena is an interactive query service that makes it easy to analyze data directly from Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage and you can […]

Low-Latency Access on Trillions of Records: FINRA’s Architecture Using Apache HBase on Amazon EMR with Amazon S3

John Hitchingham is Director of Performance Engineering at FINRA. The Financial Industry Regulatory Authority (FINRA) is a private sector regulator responsible for analyzing 99% of the equities and 65% of the option activity in the US. In order to look for fraud, market manipulation, insider trading, and abuse, FINRA’s technology group has developed a robust […]

How SmartNews Built a Lambda Architecture on AWS to Analyze Customer Behavior and Recommend Content

This is a guest post by Takumi Sakamoto, a software engineer at SmartNews. SmartNews in their own words: “SmartNews is a machine learning-based news discovery app that delivers the very best stories on the Web for more than 18 million users worldwide.” Data processing is one of the key technologies for SmartNews. Every team’s workload […]

Encrypt Your Amazon Redshift Loads with Amazon S3 and AWS KMS

Russell Nash is a Solutions Architect with AWS. Have you been looking for a straightforward way to encrypt your Amazon Redshift data loads? Have you wondered how to safely manage the keys and where to perform the encryption? In this post, I will walk through a solution that meets these requirements by showing you how […]

Turning Amazon EMR into a Massive Amazon S3 Processing Engine with Campanile

Michael Wallman is a senior consultant with AWS ProServ. Have you ever had to copy a huge Amazon S3 bucket to another account or region? Or create a list based on object name or size? How about mapping a function over millions of objects? Amazon EMR to the rescue! EMR allows you to deploy large […]

Integrating Amazon Kinesis, Amazon S3 and Amazon Redshift with Cascading on Amazon EMR

This is a guest post by Ryan Desmond, Solutions Architect at Concurrent. Concurrent is an AWS Advanced Technology Partner. With Amazon Kinesis, developers can quickly store, collate, and access large, distributed data streams such as access logs, click streams, and IoT data in real time. The question then becomes, how can we access and leverage this […]
