AWS Big Data Blog

Tag: Amazon S3

Visualize Amazon S3 Analytics Data with Amazon QuickSight

When Amazon S3 analytics was released in November 2016, it gave you the ability to analyze storage access patterns and transition the right data to the right storage class. You could also manually export the data to an S3 bucket to analyze, using the business intelligence tool of your choice, and gather deeper insights on usage and growth patterns. This helped you reduce storage costs while optimizing performance based on usage patterns. With today’s update, you can quickly and easily gain those deeper insights and benefits by analyzing and visualizing S3 analytics data in Amazon QuickSight. It takes just a single click from the S3 console, without the need for manual exports or additional data preparation.

Read More

Seven Tips for Using S3DistCp on Amazon EMR to Move Data Efficiently Between HDFS and Amazon S3

Although it’s common for Amazon EMR customers to process data directly in Amazon S3, there are occasions where you might want to copy data from S3 to the Hadoop Distributed File System (HDFS) on your Amazon EMR cluster. Additionally, you might have a use case that requires moving large amounts of data between buckets or regions. In these use cases, large datasets are too big for a simple copy operation.

Read More

Tips for Migrating to Apache HBase on Amazon S3 from HDFS

Starting with Amazon EMR 5.2.0, you have the option to run Apache HBase on Amazon S3. Running HBase on S3 gives you several added benefits, including lower costs, data durability, and easier scalability. HBase provides several options that you can use to migrate and back up HBase tables. The steps to migrate to HBase on […]

Read More

Securely Analyze Data from Another AWS Account with EMRFS

Sometimes, data to be analyzed is spread across buckets owned by different accounts. In order to ensure data security, appropriate credentials management needs to be in place. This is especially true for large enterprises storing data in different Amazon S3 buckets for different departments. For example, a customer service department may need access to data […]

Read More

Building an Event-Based Analytics Pipeline for Amazon Game Studios’ Breakaway

All software developers strive to build products that are functional, robust, and bug-free, but video game developers have an extra challenge: they must also create a product that entertains. When designing a game, developers must consider how the various elements—such as characters, story, environment, and mechanics—will fit together and, more importantly, how players will interact […]

Read More

Analyzing Data in S3 using Amazon Athena

Neil Mukerje is a Solution Architect for Amazon Web Services Abhishek Sinha is a Senior Product Manager on Amazon Athena Amazon Athena is an interactive query service that makes it easy to analyze data directly from Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage and you can […]

Read More

Low-Latency Access on Trillions of Records: FINRA’s Architecture Using Apache HBase on Amazon EMR with Amazon S3

John Hitchingham is Director of Performance Engineering at FINRA The Financial Industry Regulatory Authority (FINRA) is a private sector regulator responsible for analyzing 99% of the equities and 65% of the option activity in the US. In order to look for fraud, market manipulation, insider trading, and abuse, FINRA’s technology group has developed a robust […]

Read More

Turning Amazon EMR into a Massive Amazon S3 Processing Engine with Campanile

Michael Wallman is a senior consultant with AWS ProServ Have you ever had to copy a huge Amazon S3 bucket to another account or region? Or create a list based on object name or size? How about mapping a function over millions of objects? Amazon EMR to the rescue! EMR allows you to deploy large […]

Read More