AWS Storage Blog

Tag: Amazon EMR

Build a managed transactional data lake with Amazon S3 Tables

UPDATE (12/19/2024): Added guidance for Amazon EMR setup. Customers commonly use Apache Iceberg today to manage ever-growing volumes of data. Apache Iceberg’s relational database transaction capabilities (ACID transactions) help customers deal with frequent updates, deletions, and the need for transactional consistency across datasets. However, getting the most out of Apache Iceberg tables and running it […]

How Amazon Ads uses Iceberg optimizations to accelerate their Spark workload on Amazon S3

In today’s data-driven business landscape, organizations are increasingly relying on massive data lakes to store, process, and analyze vast amounts of information. However, as these data repositories grow to petabyte scale, a key challenge for businesses is implementing transactional capabilities on their data lakes efficiently. The sheer volume of data requires immense computational power and […]

Use generative AI to query your Amazon S3 data lake for insights

Businesses store large volumes of data in their data lakes and rely on this data to extract insights and make important business decisions. However, business stakeholders sometimes lack the technical skills required to run complex queries against their data lakes. Instead, they rely on data scientists or analysts to build reports and dashboards or to […]

Maximizing price performance for big data workloads using Amazon EBS

Since the emergence of big data over a decade ago, Hadoop, an open-source framework used to efficiently store and process large datasets, has been crucial in storing, analyzing, and reducing that data to provide value for enterprises. Hadoop lets you store structured, partially structured, or unstructured data of any kind across […]

Run queries up to 9x faster using Trino with Amazon S3 Select on Amazon EMR

UPDATE (7/25/2024): Use Amazon Athena, S3 Object Lambda, or client-side filtering to query your data in Amazon S3. Customers building data lakes continue to innovate in the ways that they store and access their data. For these customers, performance is critical, particularly when they are accessing large amounts of data. For example, […]

How TMAP Mobility transferred 2.4 PB of Hadoop data using AWS DataSync

Launched in 2002, TMAP Mobility is Korea’s leading mobility platform, with 20 million registered users and 14 million monthly active users. TMAP provides navigation services based on a wide range of real-time traffic information and data. Previously, the Data Intelligence group at TMAP Mobility operated a mobility-data platform based on a Hadoop Distributed File System […]

CME Group accelerates cloud migration with AWS Storage Gateway

At CME Group, the world’s leading and most diverse derivatives marketplace, we offer futures and options across every investible asset class, from corn to Bitcoin. This breadth means our global, electronic markets are powered by data – and lots of it. Making sure that our customers have access to the market data that they need […]

Connecting AWS Outposts to on-premises data sources

Millions of customers, such as startups, enterprises, and leading government agencies, are using AWS to lower costs, become more agile, and innovate faster. Some workloads must remain on premises in order to interact with data that cannot, for a variety of reasons, move to an AWS Region. Enter AWS Outposts. AWS Outposts is a […]

Migrate HDFS files to an Amazon S3 data lake with AWS Snowball Edge

The need to store newly collected data grows as the number of data sources increases. Enterprise customers use Hadoop Distributed File System (HDFS) as their data lake storage repository for on-premises Hadoop applications. Customers are migrating their data lakes to AWS for a more secure, scalable, agile, and cost-effective solution. For HDFS migrations where high-speed transfer […]