AWS Storage Blog
Category: Amazon EMR
Use generative AI to query your Amazon S3 data lake for insights
Businesses store large volumes of data in their data lakes and rely on this data to extract insights and make important business decisions. However, business stakeholders sometimes lack the technical skills required to run complex queries against their data lakes. Instead, they rely on data scientists or analysts to build reports and dashboards or to […]
How to enforce Amazon S3 Access Grants with Immuta
Amazon Simple Storage Service (Amazon S3) is the most popular object storage platform for modern data lakes. Organizations today have evolved to adopt a lake house architecture that combines the scalability and cost effectiveness of data lakes with the performance and ease-of-use of data warehouses. Likewise, Amazon S3 plays an increasingly important role as the foundational […]
Maximizing price performance for big data workloads using Amazon EBS
Since the emergence of big data over a decade ago, Hadoop – an open-source framework that is used to efficiently store and process large datasets – has been crucial in storing, analyzing, and reducing that data to provide value for enterprises. Hadoop lets you store structured, partially structured, or unstructured data of any kind across […]
Run queries up to 9x faster using Trino with Amazon S3 Select on Amazon EMR
UPDATE (7/25/2024): Use Amazon Athena, S3 Object Lambda, or client-side filtering to query your data in Amazon S3. Customers building data lakes continue to innovate in the ways that they store and access their data. For these customers, performance is critical, particularly when they are accessing large amounts of data. For example, […]
How TMAP Mobility transferred 2.4 PB of Hadoop data using AWS DataSync
Launched in 2002, TMAP Mobility is Korea’s leading mobility platform, with 20 million registered users and 14 million monthly active users. TMAP provides navigation services based on a wide range of real-time traffic information and data. Previously, the Data Intelligence group at TMAP Mobility operated a mobility-data platform based on a Hadoop Distributed File System […]
CME Group accelerates cloud migration with AWS Storage Gateway
At CME Group, the world’s leading and most diverse derivatives marketplace, we offer futures and options across every investible asset class, from corn to Bitcoin. This breadth means our global, electronic markets are powered by data – and lots of it. Making sure that our customers have access to the market data that they need […]
NEW Amazon S3 sessions at AWS re:Invent are coming on Jan 12-14
UPDATE 9/8/2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. We are into week two of AWS re:Invent, and a lot of the Amazon S3 sessions we posted about are now available on-demand, with a few more to be broadcast over the next two weeks. Hopefully, you also heard about some […]
Connecting AWS Outposts to on-premises data sources
Millions of customers, such as startups, enterprises, and leading government agencies, are using AWS to lower costs, become more agile, and innovate faster. There are some workloads that must remain on-premises in order to interact with data that cannot, for a variety of reasons, move to an AWS Region. Enter AWS Outposts. AWS Outposts is a […]
Migrate HDFS files to an Amazon S3 data lake with AWS Snowball Edge
The need to store newly connected data grows as the sources of data increase. Enterprise customers use Hadoop Distributed File System (HDFS) as their data lake storage repository for on-premises Hadoop applications. Customers are migrating their data lakes to AWS for a more secure, scalable, agile, and cost-effective solution. For HDFS migrations where high-speed transfer […]