data lake | AWS Storage Blog

Use Amazon FSx for Lustre to share Amazon S3 data across accounts

Update 4/9/2025: The cross-account bucket policy in the blog has been updated. It was missing a required principal, “arn:aws:iam::accountID:role/AWS-Signed-In-Console-Role”, and this omission causes an access denied error. As enterprises evolve their cloud governance practices, multiple teams working in separate accounts may need to share data. One team may oversee an enterprise data lake in one account, […]

Best practices for data lake protection with AWS Backup

Data lakes, powered by Amazon Simple Storage Service (Amazon S3), provide organizations with the availability, agility, and flexibility required for modern analytics approaches to gain deeper insights. Protecting sensitive or business-critical information stored in these S3 buckets is a high priority for organizations. AWS Backup for Amazon S3 makes it easier to centrally automate the […]

Using AWS Storage Gateway to modernize next-generation sequencing workflows

Exact Sciences operates laboratories across the world that produce data critical to performing analysis and diagnostics to classify cancer modalities, treatments, and therapeutics. The laboratories generate large data sets from on-premises genomic sequencing devices that must be sent to the cloud for processing. Once in the cloud, we process the data to […]

How Arc XP lowered data transfer costs by $500k per year with Amazon CloudFront and Lambda@Edge on AWS

The Washington Post, an American daily newspaper company, delivers digital news content using Arc XP’s digital experience platform. Arc XP originated in The Post and has grown into a Software-as-a-Service (SaaS) business used by publishers, broadcasters, and brands to create, host, and monetize engaging content for over 1,500 websites globally. Photo Center is an Arc […]

Run queries up to 9x faster using Trino with Amazon S3 Select on Amazon EMR

UPDATE (7/25/2024): Use Amazon Athena, S3 Object Lambda, or client-side filtering to query your data in Amazon S3. Customers building data lakes continue to innovate in the ways that they store and access their data. For these customers, performance is critical, particularly when they are accessing large amounts of data. For example, […]

Online Tech Talk March 17: Migrate your on-premises data lake to a modern data lake on Amazon S3

Don’t miss our AWS Online Storage Tech Talk on March 17, where an AWS expert covers how you can use Amazon S3 to build a modern data lake. This Tech Talk is at 9:00 AM – 10:00 AM PT (12:00 PM – 1:00 PM ET). Companies count on their data and analytics platforms as […]

AWS re:Invent recap: Break down data silos with a data lake on Amazon S3

UPDATE 9/8/2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details. When you have datasets in different places controlled by different groups, you are dealing with data silos, which inherently obscure data. In contrast, a data lake can serve as your central repository of data regardless of source or format. At re:Invent […]

NEW Amazon S3 sessions at AWS re:Invent are coming on Jan 12-14

UPDATE 9/8/2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details. We are into week two of AWS re:Invent, and a lot of the Amazon S3 sessions we posted about are now available on-demand, with a few more to be broadcast over the next two weeks. Hopefully, you also heard about some […]

How Bristol Myers Squibb uses Amazon S3 and AWS Storage Gateway to manage scientific data

Bristol Myers Squibb develops and discovers innovative medicines to help treat, manage, and cure serious diseases. We use many AWS services to help us manage our scientific data, lab workflows, and large computations for analyzing molecular, cellular, and clinical datasets. Genomics and clinical data, generated in Bristol Myers Squibb labs, is growing at an exponential […]

Migrate HDFS files to an Amazon S3 data lake with AWS Snowball Edge

The need to store newly connected data grows as the sources of data increase. Enterprise customers use Hadoop Distributed File System (HDFS) as their data lake storage repository for on-premises Hadoop applications. Customers are migrating their data lakes to AWS for a more secure, scalable, agile, and cost-effective solution. For HDFS migrations where high-speed transfer […]
