AWS Storage Blog
Category: Learning Levels
From raw to refined: building a data quality pipeline with AWS Glue and Amazon S3 Tables
Organizations often struggle to extract maximum value from their data lakes when running generative AI and analytics workloads due to data quality challenges. Although data lakes excel at storing massive amounts of raw, diverse data, they need robust governance and management practices to prevent common quality issues. Without proper data validation, cleansing processes, and ongoing […]
Enable item-level search and recovery for Amazon EC2 with AWS Backup
Users commonly rely on backups to help recover data after a disaster or security incident. However, what is often overlooked is the need to restore data after an operational incident, such as a data corruption event or a deleted file. The ability to identify files and directories within a backup and restore them is an important […]
Automate data transfers and migrations with AWS DataSync and Terraform
In today’s data-driven world, organizations face the challenge of efficiently managing and consolidating vast amounts of information from diverse sources. Whether it’s for analytics, machine learning (ML), or other business-critical applications, the ability to seamlessly transfer and organize data is crucial. However, this process can be complex, time-consuming, and prone to errors when done manually. […]
Best practice configuration of Amazon FSx for NetApp ONTAP for Microsoft SQL Server workloads
When it comes to running enterprise workloads, Microsoft SQL Server sits at the heart of many of these solutions, providing the data needed to drive important business decisions. However, what is often not considered is the storage service underpinning the delivery of that critical data, and with the right tools and management applied, how that […]
University of California Irvine backs up petabytes of research data to AWS
Editor’s note: AWS is not responsible for UCI’s public GitHub repo linked in this post, which has been provided so that interested parties can explore the solution described in this post in more detail. The University of California, Irvine (UCI) is a public land-grant research university with troves of research data stored on servers in […]
How to consume tabular data from Amazon S3 Tables for insights and business reporting
When was the last time you found yourself looking at rows and rows of data in a spreadsheet, struggling to interpret them and draw conclusions? Many analysts and engineers experience the same challenge every day. Whether it’s analyzing sales trends, monitoring operational metrics, or understanding customer behavior, the challenge lies not just in interpreting […]
Automating paper-to-electronic healthcare claims processing with AWS
Health plans process billions of claims electronically each year. Council for Affordable Quality Healthcare (CAQH) estimates that approximately 10% of claims still arrive as paper documents, accounting for hundreds of millions of paper submissions annually in the U.S. These paper claims create processing bottlenecks and consume a disproportionate share of operational costs and resources, with […]
Optimizing stateful storage lifecycle on AWS with Kubernetes and Salesforce
Managing storage resources efficiently in cloud environments is a challenge for organizations of all sizes. As businesses scale their operations, they often accumulate unused storage volumes that continue to generate costs without providing value. This ‘orphaned’ storage problem is particularly acute in containerized environments, where the complexity of storage lifecycle management can lead to oversight […]
Running I/O intensive workloads on PostgreSQL with Amazon EBS io2 Block Express
Databases are a fundamental component for any organization whose IT infrastructure powers various applications. Ensuring the smooth operation of database servers is vital because any performance disruption can affect numerous users and their activities. Many companies experience application slowdowns caused by storage latency during database operations. To tackle […]
Using Amazon S3 Express One Zone as a caching layer for S3 Standard
Data caching is a critical strategy for optimizing application performance in today’s data-intensive environments. By storing frequently accessed information in high-speed storage locations, organizations can dramatically reduce access times, optimize the use of compute resources, and improve overall system responsiveness. Effective caching strategies become particularly essential for workloads that require consistent low latency, such as […]