AWS Storage Blog
Category: *Post Types
Machine Learning with Kubeflow on Amazon EKS with Amazon EFS
Training machine learning models involves multiple steps, and it becomes more complex and time consuming when the training data set is in the range of hundreds of GBs. Data scientists run a large number of experiments and research, which includes testing and training a large number of models. Kubeflow provides various ML capabilities […]
Persistent storage for Kubernetes
Stateful applications rely on data being persisted and retrieved to run properly. When running stateful applications using Kubernetes, state needs to be persisted regardless of container, pod, or node crashes or terminations. This requires persistent storage, that is, storage that lives beyond the lifetime of the container, pod, or node. In this blog, we cover […]
Run queries up to 9x faster using Trino with Amazon S3 Select on Amazon EMR
Customers building data lakes continue to innovate in the ways that they store and access their data. For these customers, performance is critical, particularly when they are accessing large amounts of data. For example, data scientists, data analysts, and data engineers running queries from open source frameworks like Trino want to accelerate access to their […]
Restoring archived objects at scale from the Amazon S3 Glacier storage classes
Every organization around the world has archival data. The need to archive data applies not only to companies that have been around for a while, but also to digital native businesses. Workloads such as medical records, news media content, and manufacturing datasets often store petabytes of data, or billions of objects, indefinitely. The vast majority of […]
File storage access patterns insights using Amazon FSx for Windows File Server
When using Windows file services, enterprise customers often have the requirement to identify potential security breaches and unauthorized access attempts. Traditionally, customers rely on Windows file server file access auditing to capture and analyze audit records to gain these insights and strengthen their security posture. Amazon FSx for Windows File Server provides fully managed shared […]
Using available Amazon EFS security features while migrating files with AWS DataSync
When performing an online data migration, an important requirement is often security in transit. When evaluating migration options, you should consider if the tools available can provide encryption of data in flight, to help prevent unauthorized users from reading your data. Amazon Elastic File System (EFS) provides the ability to encrypt data in transit by […]
Two AWS Solutions Architects discuss advancing their cloud skill set with digital badges
If you are new to AWS and trying to figure out which Storage services to use, how to use them, and best practices, it can be challenging to find guidance that you can trust. The depth of features and options means that even experienced users can identify better solutions with continuous learning. But how do […]
How TMAP Mobility transferred 2.4 PB of Hadoop data using AWS DataSync
Launched in 2002, TMAP Mobility is Korea’s leading mobility platform, with 20 million registered users and 14 million monthly active users. TMAP provides navigation services based on a wide range of real-time traffic information and data. Previously, the Data Intelligence group at TMAP Mobility operated a mobility-data platform based on a Hadoop Distributed File System […]
Using presigned URLs to identify per-requester usage of Amazon S3
Many software-as-a-service (SaaS) product offerings have a pay-as-you-go pricing model, charging customers only for the resources consumed. However, pay-as-you-go pricing is only viable when you can accurately track each customer’s use of resources, such as compute capacity, storage, and networking bandwidth. Without this data, SaaS providers do not have visibility into resource consumption of […]
Enable session limits for AWS Transfer Family
Enterprises and organizations proficient in file transfers exchange a wide variety of files, such as digital media content, images, or large data sets, with their business partners or public users. When a large shared dataset is being downloaded by end users, a file transfer server often has a limit on concurrent connections per user to […]





