Mountpoint for Amazon S3 – Generally Available and Ready for Production Workloads
Update (September 2023) – Add information about enabling file deletion.
Mountpoint for Amazon S3 is an open source file client that makes it easy for your file-aware Linux applications to connect directly to Amazon Simple Storage Service (Amazon S3) buckets. Announced earlier this year as an alpha release, it is now generally available and ready for production use on your large-scale read-heavy applications: data lakes, machine learning training, image rendering, autonomous vehicle simulation, ETL, and more. It supports file-based workloads that perform sequential and random reads, sequential (append only) writes, and that don’t need full POSIX semantics.
Many AWS customers use the S3 APIs and the AWS SDKs to build applications that can list, access, and process the contents of an S3 bucket. However, many customers have existing applications, commands, tools, and workflows that know how to access files in UNIX style: reading directories, opening & reading existing files, and creating & writing new ones. These customers have asked us for an official, enterprise-ready client that supports performant access to S3 at scale. After speaking with these customers and asking lots of questions, we learned that performance and stability were their primary concerns, and that POSIX compliance was not a necessity.
When I first wrote about Amazon S3 back in 2006 I was very clear that it was intended to be used as an object store, not as a file system. While you would not want use the Mountpoint / S3 combo to store your Git repositories or the like, using it in conjunction with tools that can read and write files, while taking advantage of S3’s scale and durability, makes sense in many situations.
All About Mountpoint
Mountpoint is conceptually very simple. You create a mount point and mount an Amazon S3 bucket (or a path within a bucket) at the mount point, and then access the bucket using shell commands (
find, and so forth), library functions (
opendir, and so forth) or equivalent commands and functions as supported in the tools and languages that you already use.
Under the covers, the Linux Virtual Filesystem (VFS) translates these operations into calls to Mountpoint, which in turns translates them into calls to S3:
PUT, and so forth. Mountpoint strives to make good use of network bandwidth, increasing throughput and allowing you to reduce your compute costs by getting more work done in less time.
Mountpoint can be used from an Amazon Elastic Compute Cloud (Amazon EC2) instance, or within an Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (EKS) container. It can also be installed on your existing on-premises systems, with access to S3 either directly or over an AWS Direct Connect connection via AWS PrivateLink for Amazon S3.
Installing and Using Mountpoint for Amazon S3
Mountpoint is available in RPM format and can easily be installed on an EC2 instance running Amazon Linux. I simply fetch the RPM and install it using
For the last couple of years I have been regularly fetching images from several of the Washington State Ferry webcams and storing them in my wsdot-ferry bucket:
I collect these images in order to track the comings and goings of the ferries, with a goal of analyzing them at some point to find the best times to ride. My goal today is to create a movie that combines an entire day’s worth of images into a nice time lapse. I start by creating a mount point and mounting the bucket:
I can traverse the mount point and inspect the bucket:
I can create my animation with a single command:
And here’s what I get:
As you can see, I used Mountpoint to access the existing image files and to write the newly created animation back to S3. While this is a fairly simple demo, it does show how you can use your existing tools and skills to process objects in an S3 bucket. Given that I have collected several million images over the years, being able to process them without explicitly syncing them to my local file system is a big win.
Mountpoint for Amazon S3 Facts
Here are a couple of things to keep in mind when using Mountpoint:
Pricing – There are no new charges for the use of Mountpoint; you pay only for the underlying S3 operations. You can also use Mountpoint to access requester-pays buckets.
Performance – Mountpoint is able to take advantage of the elastic throughput offered by S3, including data transfer at up to 100 Gb/second between each EC2 instance and S3.
Credentials – Mountpoint accesses your S3 buckets using the AWS credentials that are in effect when you mount the bucket. See the CONFIGURATION doc for more information on credentials, bucket configuration, use of requester pays, some tips for the use of S3 Object Lambda, and more.
Operations & Semantics – Mountpoint supports basic file operations, and can read files up to 5 TB in size. It can list and read existing files, and it can create new ones. It cannot modify existing files or delete directories, and it does not support symbolic links or file locking (if you need POSIX semantics, take a look at Amazon FSx for Lustre). To enable deletion of existing files, pass the
--allow-delete flag to the
mount-s3 command. For more information about the supported operations and their interpretation, read the SEMANTICS document.
Storage Classes – You can use Mountpoint to access S3 objects in all storage classes except S3 Glacier Flexible Retrieval, S3 Glacier Deep Archive, S3 Intelligent-Tiering Archive Access Tier, and S3 Intelligent-Tiering Deep Archive Access Tier.
As you can see, Mountpoint is really cool and I am guessing that you are going to find some awesome ways to put it to use in your applications. Check it out and let me know what you think!