Amazon FSx for Lustre Features

Overview

Amazon FSx for Lustre is a fully managed service that provides cost-effective, high-performance, scalable storage for compute workloads. Powered by Lustre, the world’s most popular high-performance file system, FSx for Lustre offers shared storage with sub-millisecond latencies, up to terabytes per second of throughput, and millions of IOPS. FSx for Lustre file systems can also be linked to Amazon Simple Storage Service (S3) buckets, allowing you to access and process data concurrently both through a high-performance file system and through the S3 API.

Improve workload performance

Amazon FSx for Lustre file systems scale to terabytes per second of throughput and millions of IOPS. FSx for Lustre also supports concurrent access to the same file or directory from thousands of compute instances. FSx for Lustre provides consistent low latencies for file operations.

The Lustre open source file system was built to solve the problem of quickly and cheaply processing the world’s ever-growing data sets, and it’s the most widely used file system for the 500 fastest computers in the world. It is battle-tested across a broad set of industries, from energy to life sciences, to media production to financial services, for workloads ranging from genome sequencing to video transcoding to machine learning to fraud detection.

The average first-byte latency when you access file data is sub-millisecond on SSD-based file systems and single-digit millisecond on HDD-based file systems.

Every Amazon FSx for Lustre file system, regardless of the deployment type, storage type, or throughput performance level, is supported by a metadata server backed by low-latency SSD storage. The SSD-based metadata server ensures that all metadata operations, which represent the majority of file system operations, are delivered with sub-millisecond latencies.

Use for any compute workload

FSx for Lustre is compatible with the most popular Linux-based AMIs, including Amazon Linux, Red Hat Enterprise Linux (RHEL), CentOS, Ubuntu, and SUSE Linux.

Amazon FSx for Lustre integrates natively with Amazon S3, making it easy to access your S3 data to run data-processing workloads.

With a few clicks in the Amazon FSx console, you can create a file system that’s linked to one or more S3 buckets. After you link your S3 bucket to your file system, FSx for Lustre transparently presents S3 objects as files and allows you to write results back to S3. Your linked file system is automatically updated as objects are added to, changed in, or deleted from your S3 bucket. FSx for Lustre also automatically tracks file system changes and keeps your S3 bucket updated as files are added, modified, or deleted. FSx for Lustre uses parallel data-transfer techniques to export data back to S3, enabling fast data transfer.
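The linking step can also be done programmatically rather than in the console. As a minimal sketch, the Python below builds a request for the FSx CreateDataRepositoryAssociation API, which is one way to link a file system path to an S3 prefix on deployment types that support data repository associations. The file system ID, bucket name, and helper function name are hypothetical placeholders, and enabling all three import and export events is an assumption made for illustration.

```python
def build_s3_link_request(file_system_id, file_system_path, s3_uri):
    """Build keyword arguments for the FSx CreateDataRepositoryAssociation call."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    if not file_system_path.startswith("/"):
        raise ValueError("file_system_path must be an absolute path")
    # Track new, changed, and deleted items in both directions.
    events = ["NEW", "CHANGED", "DELETED"]
    return {
        "FileSystemId": file_system_id,
        "FileSystemPath": file_system_path,
        "DataRepositoryPath": s3_uri,
        "S3": {
            "AutoImportPolicy": {"Events": list(events)},  # S3 -> file system
            "AutoExportPolicy": {"Events": list(events)},  # file system -> S3
        },
    }


# Hypothetical identifiers, for illustration only.
request = build_s3_link_request(
    "fs-0123456789abcdef0", "/datasets", "s3://example-bucket/datasets/"
)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("fsx").create_data_repository_association(**request)
```

Keeping the request construction separate from the API call makes the settings easy to review before anything is created.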

Amazon FSx for Lustre is accessible from workloads running on Amazon EC2 instances or on on-premises computers and servers. Once the file system is mounted, you can work with its files and directories just as you would with a local file system. FSx for Lustre file systems are also accessible from containers running on Amazon Elastic Kubernetes Service (EKS).
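To make the mounting step concrete, here is a small sketch that assembles a Lustre mount command for a Linux client with the Lustre client software installed. The DNS name, mount name, mount point, and the relatime and flock options are illustrative assumptions; the real values come from your file system's details, and the helper function name is hypothetical.

```python
import shlex


def build_mount_command(dns_name, mount_name, mount_point="/fsx",
                        options=("relatime", "flock")):
    """Return the argument list for mounting a Lustre file system."""
    # A Lustre mount source has the form <dns-name>@tcp:/<mount-name>.
    source = f"{dns_name}@tcp:/{mount_name}"
    return ["mount", "-t", "lustre", "-o", ",".join(options), source, mount_point]


# Hypothetical DNS name and mount name, for illustration only.
cmd = build_mount_command(
    "fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com", "abcdefgh"
)
print(shlex.join(cmd))

# To run it for real (as root, on a client with the Lustre client installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```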

Amazon FSx for Lustre integrates with Amazon SageMaker as an input data source. When using Amazon SageMaker with Amazon FSx for Lustre, your machine learning training jobs are accelerated by eliminating the initial download step from S3, and your total cost of ownership (TCO) is reduced by avoiding the repeated download of common objects (saving S3 request costs) for iterative jobs on the same data set.

Amazon FSx for Lustre integrates with AWS Batch through EC2 Launch Templates. AWS Batch is a cloud-native batch scheduler for HPC, ML, and other asynchronous workloads. AWS Batch automatically and dynamically sizes instances to job resource requirements, and uses existing FSx for Lustre file systems when launching instances and running jobs.

FSx for Lustre also integrates with AWS ParallelCluster. AWS ParallelCluster is an AWS-supported open-source cluster management tool used to deploy and manage High Performance Computing (HPC) clusters. It can automatically create FSx for Lustre file systems or use existing file systems during the cluster creation process.

Optimize cost

With a few clicks in the Amazon FSx console, or a few calls to the AWS CLI or API, you can create and scale a high-performance Lustre file system. With Amazon FSx file systems, you don't have to worry about managing file servers and storage volumes, updating hardware, configuring software, running out of capacity, or tuning performance, because Amazon FSx automates these time-consuming administration tasks.

FSx for Lustre offers a choice between scratch and persistent file systems for short-term and longer-term data processing. Scratch file systems are ideal for temporary storage and shorter-term processing of data. Data is not replicated and does not persist if a file server fails. Persistent file systems are ideal for longer-term storage and workloads. With persistent file systems, data is replicated, and file servers are replaced if they fail.
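The choice between scratch and persistent file systems is made when the file system is created. The sketch below builds a request for the FSx CreateFileSystem API for either kind. The SCRATCH_2 and PERSISTENT_2 deployment types, the 1200 GiB capacity, the 125 MB/s per TiB throughput setting, the subnet ID, and the helper function name are assumptions chosen for illustration; check the current API reference for the values valid in your Region.

```python
def build_create_request(subnet_id, storage_gib, persistent, throughput_per_tib=125):
    """Build keyword arguments for the FSx CreateFileSystem call (Lustre)."""
    lustre = {"DeploymentType": "PERSISTENT_2" if persistent else "SCRATCH_2"}
    if persistent:
        # Only persistent file systems take a per-unit throughput setting here.
        lustre["PerUnitStorageThroughput"] = throughput_per_tib
    return {
        "FileSystemType": "LUSTRE",
        "StorageCapacity": storage_gib,  # GiB
        "SubnetIds": [subnet_id],
        "LustreConfiguration": lustre,
    }


# Hypothetical subnet ID, for illustration only.
scratch_request = build_create_request("subnet-0123456789abcdef0", 1200, persistent=False)
persistent_request = build_create_request("subnet-0123456789abcdef0", 1200, persistent=True)

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("fsx").create_file_system(**persistent_request)
```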

For further data protection of persistent file systems and to meet business and regulatory compliance requirements, Amazon FSx can also automatically take incremental backups of your file system. Backups are stored in Amazon S3 with 99.999999999% (11 nines) of durability.

FSx for Lustre offers Solid-State Disk (SSD) and Hard Disk Drive (HDD) storage options to optimize cost and performance for your workload. For low-latency, IOPS-intensive workloads that typically feature small, random file operations, you can choose one of the SSD storage options. For throughput-intensive workloads that typically feature large, sequential file operations, you can choose one of the HDD storage options.

If you choose an HDD-based file system, you can provision an SSD cache to provide sub-millisecond latencies and higher IOPS for frequently accessed files.

You can use storage quotas to monitor and control user- and group-level storage consumption on your file systems, and to ensure that no user or group is able to consume excessive amounts of capacity. Storage quotas are intended for storage administrators who manage file systems that serve multiple users, teams, or projects.
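On a mounted client, quotas are typically managed with the standard Lustre lfs utility. As a sketch, the helper below assembles an lfs setquota command that sets soft and hard block limits for a user or a group. The helper name, the user name, the limit values, and the mount point are hypothetical; bare block limits are assumed to be interpreted in kilobytes, which is the lfs default.

```python
def build_setquota_command(kind, name, soft_kib, hard_kib, mount_point="/fsx"):
    """Return an `lfs setquota` argument list for a user or group block quota."""
    flags = {"user": "-u", "group": "-g"}
    if kind not in flags:
        raise ValueError("kind must be 'user' or 'group'")
    if soft_kib > hard_kib:
        raise ValueError("soft limit must not exceed hard limit")
    return [
        "lfs", "setquota", flags[kind], name,
        "-b", str(soft_kib),  # soft block limit
        "-B", str(hard_kib),  # hard block limit
        mount_point,
    ]


# Hypothetical user and limits: roughly 5 GiB soft, 8 GiB hard.
cmd = build_setquota_command("user", "alice", 5 * 1024 * 1024, 8 * 1024 * 1024)

# To apply it (as root on a client with the file system mounted):
#   import subprocess
#   subprocess.run(cmd, check=True)
# Current usage can then be inspected with: lfs quota -u alice /fsx
```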

You can use data compression to reduce storage consumption of both your file system storage and your file system backups. The data compression feature uses the LZ4 compression algorithm, which is optimized to deliver high levels of compression without adversely impacting file system performance. Once data compression is enabled, newly written files are automatically compressed by FSx for Lustre before they are written to disk and automatically uncompressed when they are read.

To optimize your available storage capacity, you can release inactive data from your file system after the files are exported to Amazon S3. When a file is released, the file data is removed from the file system (and retained on S3) and the metadata remains on your file system. If a user or application accesses a released file, the data is automatically and transparently loaded back onto your file system from your S3 bucket.
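One way to release an individual file from a client is the Lustre lfs hsm_release command, with lfs hsm_state used to check a file's state first. The sketch below parses a state line and decides whether a release looks safe. The sample state lines are illustrative of the general output format rather than captured output, the file paths and helper names are hypothetical, and the rule "archived, not dirty, not already released" is an assumption about when releasing is appropriate.

```python
import re

# Matches the flag list that follows the hex mask, e.g. "(0x00000009) exists archived".
_HSM_STATE = re.compile(r"\(0x[0-9a-fA-F]+\)\s*([^,]*)")


def parse_hsm_flags(state_line):
    """Extract the set of HSM flags from one line of `lfs hsm_state` output."""
    match = _HSM_STATE.search(state_line)
    return set(match.group(1).split()) if match else set()


def can_release(flags):
    """A file is a release candidate if it is exported, unmodified, and still resident."""
    return "archived" in flags and "dirty" not in flags and "released" not in flags


def build_release_command(path):
    """Return the `lfs hsm_release` argument list for one file."""
    return ["lfs", "hsm_release", path]


# Illustrative state lines (assumed format, hypothetical paths).
exported = parse_hsm_flags("/fsx/data/a.bin: (0x00000009) exists archived, archive_id:1")
released = parse_hsm_flags("/fsx/data/b.bin: (0x0000000d) released exists archived, archive_id:1")

if can_release(exported):
    cmd = build_release_command("/fsx/data/a.bin")
    # subprocess.run(cmd, check=True) would release the file data (run as root).
```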

Meet security and compliance requirements

All Amazon FSx for Lustre file systems are encrypted at rest, and in-transit encryption is available in select AWS Regions.

AWS has the longest-running compliance program in the cloud and is committed to helping customers navigate their requirements. Amazon FSx has been assessed to meet global and industry security standards. It complies with PCI DSS, ISO (9001, 27001, 27017, and 27018), and SOC 1, 2, and 3, in addition to being HIPAA eligible. For more information and resources, visit our compliance pages. You can also go to the Services in Scope by Compliance Program page to see a full list of services and certifications.

You access your Amazon FSx file system from endpoints in your Amazon VPC, which enables you to isolate your file system in your own virtual network. You can configure security group rules and control network access to your Amazon FSx file systems.
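As an illustration of controlling network access with security group rules, the sketch below builds inbound rules that admit Lustre traffic only from a client security group. The TCP port ranges (988 and 1018 to 1023) are an assumption based on the ports AWS documentation lists for Lustre at the time of writing and should be verified; the security group IDs and helper name are hypothetical.

```python
# Assumed Lustre TCP port ranges; verify against current AWS documentation.
LUSTRE_TCP_PORT_RANGES = [(988, 988), (1018, 1023)]


def build_lustre_ingress(client_security_group_id):
    """Build EC2 IpPermissions entries allowing Lustre traffic from one group."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": first,
            "ToPort": last,
            "UserIdGroupPairs": [{"GroupId": client_security_group_id}],
        }
        for first, last in LUSTRE_TCP_PORT_RANGES
    ]


# Hypothetical client security group ID.
rules = build_lustre_ingress("sg-0123456789abcdef0")

# With boto3 installed and credentials configured, the rules could be attached
# to the file system's security group (hypothetical ID) like this:
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId="sg-0fedcba9876543210", IpPermissions=rules
#   )
```

Referencing a source security group instead of a CIDR range keeps access tied to the clients that actually need it.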

Amazon FSx is integrated with AWS Identity and Access Management (IAM). This integration means that you can control the actions your AWS IAM users and groups can take to manage your file systems (such as creating and deleting file systems). You can also tag your Amazon FSx resources and control the actions that your IAM users and groups can take based on those tags.
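A tag-based permission of this kind might look like the following sketch, which generates an IAM policy document allowing file system deletion only when the resource carries a given tag. The tag key and value, the choice of the fsx:DeleteFileSystem action, the wildcard resource ARN, and the helper name are illustrative assumptions, not a recommended policy.

```python
import json


def build_tag_scoped_policy(tag_key, tag_value, actions=("fsx:DeleteFileSystem",)):
    """Build an IAM policy allowing actions only on FSx file systems with a tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "arn:aws:fsx:*:*:file-system/*",
                "Condition": {
                    # Match on the tag attached to the target resource.
                    "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}
                },
            }
        ],
    }


# Hypothetical tag: only file systems tagged team=genomics may be deleted.
policy = build_tag_scoped_policy("team", "genomics")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```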

Amazon FSx is integrated with AWS Backup, enabling fully managed, policy-based backup and restore capabilities for your Amazon FSx file systems. The integration with AWS Backup allows you to protect customer data and ensure compliance across AWS services for business continuity purposes.

To provide additional layers of data protection and meet business continuity, disaster recovery, and compliance requirements, you can copy your Amazon FSx file system backups across AWS Regions, AWS accounts, or both.
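For the cross-Region case, a copy can be requested through the FSx CopyBackup API, called in the destination Region. The sketch below builds such a request; the backup ID, Region names, and helper name are hypothetical, and cross-account copies are not covered by this example.

```python
def build_copy_backup_request(source_backup_id, source_region, copy_tags=True):
    """Build keyword arguments for the FSx CopyBackup call."""
    return {
        "SourceBackupId": source_backup_id,
        "SourceRegion": source_region,
        "CopyTags": copy_tags,  # carry the source backup's tags to the copy
    }


# Hypothetical backup ID and source Region.
request = build_copy_backup_request("backup-0123456789abcdef0", "us-east-1")

# The call is made with a client in the destination Region (hypothetical here):
#   import boto3
#   boto3.client("fsx", region_name="us-west-2").copy_backup(**request)
```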