AWS Storage Blog

Persistent storage for high-performance workloads using Amazon FSx for Lustre

High-performance file systems are often divided into two types: scratch and persistent. Scratch file systems provide temporary storage with high-performance characteristics such as submillisecond latency, up to hundreds of gigabytes per second of throughput, and millions of IOPS for short-term workloads. By contrast, persistent file systems are designed to combine the performance levels of their scratch counterparts with the durability and availability needed for longer-term data processing. AWS introduced Amazon FSx for Lustre (FSx for Lustre) scratch file systems based on Lustre, the world’s most popular high-performance file system, at re:Invent 2018. With our recently announced FSx for Lustre persistent file system deployment option, you can now deploy a highly available and durable high-performance POSIX-compliant file system that is fully managed on AWS. This provides customers the flexibility to choose between persistent or scratch file systems based on their workload requirements.

In this blog, I walk you through the FSx for Lustre persistent file system deployment option, discuss some common use cases, and cover some of the best practices that we recommend. I also cover creating a new persistent FSx for Lustre file system and mounting it.

FSx for Lustre persistent file system

The FSx for Lustre persistent file system deployment option provides highly available and durable storage for workloads that run for extended periods, or indefinitely. Data on scratch file systems is not replicated and does not persist if a file server used by the file system fails. In contrast, the file servers in a persistent file system are highly available, and data is automatically replicated within the same Availability Zone.

If a file server becomes unavailable on a persistent file system, it is replaced automatically within minutes of failure. During that time, client requests for data on that server transparently retry and eventually succeed after the file server is replaced. Data on persistent file systems is replicated on disks and any failed disks are automatically and transparently replaced.

FSx for Lustre persistent storage diagram

When should you use the Amazon FSx for Lustre persistent file system?

Consider using a persistent file system for processing-heavy workloads that need durable and highly available storage.

Following are some of the most common use cases for the FSx for Lustre persistent file system:

  • SAS Grid: We recommend using persistent file systems for all SAS Grid libraries (SASDATA, SASWORK, UTILLOC), as persistent file systems have certain features that align with the characteristics needed for SAS applications. These features include the high availability of these file systems, gigabytes per second throughput, millions of IOPS, submillisecond latencies, and encryption of data at rest and in transit. You can learn more about this use case in our recently published whitepaper.
  • High Performance Computing (HPC): HPC workloads process, store, and analyze massive amounts of data. These processing-heavy workloads run for longer periods and need highly reliable storage to persist data. These workloads include genomics, machine learning, autonomous vehicles, computational fluid dynamics, seismic processing, research, Electronic Design Automation (EDA), financial modeling, and more.
  • Persistent storage for containers (Kubernetes, Amazon EKS): We recommend using the persistent file system deployment option with self-managed Kubernetes or Amazon EKS clusters for containerized machine learning and HPC workloads. Containers are immutable, and when a container shuts down, data created during its lifetime is lost. A persistent file system is ideal for applications that need data to persist beyond the lifetime of the container.
  • Amazon EC2 Spot statefulness: By using a persistent file system for workloads running on EC2 Spot Instances, you can reuse data without worrying about copying it off EC2 instances during Spot interruptions.
  • Data lakes on S3: Customers hosting data lakes in Amazon S3 can quickly spin up an FSx for Lustre file system linked to their S3 bucket or prefix. This makes high-performance storage easily and quickly accessible to compute. FSx for Lustre persistent file system acts as a fast caching layer without rearchitecting your applications, enabling your analytics jobs to run faster while saving on compute costs.

Best practices for persistent file system

Here, I discuss some of the best practices you should follow when using FSx for Lustre persistent file system.

Security: With the persistent file system, we now provide the ability to encrypt data at rest using a customer managed customer master key (CMK). If no CMK is specified, the AWS managed CMK is used.

Encryption in transit is automatically enabled when certain EC2 instance types are used to mount the FSx for Lustre persistent file system. This encryption feature uses the offload capabilities of the underlying hardware, and there is no impact on network performance. For a list of instance types, refer to supported EC2 instances in these Regions.

In addition, customers can use existing features like security groups, network ACLs, Identity and Access Management (IAM) permissions, and Portable Operating System Interface (POSIX) permissions to enforce stronger security.
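
As an illustration of the security group piece, FSx for Lustre requires the Lustre network ports to be open between clients and the file system's network interfaces. The following is a minimal sketch with a placeholder security group ID; it allows inbound Lustre traffic (TCP port 988, plus ports 1021-1023) from members of the same security group:

$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 988 \
    --source-group sg-0123456789abcdef0

$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 1021-1023 \
    --source-group sg-0123456789abcdef0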

Integration with durable data repository: We recommend linking your file system to a highly durable data repository like Amazon Simple Storage Service (Amazon S3). This allows you to copy your changes to S3 using the Data Repository Task API.

The Data Repository Task API automatically tracks changes to your file system and provides a simple mechanism to copy your data to S3. These tasks transfer file data, symbolic links (symlinks), and POSIX metadata, including ownership, permissions, and timestamps. When you export a file or directory, your file system exports only data files and metadata that were created or modified since the last export.
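
For example, the following AWS CLI call is a minimal sketch (the file system ID reuses this post's example, and the export path and report bucket prefix are placeholders). It exports files and metadata changed under a given directory to the linked S3 bucket and writes a completion report for any files that fail to transfer:

$ aws fsx create-data-repository-task \
    --file-system-id fs-0123456abcdefg \
    --type EXPORT_TO_REPOSITORY \
    --paths path/to/export \
    --report Enabled=true,Path=s3://your-S3-bucket/reports,Format=REPORT_CSV_20191124,Scope=FAILED_FILES_ONLY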

Performance: The FSx for Lustre persistent file system allows you to choose from three deployment options: PERSISTENT-50, PERSISTENT-100, and PERSISTENT-200. Each of these deployment options comes with 50-MB/s, 100-MB/s, or 200-MB/s baseline disk throughput per TiB of file system storage, respectively, as shown in the table below:

All values in the table are per TiB of file system storage provisioned.

Deployment type | Baseline network throughput (MB/s) | Variable network throughput (MB/s) | Memory for caching (GiB) | Baseline disk throughput (MB/s) | Burst disk throughput (MB/s)
PERSISTENT-50   | 250 | Up to 1,300* | 2.2 | 50  | Up to 240
PERSISTENT-100  | 500 | Up to 1,300* | 4.4 | 100 | Up to 240
PERSISTENT-200  | 750 | Up to 1,300* | 8.8 | 200 | Up to 240
When you select the per unit storage throughput at the time of file system creation, you are selecting the baseline disk throughput available for that file system. The baseline and variable network throughput, in-memory cache, and burst disk throughput of the file system allow it to operate at much higher throughput rates than the baseline disk throughput. When you read data that is stored in the file server's in-memory cache, file system performance is determined by the network throughput. When you write data to your file system, file system performance is determined by the lower of the network throughput and disk throughput. This is also the case when you read data that is not stored in the in-memory cache. You can learn more about this in the performance section of our documentation.

We recommend testing your workload with the PERSISTENT-50 file system deployment option, which provides a baseline disk throughput of 50 MB/s/TiB. By taking advantage of the in-memory cache, baseline network throughput of 250 MB/s/TiB, variable network throughput of up to 1,300 MB/s/TiB, and burst disk throughput of up to 240 MB/s/TiB, you can achieve much higher throughput rates for your file system. Workloads that burst for shorter periods can take advantage of burst throughput, helping you save on throughput costs.

To achieve the highest throughput levels the file system is designed for, we recommend parallelizing your workload. Parallelizing workloads by increasing the number of threads per file system client enables you to drive higher throughput to the file system because FSx for Lustre bundles writes to disks. If clients running the workload are fully utilized, adding additional clients also enables you to drive higher throughput to the file system for the same reason.
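
As a simple illustration of parallelizing from a single client, the following hypothetical sketch (the mount point, directory, and file sizes are placeholders) launches multiple writers at once so more threads are issuing I/O to the file system at the same time:

# Write eight 4-GiB files in parallel from one client.
$ mkdir -p /fsx/benchmark
$ for i in $(seq 1 8); do dd if=/dev/zero of=/fsx/benchmark/file_$i bs=1M count=4096 oflag=direct & done; wait

Benchmarking tools such as fio or IOR give you finer control over I/O patterns, but even a simple loop like this demonstrates the effect of increasing the number of concurrent writers.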

EC2 instances use their network interface to access FSx for Lustre, so instances with greater network performance can achieve higher file system throughput. Network optimized instance families like r5n, r5dn, m5n, m5dn, i3en, and c5n should achieve higher file system throughput to FSx for Lustre than non-network-optimized instances.

In addition, here is a quick reminder of best practices you can use to optimize performance if your workload is not evenly balanced across the disks in your file system:

  • All file data in Lustre is stored on disks called object storage targets (OSTs). For files imported from Amazon S3, the ImportedFileChunkSize parameter determines how many OSTs imported files are striped across. You can specify this parameter when you create your file system.
  • For optimal performance, stripe large files with high throughput requirements across all the OSTs comprising your file system. You can also define a Progressive File Layout (PFL), which avoids much of the need to explicitly specify layouts for files of different sizes; see the example following this list.
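
The following commands are a minimal sketch of these layouts (the mount point and directory names are placeholders); they are run on a client after the file system is mounted:

# Stripe new files in this directory across all OSTs in the file system.
$ lfs setstripe --stripe-count -1 /fsx/large-files

# Progressive File Layout: small files stay on one OST, medium files use
# eight OSTs, and anything larger is striped across all OSTs.
$ lfs setstripe -E 100M -c 1 -E 10G -c 8 -E -1 -c -1 /fsx/mixed-files

# List the OSTs in the file system and how full each one is.
$ lfs df -h /fsx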

Data migration: You can use AWS DataSync to copy data from your on-premises or in-cloud self-managed NFS into an S3 bucket. You can then quickly spin up your FSx for Lustre file system linked to the same S3 bucket or prefix. FSx for Lustre imports the objects in your S3 bucket as files, and also enables a lazy-load of the file contents from S3 when you first access a file.
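
At a high level, the DataSync flow looks like the following sketch (the NFS hostname, ARNs, and bucket are placeholders, and a DataSync agent and an IAM role with access to the bucket are assumed to already exist):

# Register the source NFS export and the destination S3 bucket as locations.
$ aws datasync create-location-nfs \
    --server-hostname nfs.example.internal \
    --subdirectory /export/data \
    --on-prem-config AgentArns=arn:aws:datasync:us-east-2:0123456789:agent/agent-EXAMPLE

$ aws datasync create-location-s3 \
    --s3-bucket-arn arn:aws:s3:::your-S3-bucket \
    --s3-config BucketAccessRoleArn=arn:aws:iam::0123456789:role/datasync-s3-access

# Create a task from the two location ARNs returned above, then run it.
$ aws datasync create-task \
    --source-location-arn arn:aws:datasync:us-east-2:0123456789:location/loc-source-EXAMPLE \
    --destination-location-arn arn:aws:datasync:us-east-2:0123456789:location/loc-dest-EXAMPLE

$ aws datasync start-task-execution \
    --task-arn arn:aws:datasync:us-east-2:0123456789:task/task-EXAMPLE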

How to create a new persistent FSx for Lustre file system:

Supported storage capacities:

Persistent file systems can be created with a storage capacity of 1.2 TiB, or in increments of 2.4 TiB (2.4 TiB, 4.8 TiB, 7.2 TiB, and so on).

Available storage throughput options:

As discussed earlier, you can choose from 50-MB/s, 100-MB/s, and 200-MB/s storage throughput per TiB of the file system's total storage capacity. You pay for the amount of throughput you provision. Note that the aggregate figures in the following examples are calculated from the provisioned capacity in GiB (a 2.4-TiB file system is 2,400 GiB, or roughly 2.34 TiB), which is why a 2.4-TiB file system at 50 MB/s/TiB provides 117 MB/s rather than 120 MB/s.

  • Example 1: A 2.4-TiB file system configured with 50 MB/s/TiB of throughput per unit of storage. This provides an aggregate baseline disk throughput of 117 MB/s, burst disk throughput up to 563 MB/s, and variable network throughput up to 3047 MB/s.
  • Example 2: A 2.4-TiB file system configured with 100 MB/s/TiB of throughput per unit of storage. This provides an aggregate baseline disk throughput of 234 MB/s, burst disk throughput up to 563 MB/s, and variable network throughput up to 3047 MB/s.
  • Example 3: A 2.4-TiB file system configured with the default 200 MB/s/TiB of throughput per unit of storage. This provides an aggregate baseline disk throughput of 469 MB/s, burst disk throughput up to 563 MB/s, and variable network throughput up to 3047 MB/s.

Creating a new file system:

You can use the AWS Management Console, AWS Command Line Interface (AWS CLI), an AWS CloudFormation template, or the FSx for Lustre APIs to create the persistent file system. The following AWS CLI example creates a new 1.2-TiB persistent FSx for Lustre file system with 50-MB/s/TiB throughput. In this example, the file system is linked to a durable data repository on Amazon Simple Storage Service (Amazon S3).

$ aws fsx \
    create-file-system \
    --file-system-type LUSTRE \
    --storage-capacity 1200 \
    --kms-key-id 1234abcd-12ab-34cd-56ef-1234567890ab \
    --subnet-ids subnet-012345abcdef \
    --lustre-configuration "ImportPath=s3://your-S3-bucket,ExportPath=s3://your-S3-bucket,DeploymentType=PERSISTENT_1,PerUnitStorageThroughput=50"

The output of the CreateFileSystem API shows the file system ID, DNS name, and mount name for the new file system:

{
    "FileSystem": {
        "OwnerId": "0123456789",
        "CreationTime": 1579795074.158,
        "FileSystemId": "fs-0123456abcdefg",
        "FileSystemType": "LUSTRE",
        "Lifecycle": "CREATING",
        "StorageCapacity": 1200,
        "VpcId": "vpc-abcdef",
        "SubnetIds": [
            "subnet-abc123"
        ],
        "DNSName": "fs-0123456abcdefg.fsx.us-east-2.amazonaws.com",
        "KmsKeyId": "arn:aws:kms:us-east-2:0123456789:key/abcde-34cb-4f2e-a6b1-f1ebec93fa99",
        "ResourceARN": "arn:aws:fsx:us-east-2:0123456789:file-system/fs-0123456abcdefg ",
        "Tags": [],
        "LustreConfiguration": {
            "WeeklyMaintenanceStartTime": "2:07:30",
            "DataRepositoryConfiguration": {
                "ImportPath": "s3://your-S3-bucket",
                "ExportPath": "s3://your-S3-bucket/FSxLustre20200123T155754Z",
                "ImportedFileChunkSize": 1024
            },
            "DeploymentType": "PERSISTENT_1",
            "PerUnitStorageThroughput": 50,
            "MountName": "tmmqzbmv"
        }
    }
}

You can query the status of file system creation using the DescribeFileSystems API, as shown here:

$ aws fsx \
  describe-file-systems \
  --file-system-ids fs-0123456abcdefg

The output from the DescribeFileSystems API shows the Lifecycle state as AVAILABLE once the file system is created.

{
    "FileSystems": [
        {
            "OwnerId": "0123456789",
            "CreationTime": 1579795074.158,
            "FileSystemId": "fs-0123456abcdefg",
            "FileSystemType": "LUSTRE",
            "Lifecycle": "AVAILABLE",
            "StorageCapacity": 1200,
            "VpcId": "vpc-abcdef",
            "SubnetIds": [
                "subnet-abc123"
            ],
            "NetworkInterfaceIds": [
                "eni-0abcdefg12345678h",
                "eni-0hijklmn12345678o"
            ],
            "DNSName": "fs-0123456abcdefg.fsx.us-east-2.amazonaws.com",
            "KmsKeyId": "arn:aws:kms:us-east-2:0123456789:key/abcde-34cb-4f2e-a6b1-f1ebec93fa99",
            "ResourceARN": "arn:aws:fsx:us-east-2:0123456789:file-system/fs-0123456abcdefg",
            "Tags": [],
            "LustreConfiguration": {
                "WeeklyMaintenanceStartTime": "2:07:30",
                "DataRepositoryConfiguration": {
                    "ImportPath": "s3://your-S3-bucket",
                    "ExportPath": "s3://your-S3-bucket/FSxLustre20200123T155754Z",
                    "ImportedFileChunkSize": 1024
                },
                "DeploymentType": "PERSISTENT_1",
                "PerUnitStorageThroughput": 50,
                "MountName": "tmmqzbmv"
            }
        }
    ]
}
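
If you only need a couple of fields, such as the lifecycle state and mount name, while waiting for the file system to become available, you can filter the same call with a JMESPath --query expression. The following is a small illustrative variant using the example file system ID:

$ aws fsx describe-file-systems \
    --file-system-ids fs-0123456abcdefg \
    --query "FileSystems[0].{Lifecycle: Lifecycle, MountName: LustreConfiguration.MountName}"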

Mounting the file system

  1. Connect to your instance and install the Lustre client. Refer to the installation instructions page for different OS types.
$ sudo amazon-linux-extras install -y lustre2.10
  2. Create a new directory on your EC2 instance, for example /fsx.
$ sudo mkdir /fsx
  3. Mount the file system you created as shown below:
$ sudo mount -t lustre -o noatime,flock file_system_dns_name@tcp:/mountname /fsx

Replace file_system_dns_name and mountname with the actual values from the output of the DescribeFileSystems API command. The persistent file system requires the mountname to be specified.
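
For example, using the DNS name and mount name from the example output above, the mount command looks like this:

$ sudo mount -t lustre -o noatime,flock fs-0123456abcdefg.fsx.us-east-2.amazonaws.com@tcp:/tmmqzbmv /fsx

# Verify the mount
$ df -h /fsx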

Summary

In this blog post, I introduced you to FSx for Lustre's persistent deployment option. This deployment option provides highly available and durable storage, gigabytes per second of throughput, millions of IOPS, submillisecond latencies, and encryption of data at rest and in transit. The file servers are highly available, and data is replicated within the same Availability Zone, allowing workloads to persist data in the event of a file server failure.

I discussed some common use cases (SAS Grid, HPC workloads, self-managed Kubernetes, Amazon EKS, EC2 Spot statefulness, data lakes on S3) for the persistent file system deployment option. In the best practices section, I discussed how you can benefit from the in-memory cache on the file servers, baseline and variable network throughput, and burst disk throughput to achieve much higher throughput rates for your file system. To achieve the highest throughput levels the file system is designed for, I recommend parallelizing your workload. Spiky workloads that burst for shorter periods can take advantage of burst throughput and help you save on throughput costs. We also reviewed best practices for security, copying changes using S3 integration, and migration. Lastly, I showed you how to deploy an FSx for Lustre persistent file system and mount it on a client.

To wrap up this blog post, I would like to remind you that FSx for Lustre can be used for any high-performance workload where you need a POSIX-compliant file system on supported Linux clients.

To get started with the new persistent file system, try it out in the Amazon FSx console, use the AWS CLI, use a CloudFormation template, or use the FSx for Lustre APIs. Thank you for reading this blog post. Please leave a comment if you have any questions or feedback.