AWS Storage Blog

Add storage to your high-performance file system with a single click

Many organizations have on-premises, high-performance workloads burdened with complex management and scalability challenges. Scaling data-intensive workloads on-premises typically involves purchasing more hardware, which can slow time to production and require high upfront investment. The agility to scale compute and storage resources to meet business needs is one of the reasons why our customers choose to move to the AWS cloud.

In this blog post, I am excited to share that Amazon FSx for Lustre (Amazon FSx) has added support for file system storage scaling. You can now increase the storage capacity of an existing Amazon FSx file system with the click of a button, so you can grow capacity as your business needs change. You no longer have to settle on a final size when you create the file system, which avoids paying for over-allocated storage: start with a small file system and grow it when needed. With live storage scaling, the added capacity becomes available in a matter of minutes. Because the throughput of an Amazon FSx file system scales linearly with storage capacity, you also get a comparable increase in throughput as you scale the file system.

Let's walk through the new feature. You can initiate scaling from the Amazon FSx console, the AWS Command Line Interface (CLI), or the AWS Software Development Kit (SDK). I cover the Amazon FSx console first.

Scaling storage using the Amazon FSx console

You can find the new scaling operation on the file system's summary page, where it says Update next to Storage capacity, or in the file system's Actions menu.

Next, enter the new storage capacity. The valid values depend on the deployment type of the file system.

In this example, I am increasing the file system from 1.2 TiB to 4.8 TiB, and I can monitor the status of the operation through the Updates tab.

The scaling process consists of two phases. In the first phase, Amazon FSx adds new network file servers and scales the storage on the metadata server, so the new capacity is available for use within minutes. In the second phase, the storage optimization process transparently rebalances data across the existing and newly added file servers to optimize performance.

The update can have one of the following statuses:

  • Pending – Amazon FSx has received the update request, but has not started processing it.
  • In progress – Amazon FSx is processing the update request.
  • Updated; Optimizing – Amazon FSx has increased the file system's storage capacity. The storage optimization process is now rebalancing data across the file servers.
  • Completed – The storage capacity increase completed successfully.
  • Failed – The storage capacity increase failed.

You can track the status and optimization progress at any time using the Amazon FSx console, CLI, and API.
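When you track the update through the CLI or API rather than the console, the statuses listed above appear as upper-case strings in the AdministrativeActions list of the file system description. The following is a minimal sketch of how those strings could be mapped back to the console labels. The helper names (`summarize_actions`, `capacity_is_usable`) are hypothetical, and the input is assumed to be one file system entry as returned by describe-file-systems.

```python
# Assumed mapping from API status strings to the console labels listed above.
CONSOLE_LABELS = {
    "PENDING": "Pending",
    "IN_PROGRESS": "In progress",
    "UPDATED_OPTIMIZING": "Updated; Optimizing",
    "COMPLETED": "Completed",
    "FAILED": "Failed",
}


def summarize_actions(file_system):
    """Return (action type, console label) pairs for one file system description."""
    summary = []
    for action in file_system.get("AdministrativeActions", []):
        status = action["Status"]
        # Fall back to the raw string for any status not in the mapping.
        label = CONSOLE_LABELS.get(status, status)
        summary.append((action["AdministrativeActionType"], label))
    return summary


def capacity_is_usable(file_system):
    """True once a FILE_SYSTEM_UPDATE action reports the new capacity as available."""
    for action in file_system.get("AdministrativeActions", []):
        if action["AdministrativeActionType"] == "FILE_SYSTEM_UPDATE":
            return action["Status"] in ("UPDATED_OPTIMIZING", "COMPLETED")
    return False


# Hypothetical input: the shape of a file system entry right after a scaling request.
example = {
    "AdministrativeActions": [
        {"AdministrativeActionType": "FILE_SYSTEM_UPDATE", "Status": "PENDING"},
        {"AdministrativeActionType": "STORAGE_OPTIMIZATION", "Status": "PENDING"},
    ]
}
print(summarize_actions(example))
print(capacity_is_usable(example))
```

The second helper reflects the two-phase process: the new capacity can be used as soon as the update reaches Updated; Optimizing, even though rebalancing is still running.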

The larger file system size brings a corresponding increase in throughput capacity, based on the throughput tier of your Amazon FSx file system. In this example, the storage grows from 1.2 TiB to 4.8 TiB on a persistent SSD file system with 100 MB/s per TiB of storage, which raises the file system's baseline throughput from 120 MB/s to 480 MB/s.
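The arithmetic behind those figures can be sketched as follows. The function name is hypothetical, and the division by 1000 is an assumption that follows this post's convention of counting a 1200 GiB file system as 1.2 TiB.

```python
def baseline_throughput_mb_s(storage_gib, per_unit_throughput):
    """Baseline throughput in MB/s.

    storage_gib: storage capacity in GiB, as reported in StorageCapacity.
    per_unit_throughput: throughput tier in MB/s per TiB of storage.

    Assumption: 1200 GiB counts as 1.2 TiB (as in this post), so the
    capacity is divided by 1000 rather than 1024.
    """
    return storage_gib * per_unit_throughput / 1000


for capacity in (1200, 4800):
    print(capacity, "GiB ->", baseline_throughput_mb_s(capacity, 100), "MB/s")
```

Because the relationship is linear, quadrupling the capacity quadruples the baseline throughput for the same tier.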

Scaling storage using the AWS CLI

You can run the same operation from the command line. I use AWS Cloud9 here, but any environment with the AWS CLI installed works. To start the scaling operation, you need the file system ID, which you can get with the following command:

aws fsx --endpoint-url <endpoint> describe-file-systems

The endpoint differs among AWS Regions; the full list is in the Amazon FSx endpoints and quotas documentation. The command returns a long, detailed list of attributes. In the output you can find the file system ID near the top, along with other characteristics such as the deployment type and the current storage capacity.

{
    "FileSystems": [
        {
            "OwnerId": "xxxxxxxxxx",
            "CreationTime": "2020-10-23T21:20:16.086000+02:00",
            "FileSystemId": "fs-xxxxxxxxxxxx",
            "FileSystemType": "LUSTRE",
            "Lifecycle": "AVAILABLE",
            "StorageCapacity": 1200,
            "StorageType": "SSD",
            "VpcId": "vpc-797b6911",
            "SubnetIds": [
                "subnet-70201a18"
            ],
            "NetworkInterfaceIds": [
                "eni-0b59bc44ec617de5e",
                "eni-0e949b578af87c2c5"
            ],
            "DNSName": "xxxxxxxxxxxxxxxxxxxx.aws.internal",
            "KmsKeyId": "arn:aws:kms:xxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
            "ResourceARN": "arn:aws:fsx:xxxxxxxxxxxxxxxxxxxxxx",
            "Tags": [],
            "LustreConfiguration": {
                "WeeklyMaintenanceStartTime": "3:04:30",
                "DeploymentType": "PERSISTENT_1",
                "PerUnitStorageThroughput": 100,
                "MountName": "ylkk3bmv",
                "DailyAutomaticBackupStartTime": "05:00",
                "AutomaticBackupRetentionDays": 7,
                "CopyTagsToBackups": false
            }
        }
    ]
}
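If you save that output to a file or capture it in a script, the few fields needed for scaling can be extracted with a short helper. This is a minimal sketch using only the Python standard library; the function name and the trimmed `sample` text are illustrative, and the field names are the ones shown in the output above.

```python
import json


def list_file_systems(describe_output):
    """Parse describe-file-systems JSON text into a few fields per file system."""
    parsed = json.loads(describe_output)
    rows = []
    for fs in parsed["FileSystems"]:
        lustre = fs.get("LustreConfiguration", {})
        rows.append({
            "id": fs["FileSystemId"],
            "capacity_gib": fs["StorageCapacity"],
            "deployment_type": lustre.get("DeploymentType"),
            "lifecycle": fs["Lifecycle"],
        })
    return rows


# Trimmed copy of the output above, kept only for illustration.
sample = """
{
    "FileSystems": [
        {
            "FileSystemId": "fs-xxxxxxxxxxxx",
            "FileSystemType": "LUSTRE",
            "Lifecycle": "AVAILABLE",
            "StorageCapacity": 1200,
            "LustreConfiguration": {
                "DeploymentType": "PERSISTENT_1",
                "PerUnitStorageThroughput": 100
            }
        }
    ]
}
"""

for row in list_file_systems(sample):
    print(row)
```

The AWS CLI's global --query option can produce a similar projection directly on the command line, if you prefer not to post-process the JSON.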

For a deeper look at the file system, you can list its network file server nodes, referred to as Object Storage Targets (OSTs) (see how FSx for Lustre file systems work). To retrieve the list, run the following command on a client that has the file system mounted:

lfs osts <mountpoint>
[root@ip-172-31-11-53 ior]# lfs  osts /mnt/fsx 
OBDS: 
0: vqnk5bmv-OST0000_UUID ACTIVE

Here you can see that the file system consists of a single OST. Now I can change the capacity with the following command:

aws fsx --endpoint-url <endpoint> update-file-system --file-system-id <FileSystemId> \
    --storage-capacity <new capacity>

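The <new capacity> value is expressed in GiB. It must be larger than the current capacity and within the range allowed for your deployment type. The command returns the file system description, including the administrative actions it has queued:

{
    "FileSystem": {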
        "OwnerId": "xxxxxxxxx",
        "CreationTime": "2020-10-23T21:20:16.086000+02:00",
        "FileSystemId": "fs-xxxxxxxxxxxxxxxxx",
        "FileSystemType": "LUSTRE",
        "Lifecycle": "AVAILABLE",
        "StorageCapacity": 1200,
        "StorageType": "SSD",
        "VpcId": "vpc-797b6911",
        "SubnetIds": [
            "subnet-70201a18"
        ],
        "NetworkInterfaceIds": [
            "eni-0b59bc44ec617de5e",
            "eni-0e949b578af87c2c5"
        ],
        "DNSName": "xxxxxxxxxxxxxxxxxxxx.aws.internal",
        "KmsKeyId": "arn:aws:kms:xxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
        "ResourceARN": "arn:aws:fsx:xxxxxxxxxxxxxxxxxxxxxx",
        "Tags": [],
        "LustreConfiguration": {
            "WeeklyMaintenanceStartTime": "3:04:30",
            "DeploymentType": "PERSISTENT_1",
            "PerUnitStorageThroughput": 100,
            "MountName": "ylkk3bmv",
            "DailyAutomaticBackupStartTime": "05:00",
            "AutomaticBackupRetentionDays": 7,
            "CopyTagsToBackups": false
        },
        "AdministrativeActions": [
            {
                "AdministrativeActionType": "FILE_SYSTEM_UPDATE",
                "RequestTime": "2020-10-24T00:43:41.194000+02:00",
                "Status": "PENDING",
                "TargetFileSystemValues": {
                    "StorageCapacity": 4800
                }
            },
            {
                "AdministrativeActionType": "STORAGE_OPTIMIZATION",
                "RequestTime": "2020-10-24T00:43:41.194000+02:00",
                "Status": "PENDING"
            }
        ]
    }
}

The output shows that the command triggered two administrative actions: one for the storage scaling (FILE_SYSTEM_UPDATE) and one for the storage optimization (STORAGE_OPTIMIZATION) mentioned earlier.
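The same request can be issued and followed through the SDK. The sketch below assumes a `client` object that behaves like boto3's FSx client, using its describe_file_systems and update_file_system calls. The function names, the polling interval, and the choice to follow only the most recent action of each type are illustrative assumptions, not a prescribed method.

```python
import time

# The two administrative action types shown in the output above.
TRACKED = ("FILE_SYSTEM_UPDATE", "STORAGE_OPTIMIZATION")


def latest_statuses(file_system):
    """Status of the most recent action of each tracked type."""
    latest = {}
    for action in file_system.get("AdministrativeActions", []):
        kind = action["AdministrativeActionType"]
        if kind not in TRACKED:
            continue
        # Keep only the newest action per type, judged by RequestTime.
        if kind not in latest or action["RequestTime"] > latest[kind]["RequestTime"]:
            latest[kind] = action
    return {kind: action["Status"] for kind, action in latest.items()}


def scale_and_wait(client, file_system_id, new_capacity_gib,
                   poll_seconds=60, sleep=time.sleep):
    """Request a capacity increase and poll until both actions complete.

    `client` is assumed to behave like boto3.client("fsx").
    Returns the storage capacity (GiB) reported after completion.
    """
    def describe():
        response = client.describe_file_systems(FileSystemIds=[file_system_id])
        return response["FileSystems"][0]

    if new_capacity_gib <= describe()["StorageCapacity"]:
        raise ValueError("new capacity must be larger than the current capacity")

    client.update_file_system(FileSystemId=file_system_id,
                              StorageCapacity=new_capacity_gib)

    while True:
        file_system = describe()
        statuses = latest_statuses(file_system)
        if "FAILED" in statuses.values():
            raise RuntimeError(f"storage scaling failed: {statuses}")
        # Done only when both tracked actions are present and completed.
        if len(statuses) == len(TRACKED) and set(statuses.values()) == {"COMPLETED"}:
            return file_system["StorageCapacity"]
        sleep(poll_seconds)
```

With boto3 installed and credentials configured, this could be called as scale_and_wait(boto3.client("fsx"), "<FileSystemId>", 4800). Because the new capacity is usable before optimization finishes, a script that only needs the space could return earlier, as soon as FILE_SYSTEM_UPDATE is no longer pending or in progress.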

At the end of the scaling operation, the number of network file server nodes has increased from one to four:

[root@ip-172-31-11-53 ior]# lfs  osts /mnt/fsx
OBDS:
0: vqnk5bmv-OST0000_UUID ACTIVE
1: vqnk5bmv-OST0001_UUID ACTIVE
2: vqnk5bmv-OST0002_UUID ACTIVE
3: vqnk5bmv-OST0003_UUID ACTIVE
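To check the OST count from a script instead of reading it by eye, the listing can be parsed with a few lines of Python. This is a minimal sketch that assumes the output format shown above (an "OBDS:" header followed by one "index: name state" line per target); the function name is hypothetical.

```python
def active_osts(lfs_osts_output):
    """Return the names of the OSTs reported as ACTIVE in `lfs osts` output."""
    names = []
    for line in lfs_osts_output.splitlines():
        parts = line.split()
        # Target lines look like "0: vqnk5bmv-OST0000_UUID ACTIVE".
        if (len(parts) == 3
                and parts[0].rstrip(":").isdigit()
                and parts[2] == "ACTIVE"):
            names.append(parts[1])
    return names


# Output copied from the listing above.
after_scaling = """OBDS:
0: vqnk5bmv-OST0000_UUID ACTIVE
1: vqnk5bmv-OST0001_UUID ACTIVE
2: vqnk5bmv-OST0002_UUID ACTIVE
3: vqnk5bmv-OST0003_UUID ACTIVE
"""

print(len(active_osts(after_scaling)))  # prints 4
```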

A small performance test shows the increase in throughput between the two sizes of the file system, before and after scaling. I ran the test with IOR, using the following command on two clients:

ior --posix.odirect -t 1m -b 1m -s 16384 -g -v -w -i 100 -F -k -D 0 -o /mnt/fsx/ior
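To get a feel for how much data this command moves, the sketch below works out the volume written per iteration from the -b (block size) and -s (segment count) options. It assumes one IOR process per client, which the post does not state, and the helper names are hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def ior_bytes_per_process(block_size, segment_count):
    """Bytes each IOR process writes per iteration: block size (-b) x segments (-s)."""
    return block_size * segment_count


def seconds_per_iteration(total_bytes, throughput_mib_s):
    """Time to write total_bytes at a sustained throughput given in MiB/s."""
    return total_bytes / (throughput_mib_s * MIB)


per_process = ior_bytes_per_process(block_size=1 * MIB, segment_count=16384)
total = 2 * per_process  # assumption: one IOR process on each of the two clients

print(per_process / GIB, "GiB per process,", total / GIB, "GiB per iteration")
print(round(seconds_per_iteration(total, 120)), "s per iteration at 120 MiB/s")
```

With 100 iterations (-i 100), the run lasts long enough to exhaust any initial burst and settle at the sustained rate.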

The first test, on the 1.2-TiB file system, showed an initial burst of 525 MiB/s due to burst credits, followed by stabilization at 120 MiB/s.

After scaling the file system to 4.8 TiB, you can see the same behavior at a significantly higher throughput: an initial burst of 1.15 GiB/s, followed by stabilization at 500 MiB/s, in line with the value reported in the documentation.

Final thoughts

In this blog post, I outlined the simple steps involved in scaling the storage of an Amazon FSx for Lustre file system. Storage scaling saves you time by removing management and capacity-planning burden from operations.

You can use Amazon FSx for Lustre file systems for any high-performance workload that needs a POSIX-compliant file system. An Amazon FSx for Lustre file system can scale to petabytes in size, with millions of IOPS and hundreds of GB/s of bandwidth. Now these powerful file systems can scale at your pace, improving your agility and reducing infrastructure cost.

To learn more about Amazon FSx for Lustre capacity scaling, visit the Amazon FSx for Lustre site and the Managing Storage Capacity & Throughput user guide.

Thanks for reading this blog post on Amazon FSx for Lustre storage capacity scaling. If you have any comments or questions, please share them in the comments section.

Fabrizio Manfredi

Fabrizio is a Principal Solutions Architect for Industrial IoT at Amazon Web Services (AWS). His current area of focus is working with the Digital Production Platform service teams on connected factory and predictive quality solutions.