AWS Storage Blog
Add storage to your high-performance file system with a single click
Many organizations have on-premises, high-performance workloads burdened with complex management and scalability challenges. Scaling data-intensive workloads on-premises typically involves purchasing more hardware, which can slow time to production and require high upfront investment. The agility to scale compute and storage resources to meet business needs is one of the reasons why our customers choose to move to the AWS cloud.
In this blog post, I am excited to share that Amazon FSx for Lustre (Amazon FSx) has added support for file system storage scaling. You can now increase the storage capacity of your existing Amazon FSx file systems with the click of a button. This capability gives you the flexibility to grow capacity as your business needs change, which means you no longer have to over-provision (and over-pay for) storage when you create a file system. You can start with a small file system and grow its capacity when you need it. With live storage scaling, storage capacity increases become available in a matter of minutes. Since the throughput of an Amazon FSx file system scales linearly with storage capacity, you also get a comparable increase in throughput as you scale the file system.
Now, let’s take a quick tour of the new feature. You can initiate scaling through the Amazon FSx console, the AWS Software Development Kit (SDK), or the AWS Command Line Interface (CLI) tools. I cover the Amazon FSx console first.
Scaling storage using the Amazon FSx console
You can find the new scaling operation on the file system summary page, where an Update button appears next to Storage capacity, or in the file system’s Actions menu.
After that, you can define the new desired size based on the Deployment type already selected:
In this example, I am increasing the file system from 1.2 TiB to 4.8 TiB, and I can monitor the status of the operation through the Updates tab.
The process to scale the file system consists of two phases. In the first phase, Amazon FSx adds new network file servers and scales the storage on the metadata server; the new capacity becomes available for use within minutes. In the second phase, a storage optimization process transparently rebalances data across the existing and newly added file servers to optimize performance.
The status can assume the following values:
- Pending – Amazon FSx has received the update request, but has not started processing it.
- In progress – Amazon FSx is processing the update request.
- Updated; Optimizing – Amazon FSx has increased the file system’s storage capacity. The storage optimization process is now rebalancing data across the file servers.
- Completed – The storage capacity increase completed successfully.
- Failed – The storage capacity increase failed.
You can track the status and optimization progress at any time using the Amazon FSx console, CLI, and API.
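For example, you can poll the progress of an update from the command line by inspecting the file system’s administrative actions. This is a minimal sketch, with fs-xxxxxxxxxxxx as a placeholder for your own file system ID:

aws fsx describe-file-systems \
    --file-system-ids fs-xxxxxxxxxxxx \
    --query "FileSystems[0].AdministrativeActions"

Each administrative action in the response carries a Status field that maps to the values listed above.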
With the increased file system size comes a corresponding increase in throughput capacity, based on the throughput tier of your Amazon FSx file system. In this example, we increase the storage from 1.2 TiB to 4.8 TiB on a persistent SSD file system provisioned with 100 MB/s per TiB of storage. This raises the file system’s baseline throughput from 120 MB/s to 480 MB/s.
Scaling storage using the AWS CLI
You can execute the same operation from the command line. I use AWS Cloud9 here, but any environment with the AWS CLI installed works. To proceed with the scaling operation, you need the file system ID, which you can retrieve with the following command:
aws fsx --endpoint-url <endpoint> describe-file-systems
The endpoint differs among AWS Regions; you can find the full list in the Amazon FSx endpoints and quotas documentation. The command returns a long, detailed list of attributes. Near the top of the output, you can identify the file system ID along with other characteristics, such as the deployment type and the current size.
{ "FileSystems": [ { "OwnerId": "xxxxxxxxxx", "CreationTime": "2020-10-23T21:20:16.086000+02:00", "FileSystemId": "fs-xxxxxxxxxxxx", "FileSystemType": "LUSTRE", "Lifecycle": "AVAILABLE", "StorageCapacity": 1200, "StorageType": "SSD", "VpcId": "vpc-797b6911", "SubnetIds": [ "subnet-70201a18" ], "NetworkInterfaceIds": [ "eni-0b59bc44ec617de5e", "eni-0e949b578af87c2c5" ], "DNSName": "xxxxxxxxxxxxxxxxxxxx.aws.internal", "KmsKeyId": "arn:aws:kms:xxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "ResourceARN": "arn:aws:fsx:xxxxxxxxxxxxxxxxxxxxxx", "Tags": [], "LustreConfiguration": { "WeeklyMaintenanceStartTime": "3:04:30", "DeploymentType": "PERSISTENT_1", "PerUnitStorageThroughput": 100, "MountName": "ylkk3bmv", "DailyAutomaticBackupStartTime": "05:00", "AutomaticBackupRetentionDays": 7, "CopyTagsToBackups": false } } ] }
For a deeper look at the file system, you can list the network file server nodes, referred to as Object Storage Targets (OSTs), on the file system (see how an FSx for Lustre file system works). To retrieve the list, execute the following command on a client that has the file system mounted:
lfs osts <mountpoint>
[root@ip-172-31-11-53 ior]# lfs osts /mnt/fsx
OBDS:
0: vqnk5bmv-OST0000_UUID ACTIVE
Here you can see that the file system is composed of a single OST. Now I can proceed with changing the capacity using:
aws fsx --endpoint-url <endpoint> update-file-system --file-system-id <FileSystemId> --storage-capacity <new capacity>
The <new capacity> value should fall within the valid range for the deployment type of your file system.
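As a concrete sketch, growing this file system from 1.2 TiB to 4.8 TiB would look like the following; the file system ID is a placeholder, and the capacity is expressed in GiB:

aws fsx --endpoint-url <endpoint> update-file-system \
    --file-system-id fs-xxxxxxxxxxxx \
    --storage-capacity 4800

The command returns the updated file system description: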
"FileSystem": { "OwnerId": "xxxxxxxxx", "CreationTime": "2020-10-23T21:20:16.086000+02:00", "FileSystemId": "fs-xxxxxxxxxxxxxxxxx", "FileSystemType": "LUSTRE", "Lifecycle": "AVAILABLE", "StorageCapacity": 1200, "StorageType": "SSD", "VpcId": "vpc-797b6911", "SubnetIds": [ "subnet-70201a18" ], "NetworkInterfaceIds": [ "eni-0b59bc44ec617de5e", "eni-0e949b578af87c2c5" ], "DNSName": "xxxxxxxxxxxxxxxxxxxx.aws.internal", "KmsKeyId": "arn:aws:kms:xxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "ResourceARN": "arn:aws:fsx:xxxxxxxxxxxxxxxxxxxxxx", "Tags": [], "LustreConfiguration": { "WeeklyMaintenanceStartTime": "3:04:30", "DeploymentType": "PERSISTENT_1", "PerUnitStorageThroughput": 100, "MountName": "ylkk3bmv", "DailyAutomaticBackupStartTime": "05:00", "AutomaticBackupRetentionDays": 7, "CopyTagsToBackups": false }, "AdministrativeActions": [ { "AdministrativeActionType": "FILE_SYSTEM_UPDATE", "RequestTime": "2020-10-24T00:43:41.194000+02:00", "Status": "PENDING", "TargetFileSystemValues": { "StorageCapacity": 4800 } }, { "AdministrativeActionType": "STORAGE_OPTIMIZATION", "RequestTime": "2020-10-24T00:43:41.194000+02:00", "Status": "PENDING" } ] } }
In the output, you can observe that the command triggered the two administrative actions mentioned earlier: one for the storage scaling (FILE_SYSTEM_UPDATE) and one for the storage optimization (STORAGE_OPTIMIZATION).
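While the optimization phase runs, you can follow its progress from the CLI by filtering the administrative actions. In this sketch, the ProgressPercent field exposed by the FSx API reports how far the rebalancing has advanced:

aws fsx --endpoint-url <endpoint> describe-file-systems \
    --query "FileSystems[0].AdministrativeActions[?AdministrativeActionType=='STORAGE_OPTIMIZATION'].{Status:Status,Progress:ProgressPercent}"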
At the end of the scaling operation, you can see that the number of network file server nodes has increased from one to four:
[root@ip-172-31-11-53 ior]# lfs osts /mnt/fsx
OBDS:
0: vqnk5bmv-OST0000_UUID ACTIVE
1: vqnk5bmv-OST0001_UUID ACTIVE
2: vqnk5bmv-OST0002_UUID ACTIVE
3: vqnk5bmv-OST0003_UUID ACTIVE
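As an additional check, the standard Lustre lfs df command prints the size and usage of each OST (and of the metadata target), so you can confirm that the new capacity is spread across the new file servers:

lfs df -h /mnt/fsx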
With a small performance test, we can compare the throughput of the two file system sizes (before and after scaling). I carried out this test with IOR, running the following command on two clients:
ior --posix.odirect -t 1m -b 1m -s 16384 -g -v -w -i 100 -F -k -D 0 -o /mnt/fsx/ior
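A quick read of the main options, based on standard IOR usage: -t and -b set a 1 MiB transfer and block size, -s 16384 writes 16,384 segments (16 GiB per file), -w runs a write-only test, -i 100 repeats it 100 times, -F gives each process its own file, and --posix.odirect bypasses the client page cache with O_DIRECT. It is worth verifying these against the IOR version you use.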
The first test, executed on the 1.2-TiB file system, showed an initial burst of 525 MiB/s due to burst credits, followed by stabilization at 120 MiB/s.
1.2-TiB performance
After scaling the file system to 4.8 TiB, you can see the same behavior: first a burst of 1.15 GiB/s, then stabilization at 500 MiB/s. This is a substantial increase in throughput, aligned with the values reported in the documentation.
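If you want to corroborate these numbers outside of the benchmark, one option is Amazon CloudWatch, which publishes per-file-system metrics under the AWS/FSx namespace. This is a sketch with a placeholder file system ID and an illustrative time window; dividing the Sum of DataWriteBytes by the period gives the average write throughput in bytes per second:

aws cloudwatch get-metric-statistics \
    --namespace AWS/FSx \
    --metric-name DataWriteBytes \
    --dimensions Name=FileSystemId,Value=fs-xxxxxxxxxxxx \
    --start-time 2020-10-24T00:00:00Z \
    --end-time 2020-10-24T01:00:00Z \
    --period 60 \
    --statistics Sum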
Final thoughts
In this blog, I outlined the simple steps involved in scaling the storage of an Amazon FSx for Lustre file system, a capability that saves you time by removing management and planning burden from an operations perspective.
You can use Amazon FSx for Lustre file systems for any high-performance workload where you need a POSIX-compliant file system. Amazon FSx for Lustre file systems can scale to petabytes in size, with millions of IOPS and hundreds of GB/s of throughput. Now these powerful file systems can scale at your pace, improving your agility and reducing your infrastructure cost.
To learn more about Amazon FSx for Lustre capacity scaling, visit the Amazon FSx for Lustre site and Managing Storage Capacity & Throughput user guide.
Thanks for reading this blog post on Amazon FSx for Lustre storage capacity scaling. If you have any comments or questions, please share them in the comments section.