AWS Storage Blog

Optimize price and performance with Amazon FSx for Lustre

Earlier this year, I blogged about the performance characteristics of Amazon FSx for Lustre persistent file systems. As with everything AWS, we never rest on our laurels, and we continue to push the envelope and drive innovation. Recently, we launched a new lower-cost Hard Disk Drive (HDD) storage type for Amazon FSx for Lustre persistent file systems. Don’t blink – because if you do, you’ll miss the launch of one of our amazing features. Some of you may be asking yourselves – how do the new HDD storage type file systems perform? Today, you’ll find out that if you don’t need the low latencies and high IOPS of SSD-based file systems, and your application thrives on high throughput, our Amazon FSx for Lustre HDD-based file systems may be just what you need. In this blog, I share how HDD-based file systems perform, and how you can save up to 80% with HDD-based file systems for heavily throughput-bound workloads.

Amazon FSx for Lustre file systems

Amazon FSx for Lustre now offers HDD-based file systems with two throughput levels – 12 MB/s per TiB of storage capacity or 40 MB/s per TiB of storage capacity. These file systems also come with an optional SSD-based read-only cache that lowers read latencies and increases IOPS and throughput for frequently read files. If you opt in for the read-only cache, your file system is automatically provisioned with a high-performing SSD-based drive sized at 20% of your file system’s storage capacity. This SSD-based drive delivers low latency, high IOPS, and high throughput at 200 MB/s per TiB of cache storage capacity.
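
To make the cache sizing concrete, here is a quick back-of-the-envelope calculation – a small shell sketch using the 20% and 200 MB/s per TiB figures above – for a 6.0-TiB file system like the one I test later in this post:

# SSD read cache sizing for a 6.0-TiB HDD-based file system
awk 'BEGIN {
  storage_tib = 6.0
  cache_tib   = storage_tib * 0.20   # cache is sized at 20% of storage capacity -> 1.2 TiB
  cache_mbps  = cache_tib * 200      # cache delivers 200 MB/s per TiB of cache  -> 240 MB/s
  printf "SSD read cache: %.1f TiB, up to %.0f MB/s\n", cache_tib, cache_mbps
}'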

All Amazon FSx for Lustre file systems, both SSD- and HDD-based storage types, are composed of a single SSD-based metadata server and one or more storage servers. Based on your performance needs, you select the storage type of these storage servers – SSD-based or HDD-based – and their per unit of storage throughput. Each storage type offers different per unit of storage throughput levels and performance characteristics. The following table (Table 1) breaks down the different performance characteristics of HDD- and SSD-based file systems.

Table 1: FSx for Lustre file system performance

How high performance is achieved

When creating a persistent FSx for Lustre file system – HDD- or SSD-based – you choose the total storage capacity of the file system. The throughput a file system is able to support is proportional to its storage capacity. For HDD-based deployment type file systems, you have a choice of 12 MB/s or 40 MB/s per TiB of storage capacity, each with or without the optional SSD read cache – persistent-12, persistent-12 w/SSD cache, persistent-40, and persistent-40 w/SSD cache. Each of these per-unit throughput options has a different price point, giving you the flexibility to select a throughput level that best aligns with your budget and performance needs. Again, because this post focuses on the new HDD-based FSx for Lustre file systems, I’m only going to dive deep and test the performance characteristics of these HDD-based file systems.

There are three components, plus one optional component, that contribute to an HDD-based file system’s overall throughput: network throughput, in-memory cache, disk throughput, and read-only SSD cache (optional). The following table (Table 2) shows throughput and caching information for HDD-based persistent file systems.

Table 2: FSx for Lustre HDD-based file system performance

When you select the per unit throughput of a file system, you’re really selecting the baseline disk throughput available to that file system. Disk burst throughput, in-memory cache, variable network throughput, and the optional SSD drive cache allow file systems to operate at substantially higher throughput levels than the baseline disk throughput.

File-based workloads are typically spiky, driving high levels of throughput for short periods, but driving lower levels of throughput for longer periods. These types of workloads fit great within the burst model of FSx for Lustre. If your workload is more consistent, select a persistent per unit throughput that aligns with your needs, but remember you still have burst throughput available if you need it. These days, you never know what might happen to drive throughput levels above the norm.

How I tested

Let me show you how a Persistent-12 HDD-based file system with SSD drive cache will burst above its baseline performance.

First, I create a 6.0-TiB persistent HDD-based file system with a 12 MB/s per TiB throughput and the optional SSD read cache. Based on the performance values in the previous table (Table 2), this 6.0-TiB file system should achieve the performance values for network throughput, cache, and storage throughput outlined in the following table (Table 3).
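
If you want to create a similar file system from the command line, the following sketch shows the general shape of the AWS CLI call. The subnet and security group IDs are placeholders, and I’m assuming a provisioned capacity of 6,000 GiB here – check the Amazon FSx for Lustre documentation for the exact capacity increments and parameters that apply to your configuration.

# Create a persistent-12 HDD-based file system with the optional SSD read cache
aws fsx create-file-system \
    --file-system-type LUSTRE \
    --storage-type HDD \
    --storage-capacity 6000 \
    --subnet-ids subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0 \
    --lustre-configuration DeploymentType=PERSISTENT_1,PerUnitStorageThroughput=12,DriveCacheType=READ \
    --region us-west-2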

Table 3: FSx for Lustre persistent HDD-based file system performance of a 6.0-TiB file system

Second, I launch five m5n.8xlarge instances using the latest Amazon Linux 2 AMI. I purposely select this instance type because of its non-variable network performance. I don’t want the variable network performance of smaller Amazon EC2 instances to affect my three long-running tests. I need consistent network performance from my EC2 instances to the file system.
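
For completeness, the following is a sketch of how I could launch these instances with the AWS CLI. The key name, subnet, security group, and user data file name are placeholders, and I look up the latest Amazon Linux 2 AMI through its public SSM parameter:

# Look up the latest Amazon Linux 2 AMI, then launch five m5n.8xlarge instances
ami_id=$(aws ssm get-parameter \
    --name /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2 \
    --query 'Parameter.Value' --output text --region us-west-2)

aws ec2 run-instances \
    --image-id ${ami_id} \
    --instance-type m5n.8xlarge \
    --count 5 \
    --key-name my-key \
    --subnet-id subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0 \
    --user-data file://user-data.yaml \
    --region us-west-2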

The following is an example of my user data script. It installs the latest AWS CLI, the Lustre client, IOR, and a few other packages, and mounts the HDD-based file system as /fsx (set the filesystem_id variable to the ID of your file system). This script does not change the stripe count or size of the file system. In this case, all my tests use the default Lustre stripe configuration – a stripe count of 1 and a stripe size of 1,048,576 bytes.

#cloud-config
repo_update: true
repo_upgrade: all

runcmd:
- curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
- unzip awscli-bundle.zip
- ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
- export PATH=/usr/local/bin:$PATH

- amazon-linux-extras install -y epel lustre2.10
- yum groupinstall -y "Development Tools"
- yum install -y fpart parallel tree nload git libaio-devel openmpi openmpi-devel

- cd /home/ec2-user
- git clone https://github.com/hpc/ior.git
- export PATH=$PATH:/usr/lib64/openmpi/bin
- export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/openmpi/lib/
- cd ior
- ./bootstrap
- ./configure
- make
- sudo cp src/ior /usr/local/bin
- cd /home/ec2-user

- filesystem_id=
- mount_point=/fsx
- availability_zone=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
- region=${availability_zone:0:-1}
- mount_name=$(aws fsx describe-file-systems --file-system-ids ${filesystem_id} --query 'FileSystems[*].LustreConfiguration.MountName' --output text --region ${region})
- mkdir -p ${mount_point}
- echo "${filesystem_id}.fsx.${region}.amazonaws.com:/${mount_name} ${mount_point} lustre defaults,noatime,flock,_netdev 0 0" >> /etc/fstab
- mount -a -t lustre
- chown ec2-user:ec2-user ${mount_point}
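
Because the script keeps the default stripe configuration, you can verify it from any client once the file system is mounted. The following is a quick sketch using the standard Lustre lfs utility (the striped-dir path is just an example):

# Show the default stripe settings of the mounted file system
lfs getstripe -d /fsx

# If a workload needed wider striping, it could be changed per directory, for
# example striping across all OSTs with a 1-MiB stripe size:
# lfs setstripe -c -1 -S 1M /fsx/striped-dir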

Third, I want to test both read and write performance, so I use IOR, a commonly used file system benchmarking application. IOR is typically used to evaluate the performance of parallel file systems by testing different file system components using read and write operations. I put together two IOR scripts that perform parallel read or write operations continuously against the file system. The following are examples of the IOR commands for each type of operation. These operations highlight the performance characteristics of the four components I mentioned earlier – network performance, in-memory cache, disk throughput, and the optional read-only SSD drive cache.

# Write test
# IOR – writes from all instances to test network/storage throughput
instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
mpirun --npernode 4 --oversubscribe ior --posix.odirect -t 1m -b 1m -s 16384 -g -v -w -i 100 -F -k -D 0 -o /fsx/ior-${instance_id}.bin

# Read test
# IOR – writes from 1 instance to generate data set
mpirun --npernode 4 --oversubscribe ior --posix.odirect -t 1m -b 1m -s 1900 -g -v -w -i 1 -F -k -D 0 -o /fsx/ior.bin
# IOR - reads from all instances to test network/storage throughput
mpirun --npernode 4 --oversubscribe ior --posix.odirect -t 1m -b 1m -g -r -i 2000000 -F -k -D 0 -o /fsx/ior.bin
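
For readers less familiar with IOR, here is a quick summary of the options used in these commands, based on IOR’s standard option meanings:

# --posix.odirect  use O_DIRECT with the POSIX backend, bypassing the client page cache
# -t 1m / -b 1m    transfer size and block size of 1 MiB
# -s N             number of segments each process writes or reads
# -g               use barriers between the open, I/O, and close phases
# -v               verbose output
# -w / -r          perform write / read operations
# -i N             number of repetitions of the test
# -F               file-per-process access pattern
# -k               keep the test files after the run (the read test reuses them)
# -D 0             stonewalling deadline in seconds (0 disables it)
# -o PATH          test file name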

What are the results?

The write test uses IOR to generate 320 GiB of data using four processes per instance, continuously writing from all five instances. I bypass the cache on the client EC2 instances to specifically test the performance of the FSx for Lustre file system. Based on the projected performance calculations from Table 3, I expect to achieve a write burst throughput of at least 300 MB/s for a short period of time, followed by a continuous baseline throughput of 72 MB/s. The following Amazon CloudWatch graph (Figure 1) shows the total throughput of the file system. My file system achieves a peak write burst throughput of 460 MB/s followed by a consistent write baseline throughput of 75 MB/s – both above my projected performance numbers.
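
As a quick sanity check, the dataset size and the expected write baseline follow directly from the IOR parameters and the 12 MB/s per TiB rate (a small shell sketch):

# Write test dataset: 5 instances x 4 processes x 16,384 segments x 1-MiB blocks
echo $(( 5 * 4 * 16384 )) MiB   # 327,680 MiB = 320 GiB

# Baseline disk write throughput: 6 TiB x 12 MB/s per TiB
echo $(( 6 * 12 )) MB/s         # 72 MB/s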

Figure 1: 12 MB/s per TiB with SSD drive cache write test

The read test also uses IOR: the first command generates a 7.6-GB dataset from one instance, and then all instances continuously read this data. This dataset is larger than the in-memory cache (2.4 GiB), so the read operations are delivered from both the in-memory cache and the SSD drive cache. I bypass the cache on the client EC2 instances to specifically test the performance of the FSx for Lustre file system. Based on the projected performance calculations from Table 3, I expect to achieve a read burst throughput of 2280 MB/s for a short period of time, followed by a continuous baseline throughput of 240 MB/s. Once again, FSx for Lustre exceeds my expectations. My file system achieves a peak read burst throughput of 2475 MB/s followed by a consistent read baseline throughput of 260 MB/s.
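
The same arithmetic works for the read test – the dataset size comes from the IOR parameters, and the expected read baseline lines up with the throughput of the 1.2-TiB SSD drive cache:

# Read test dataset: 1 instance x 4 processes x 1,900 segments x 1-MiB blocks
echo $(( 1 * 4 * 1900 )) MiB    # 7,600 MiB, roughly 7.6 GB

# Baseline read throughput from the SSD drive cache: 1.2 TiB x 200 MB/s per TiB
awk 'BEGIN { print 1.2 * 200 " MB/s" }'   # 240 MB/s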

Figure 2: 12 MB/s per TiB with SSD drive cache read test

Final thoughts

As a Solutions Architect, I have the privilege of working with many customers and helping them make the best use of the amazing storage services at AWS. Recently, I’ve been working with a customer that migrated 2 PB of on-premises SSD-based storage over to Amazon FSx for Lustre HDD-based persistent-12 file systems. If your workload is throughput bound and you don’t need the low latencies and high IOPS of SSD-based storage, then HDD-based persistent file systems can give you the performance you need at a substantially lower cost. Don’t let the small value of 12 MB/s per TiB of storage fool you. These HDD-based file systems perform great at an amazing price. In my tests, the 6.0-TiB persistent-12 HDD-based file system with the optional SSD drive cache running in the US West (Oregon) Region is priced at $246 per month, which is only 4.1 cents per GiB-month. At this price, I was able to achieve write performance up to 460 MB/s and read performance up to 2475 MB/s for short periods of time. For continuous access during the remainder of the time, I was able to achieve 75 MB/s write throughput and 260 MB/s read throughput when reading data from the 1.2-TiB SSD drive cache.

Based on the total storage capacity that you need, your file system can achieve high throughput levels. To help show how these throughput levels differ based on different storage capacities, I’ve included performance numbers for 10-TiB and 100-TiB file systems in the following table (Table 4).

Table 4: FSx for Lustre persistent HDD-based file system performance of 10-TiB and 100-TiB file systems

During your decision-making process, review your workload requirements and see what latency, IOPS, and throughput levels your application is driving today. If your application is throughput bound and works great with single-digit millisecond latencies and tens to hundreds of thousands of IOPS, then HDD-based file systems may be a great fit. You can save BIG, while still hitting the performance numbers you need.

To learn more about Amazon FSx for Lustre HDD-based persistent file systems, visit the Amazon FSx for Lustre site and user guide.

Please share your comments in the comments section.