AWS Storage Blog

Achieving high throughput with a low-cost Windows file system

“Wow!” is a common response I hear from customers after testing or migrating their Windows file storage workloads to Amazon FSx for Windows File Server (Amazon FSx). When they say this, they are referring to low-cost Hard Disk Drive (HDD) file systems. Earlier this year we announced the availability of a low-cost storage option – “New – Low-Cost HDD Storage Option for Amazon FSx for Windows File Server” – giving customers the choice of either HDD or Solid State Drive (SSD) storage when creating file systems on Amazon FSx. The HDD storage option is designed for a broad spectrum of workloads, including home directories, departmental shares, and content management systems. In this post, I share how HDD file systems deliver high performance for your file-based applications. I use DiskSpd, a Microsoft tool commonly used for synthetic (simulated activity) storage performance testing, to exercise the components of an Amazon FSx for Windows File Server file system with different read and write operations. I hope to show that you can achieve high throughput from a low-cost Amazon FSx for Windows File Server file system, and also say “Wow!”

Amazon FSx provides fully managed, highly reliable, scalable file storage that is accessible over the industry-standard Server Message Block (SMB) protocol. It is built on Microsoft Windows Server, delivering a wide range of administrative features such as data deduplication, end-user file restores, and Microsoft Active Directory (AD) integration. It offers Single-AZ and Multi-AZ deployment options, fully managed backups, and encryption of data at rest and in transit. Amazon FSx file storage is accessible from Windows, Linux, and macOS compute instances and devices running on AWS or on premises.

This week, we launched support for increasing the storage capacity and changing the throughput capacity of file systems. This enables you to dynamically increase the storage capacity of a file system as your data needs grow. With the new capability to change throughput capacity, you can dynamically adjust throughput capacity for cyclical workloads or for one-time bursts, common for time-sensitive migrations. You can read more about these new features in the AWS News Blog post, Amazon FSx for Windows File Server – Storage Size and Throughput Capacity Scaling.
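As a rough sketch of what these new capabilities look like from the AWS CLI, the following commands increase a file system’s storage capacity and then its throughput capacity (the file system ID and the values shown are hypothetical placeholders):

# Grow the file system's storage capacity to 7,168 GiB
aws fsx update-file-system --file-system-id fs-0123456789abcdef0 --storage-capacity 7168

# Change the file system's throughput capacity to 64 MB/s
aws fsx update-file-system --file-system-id fs-0123456789abcdef0 --windows-configuration ThroughputCapacity=64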

Components of high throughput

There are a few decisions to make when creating a file system with Amazon FSx. You must select the deployment type (Multi-AZ or Single-AZ), storage type (SSD or HDD), storage capacity (32 to 65,536 GiB for SSD; 2,000 to 65,536 GiB for HDD), and throughput capacity (8, 16, 32, 64, 128, 256, 512, 1024, or 2048 MB/s). Throughput capacity is the attribute that contributes most to the overall attainable performance and throughput of a file system. Behind that single setting, however, are three resource components that significantly impact a file system’s overall throughput: network throughput, in-memory cache, and disk throughput. These components are illustrated in the following diagram (Figure 1).

Figure 1 – performance components of an Amazon FSx for Windows File Server

The performance section of the Amazon FSx user guide has a comprehensive table that shows the disk, in-memory cache, IOPS, and network throughput for all possible throughput capacities. The following table (Table 1) is a subset of that table, showing throughput and caching information.

| File system throughput capacity (MB/s) | Network throughput – baseline (MB/s) | Network throughput – variable (MB/s) | Memory for caching (GB) | Disk throughput – baseline (MB/s) | Disk throughput – burst (MB/s) |
|---|---|---|---|---|---|
| 8 | 8 | Up to 600 | 0.5 | 8 | Up to 260 |
| 16 | 16 | Up to 600 | 1 | 16 | Up to 260 |
| 32 | 32 | Up to 600 | 2 | 32 | Up to 260 |
| 64 | 64 | Up to 600 | 4 | 64 | Up to 350 |
| 128 | 150 | Up to 1250 | 8 | 128 | Up to 600 |
| 256 | 300 | Up to 1250 | 16 | 256 | Up to 600 |
| 512 | 600 | Up to 1250 | 32 | 512 | – |
| 1024 | 1500 | – | 64 | 1024 | – |
| 2048 | 3125 | – | 128 | 2048 | – |

Table 1 – Amazon FSx for Windows File Server performance

When you select the throughput capacity of a file system, you’re really selecting the baseline disk throughput available to that file system, which is typically its slowest-performing component. Burst disk throughput, the in-memory cache, and the baseline and variable network performance of the file system allow it to operate at substantially higher throughput rates than the baseline disk throughput. In other words, you have access to much more throughput than the value you chose when creating the file system.
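If you want to confirm the throughput capacity an existing file system was created with, a quick check with the AWS CLI might look like the following (the file system ID is a hypothetical placeholder):

# Return the provisioned throughput capacity (MB/s) of an existing file system
aws fsx describe-file-systems --file-system-ids fs-0123456789abcdef0 --query "FileSystems[0].WindowsConfiguration.ThroughputCapacity"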

Let me show you how Amazon FSx file systems provide performance above the baseline throughput levels.

How I tested

First, I create a file system using Amazon FSx with the following attributes:

  • Multi-AZ deployment type
  • HDD storage type
  • 5120 GiB storage capacity
  • 32 MB/s throughput capacity

The file system is joined to an existing AWS Directory Service for Microsoft Active Directory (AWS Managed Microsoft AD).
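For reference, a minimal sketch of creating a comparable file system from the AWS CLI could look like the following; the subnet, security group, and directory IDs are hypothetical placeholders:

# Create a Multi-AZ, HDD, 5,120 GiB, 32 MB/s file system joined to AWS Managed Microsoft AD
aws fsx create-file-system `
    --file-system-type WINDOWS `
    --storage-type HDD `
    --storage-capacity 5120 `
    --subnet-ids subnet-0aaaaaaaaaaaaaaaa subnet-0bbbbbbbbbbbbbbbb `
    --security-group-ids sg-0ccccccccccccccccc `
    --windows-configuration "DeploymentType=MULTI_AZ_1,ThroughputCapacity=32,PreferredSubnetId=subnet-0aaaaaaaaaaaaaaaa,ActiveDirectoryId=d-1234567890"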

Second, I launch one m5n.8xlarge Amazon EC2 instance using the latest Windows Server 2019 Amazon Machine Image (AMI). I purposely select this instance type because of its non-variable network performance to ensure that EC2 performance isn’t a limiting factor. I need consistent network performance from Amazon EC2 to the file system.

The following script is an example of my user data script, which installs the latest version of DiskSpd.

<powershell>

# Install DiskSpd (Windows): download the zip from the Microsoft TechNet gallery
# and extract it to C:\Tools\DiskSpd-2.0.21a
$path = "C:\Tools\DiskSpd-2.0.21a"
$url = "https://gallery.technet.microsoft.com/DiskSpd-A-Robust-Storage-6ef84e62/file/199535/2/DiskSpd-2.0.21a.zip"
$destination = "C:\Tools\DiskSpd-2.0.21a.zip"
$download = New-Object -TypeName System.Net.WebClient

# Create the destination directory and download the zip file
New-Item -ItemType Directory -Path $path
$download.DownloadFile($url,$destination)

# Extract the contents of the zip into the DiskSpd folder
$extract = New-Object -ComObject Shell.Application
$files = $extract.Namespace($destination).Items()
$extract.NameSpace($path).CopyHere($files)

</powershell>
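To pass that script as user data at launch, a command along the following lines could be used (saving the script above, including the <powershell> tags, as userdata.ps1; the AMI, subnet, security group, and key pair shown are hypothetical placeholders):

# Launch the test instance with the DiskSpd user data script
aws ec2 run-instances `
    --image-id ami-0123456789abcdef0 `
    --instance-type m5n.8xlarge `
    --subnet-id subnet-0aaaaaaaaaaaaaaaa `
    --security-group-ids sg-0ccccccccccccccccc `
    --key-name my-key-pair `
    --user-data file://userdata.ps1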

Third, from the instance I create an 8 GiB file and use DiskSpd to continuously write to the file for 15 minutes. The following is an example of the DiskSpd write script. I bypass the local cache on the client using the -Sr option, use only 1 thread, and use a 1 MB block size.

# Create an 8 GiB test file on the file share, then write to it continuously for 900 seconds (15 minutes)
$random = $(Get-Random)
fsutil file createnew \\amznfsxovuaxsaw.example.com\share\${env:computername}-$random.dat 8589934592
C:\Tools\DiskSpd-2.0.21a\amd64\DiskSpd.exe -d900 -s1M -w100 -t1 -o32 -b1M -Sr -L \\amznfsxovuaxsaw.example.com\share\${env:computername}-$random.dat

During the test, I open Task Manager and monitor the outbound network traffic from the EC2 instance. The write test reaches a peak burst throughput of 400 MB/s multiple times, with a burst throughput of 287.5 MB/s between the peaks. This pattern is very consistent during the entire test. See the following Task Manager screenshot (Figure 2).

Figure 2 – EC2 client network throughput showing burst throughput from a write test

The file system is able to burst for much longer than the 15-minute test, so I lengthen the duration of the write test until it consumes all of the “burstability,” or burst capacity, of the file system. Once I see the throughput drop, I know the burst capacity is depleted, and I terminate the test to prepare for the next one – a baseline throughput write test.
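Rather than relying on Task Manager alone, one way to spot that drop is to pull the file system’s DataWriteBytes metric from Amazon CloudWatch. A rough sketch with a hypothetical file system ID follows; the per-period Sum divided by the 60-second period gives bytes per second, which you can then convert to MB/s:

# Check the last 30 minutes of write activity for the file system (hypothetical ID)
$end   = (Get-Date).ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ssZ")
$start = (Get-Date).ToUniversalTime().AddMinutes(-30).ToString("yyyy-MM-ddTHH:mm:ssZ")

# Sum of DataWriteBytes per 60-second period; Sum / 60 = bytes per second
aws cloudwatch get-metric-statistics `
    --namespace AWS/FSx `
    --metric-name DataWriteBytes `
    --dimensions Name=FileSystemId,Value=fs-0123456789abcdef0 `
    --statistics Sum `
    --period 60 `
    --start-time $start `
    --end-time $end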

Fourth, now that the burst capacity of the file system has been consumed, I run the same DiskSpd script again to test the baseline write throughput of the file system.

The following Task Manager screenshot (Figure 3) shows outbound network traffic from the EC2 instance. During the baseline write test, the instance achieves 32 MB/s of throughput writing to the file system.

Figure 3 – EC2 client network throughput showing baseline throughput from a write test

Fifth, I wait some time for the burst credits to be replenished before starting the next test. I want to test just the network performance of the file system, so I create a 2 GB file that can fully reside in the in-memory cache. I then run a DiskSpd command that reads from the file continuously for 15 minutes. This command, like the others, uses the -Sr option to bypass the local cache of the EC2 instance, which means all of my IO requests are served over the network from the in-memory cache of the file system and not from disk. The following is an example of the DiskSpd read script.

# Create a 2 GB test file that fits entirely in the file system's in-memory cache, then read it continuously for 900 seconds (15 minutes)
$random = $(Get-Random)
fsutil file createnew \\amznfsxovuaxsaw.example.com\share\${env:computername}-$random.dat 2000000000
C:\Tools\DiskSpd-2.0.21a\amd64\DiskSpd.exe -d900 -s1M -r100 -t1 -o32 -b1M -Sr -L \\amznfsxovuaxsaw.example.com\share\${env:computername}-$random.dat

Wow!

Once again, I open Task Manager, this time to monitor the inbound network traffic of the EC2 instance as it reads from FSx for Windows File Server. I see a consistent 5.0 Gbps, or 625 MB/s, of receive throughput while the instance reads from the file system (Figure 4).

Figure 4 – EC2 client network throughput showing in-memory cache read throughput

I realize some workloads might be much larger than the memory designated for caching. However, the active working set of your workload should still be able to take advantage of the in-memory cache and the high variable network throughput of these file systems.

Synthetic testing tools like DiskSpd are useful if they’re configured to mimic real-world IO patterns, but nothing is better than running actual real-life workloads against the file system. When talking to customers, I always emphasize the importance of testing their actual workloads against the file system – test, test, test! For my final tests, I download a collection of Earth science datasets maintained by NASA, available as part of the Registry of Open Data on AWS. Within this dataset are some inverse cartographic transformations of GIMMS AVHRR Global NDVI files. I zip and then unzip a subset of this dataset: about 757 .n07-V13g files from the 1980s, 1990s, 2000s, and 2010s, totaling 13.1 GB. Using WinRAR with four threads in parallel, I compress the dataset down to 2.29 GB within the Amazon FSx file system. The zip took 99 seconds to complete. The Task Manager screenshot (Figure 5) shows read throughput reaching 237.5 MB/s during compression.

Figure 5 – EC2 client network read throughput during zip test

The final test is to unzip the compressed file to a new folder on the Amazon FSx file system. Figure 6 shows a screenshot of Task Manager during the same unzip operation with write throughput achieving 187.5 MB/s. The unzip completed in 73 seconds.

Figure 6 – EC2 client network write burst throughput during unzip test
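For reference, a roughly equivalent command-line version of the zip and unzip steps might look like the following, assuming WinRAR’s console tool (Rar.exe) is installed and using hypothetical folder and archive names on the same file share:

# Compress the dataset with 4 threads (a = add to archive, -mt4 = 4 threads, -r = recurse subfolders)
& "C:\Program Files\WinRAR\Rar.exe" a -mt4 -r \\amznfsxovuaxsaw.example.com\share\ndvi.rar \\amznfsxovuaxsaw.example.com\share\ndvi-subset\

# Extract the archive to a new folder (x = extract with full paths)
& "C:\Program Files\WinRAR\Rar.exe" x \\amznfsxovuaxsaw.example.com\share\ndvi.rar \\amznfsxovuaxsaw.example.com\share\ndvi-extracted\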

Test results

During these read and write tests, the file system achieved and at times exceeded the documented throughput levels of a 32 MB/s FSx for Windows File Server file system. The following table (Table 2) summarizes the results of the tests.

| Test | Workload | Throughput |
|---|---|---|
| DiskSpd | Baseline write (persisted to disk) | 32 MB/s |
| DiskSpd | Peak burst write (persisted to disk) | 400 MB/s |
| DiskSpd | Consistent burst write (persisted to disk) | 287.5 MB/s |
| DiskSpd | In-memory cache read | 625 MB/s |
| Zip | Burst read | 237.5 MB/s |
| Unzip | Burst write (persisted to disk) | 187.5 MB/s |

Table 2 – DiskSpd and zip/unzip test results

My file system is in the US West (Oregon) Region: a Multi-AZ, HDD, 5,120 GiB, 32 MB/s Amazon FSx file system. It is able to achieve 625 MB/s during the in-memory cache read test, 400 MB/s peak burst during the write test, and a consistent 32 MB/s during the baseline write test. This file system ran for 2 hours and cost only $0.742, which works out to $270.72 per month (excluding backups). For this price you get:

  • Performant, fully managed, highly available Windows file storage
  • Full failover functionality between two Availability Zones
  • Data replicated within each Availability Zone for high durability
  • Encryption of data at rest and in transit
  • Data deduplication and shadow copies (snapshots)
  • Compliance with programs such as PCI DSS, ISO (9001, 27001, 27017, and 27018), SOC (1, 2, and 3), and GDPR
  • A HIPAA-eligible file system

Final thoughts

File-based workloads typically have a small portion of the overall dataset that is actively used at any given point in time. These workloads benefit from the in-memory cache and high variable network throughput of Amazon FSx for Windows File Server, which reduce trips back to the underlying storage. These workloads also tend to be spiky, driving high levels of throughput for short periods but lower levels of throughput for longer periods. Workloads like these fit well within the burst model of Amazon FSx. If your workload is more consistent, select the throughput capacity that aligns with your needs, but remember that you still have burst throughput available if you need it; you never know what might drive throughput levels above the norm.

How much throughput do your Windows applications really need? You might be surprised how little sustained throughput most Windows workloads require from a shared file system. That’s what makes Amazon FSx such a great fit: you get the throughput you provision, and a lot more when you need it. As the results show, even low-cost HDD file systems are able to achieve levels of throughput well above their documented baseline (the cache-driven network levels and the burst levels are documented as well).

To learn more about Amazon FSx for Windows File Server HDD file systems, visit the Amazon FSx for Windows File Server site and user guide.

Thanks for reading this blog post, please leave any questions or comments in the comments section!

Darryl Osborne

Darryl is a Principal Solutions Architect for file services at Amazon Web Services (AWS). He is a member of the Amazon EFS and Amazon FSx service teams and is responsible for evangelizing AWS native file service offerings. He enjoys spending time outdoors in his home state of Utah.