How can I troubleshoot slow performance when my Storage Gateway is uploading to AWS?
Last updated: 2022-08-31
I want to troubleshoot slow performance when my gateway on AWS Storage Gateway is uploading to AWS.
Review your internet bandwidth or network throughput to AWS
The internet speed between your gateway and AWS can affect upload performance. To determine the internet bandwidth available to your gateway, run a network test from a virtual machine (VM) or another system that's on the same network as your gateway appliance.
If your gateway connects to AWS through an Amazon Virtual Private Cloud (Amazon VPC) endpoint for Amazon Simple Storage Service (Amazon S3) over an AWS Direct Connect or VPN connection, then run a network throughput test from an on-premises VM to an instance in the VPC.
If your gateway is hosted on-premises and connects to AWS through a VPC endpoint for Storage Gateway over a Direct Connect or VPN connection, then traffic from the gateway to the S3 bucket traverses the public virtual interface or public internet. If the public virtual interface or internet connection is congested, then your gateway's upload performance might be affected. To allow traffic to traverse the private virtual interface, consider setting up your gateway with an Amazon S3 VPC endpoint. When you use this configuration, you must create and configure an Amazon Elastic Compute Cloud (Amazon EC2) proxy on your gateway appliance.
Check the size of the files that are being written to the Storage Gateway appliance
Storage Gateway generally has better upload performance with larger files than with smaller files. This is because Storage Gateway breaks up large files into multiple parts, and then uploads the parts in parallel streams to the S3 bucket.
You can benchmark the upload speed from the gateway to AWS by running tests with the file sizes and the number of threads described in Performance guidance for file gateways. Then, review the CloudBytesUploaded metric to determine the upload speed.
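As a minimal sketch of turning the CloudBytesUploaded metric into an upload speed, the following Python example converts per-period byte totals into megabits per second. It assumes that you retrieve the metric with the Sum statistic from the AWS/StorageGateway namespace and that the metric is published with the GatewayId and GatewayName dimensions. The function names, the 5-minute period, and the 1-hour window are illustrative choices, not values from this article.

```python
from datetime import datetime, timedelta, timezone


def upload_mbps(datapoints, period_seconds):
    """Convert CloudWatch 'Sum' datapoints (bytes per period) to Mbit/s.

    Returns a list of (timestamp, megabits_per_second) sorted by time.
    """
    rates = []
    for point in sorted(datapoints, key=lambda p: p["Timestamp"]):
        bits = point["Sum"] * 8
        rates.append((point["Timestamp"], bits / period_seconds / 1_000_000))
    return rates


def fetch_cloud_bytes_uploaded(gateway_id, gateway_name, hours=1, period_seconds=300):
    """Fetch CloudBytesUploaded datapoints for one gateway (needs boto3 and credentials).

    Assumption: the metric is queried with both the GatewayId and
    GatewayName dimensions. Adjust if your metrics are published differently.
    """
    import boto3  # imported here so upload_mbps can be used without boto3

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/StorageGateway",
        MetricName="CloudBytesUploaded",
        Dimensions=[
            {"Name": "GatewayId", "Value": gateway_id},
            {"Name": "GatewayName", "Value": gateway_name},
        ],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=period_seconds,
        Statistics=["Sum"],
    )
    return response["Datapoints"]
```

For example, a datapoint with a Sum of 375,000,000 bytes over a 300-second period corresponds to 10 Mbit/s. Compare the resulting rates with the bandwidth that you measured in your network test.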
Review the gateway's cache storage
If you're using a file gateway, then check your CachePercentDirty metric. Any data written to the gateway that isn't already written to Amazon S3 is considered dirty. A CachePercentDirty metric that's higher than 80% can indicate slow uploads from the gateway to Amazon S3.
If the CachePercentDirty metric is high, then check the CloudBytesUploaded metric to confirm whether the upload speed to Amazon S3 is slow. If the upload speed is slow, then consider increasing the internet bandwidth that's available to the gateway.
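The check described above can be sketched as a small decision function in Python. This is an illustration only: it assumes that you already have CachePercentDirty samples (as percentages) and upload rates in Mbit/s derived from CloudBytesUploaded. The expected_mbps parameter is a hypothetical input that stands for the bandwidth that you expect to be available to the gateway. The 80% threshold comes from the guidance in this article.

```python
def triage_cache(dirty_percent_samples, upload_mbps_samples, expected_mbps,
                 dirty_threshold=80.0):
    """Classify a gateway's cache state from metric samples.

    dirty_percent_samples: CachePercentDirty values (0-100).
    upload_mbps_samples:   upload rates in Mbit/s for the same time window.
    expected_mbps:         hypothetical target rate chosen by the operator.
    """
    peak_dirty = max(dirty_percent_samples)
    average_upload = sum(upload_mbps_samples) / len(upload_mbps_samples)

    if peak_dirty <= dirty_threshold:
        return "cache-ok"
    if average_upload < expected_mbps:
        # High dirty cache and a low upload rate: bandwidth is the likely limit.
        return "slow-upload"
    # High dirty cache even though uploads reach the expected rate.
    return "dirty-but-uploading"
```

A "slow-upload" result corresponds to the case where increasing the bandwidth that's available to the gateway is worth considering. The other labels are illustrative names only.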
Additionally, check your gateway's IoWaitPercent metric in Amazon CloudWatch. If your gateway's IoWaitPercent metric is higher than 10% during your testing, then your gateway might have a disk that doesn't have enough I/O capacity to handle the workload. You can review the WriteBytes metric (using the SampleCount statistic) to check your total write I/O to AWS.
If your gateway's cache disk doesn't have enough I/O to handle the workload, then consider changing the cache disk to a faster disk type. For example, consider using an SSD or NVMe-backed SSD disk. Attaching another cache disk to your gateway can help increase the aggregate I/O available to the gateway.
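To judge whether IoWaitPercent stays above the 10% mark during a test, it can help to summarize the samples instead of reading individual datapoints. The following Python sketch assumes that you already collected IoWaitPercent values for the test window. The function name and the shape of the summary are illustrative.

```python
def io_wait_summary(samples, threshold=10.0):
    """Summarize IoWaitPercent samples against a threshold.

    samples:   IoWaitPercent values (0-100) collected during the test.
    threshold: percentage above which the cache disk is suspect
               (10% per the guidance in this article).
    """
    over = [value for value in samples if value > threshold]
    return {
        "max": max(samples),
        "share_over_threshold": len(over) / len(samples),
        "disk_suspect": bool(over),
    }
```

A large share of samples above the threshold is a stronger signal than a single spike that the cache disk is the bottleneck.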
Check the configuration of your gateway's host VM or Amazon EC2 instance
Confirm that the CPU and RAM of your gateway's host VM or EC2 instance can support your gateway's throughput to AWS. For example, every EC2 instance type has a different baseline throughput. If burst throughput is exhausted, then the instance uses its baseline throughput, which can limit the upload throughput to AWS.
If your gateway is hosted on an EC2 instance, then check the NetworkOut metric of the instance. If the NetworkOut metric sits at the baseline throughput during your testing, then consider changing the instance to a larger instance type. A larger instance type can achieve more network throughput.
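The NetworkOut metric reports bytes sent per period, so it must be converted to a rate before you compare it with a baseline. The following Python sketch assumes that you read NetworkOut with the Sum statistic and that you supply the baseline for your instance type yourself. The 5% tolerance and the function names are illustrative assumptions.

```python
def network_out_gbps(sum_bytes, period_seconds):
    """Convert a NetworkOut 'Sum' value (bytes per period) to Gbit/s."""
    return sum_bytes * 8 / period_seconds / 1e9


def pinned_at_baseline(samples_gbps, baseline_gbps, tolerance=0.05):
    """Return True if every sample sits within +/- tolerance of the baseline.

    baseline_gbps is supplied by the caller. Look it up for your instance
    type; this sketch doesn't know it.
    """
    lower = baseline_gbps * (1 - tolerance)
    upper = baseline_gbps * (1 + tolerance)
    return all(lower <= sample <= upper for sample in samples_gbps)
```

For example, a Sum of 28,125,000,000 bytes over 300 seconds is 0.75 Gbit/s. If the samples from your test window all sit near the baseline for the instance type, then the instance is likely limiting the upload throughput.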
Consider the geographical distance between your gateway and the dataset
It's a best practice to deploy your gateway in the same network as your dataset, or in a network that's geographically close to your dataset. Avoid setting up connections over a wide area network (WAN). One example of a WAN connection is a gateway deployed on an EC2 instance with the file share mounted over Direct Connect or VPN. The latency from on-premises traffic to AWS over the WAN connection affects how fast the data gets to the gateway. This latency in turn affects the upload speed to the S3 bucket. To help reduce upload latency, deploy your gateway in the same AWS Region as the S3 bucket that you're using as the file share.