How can I troubleshoot slow performance when my Storage Gateway is uploading to AWS?
Last updated: 2020-10-12
Review your internet bandwidth or network throughput to AWS
The internet speed between your gateway and AWS can impact upload performance. To determine the internet bandwidth available to your gateway, run a network test from a virtual machine (VM) or system that's on the same network as your gateway appliance.
If your gateway connects to AWS through an Amazon Virtual Private Cloud (Amazon VPC) endpoint for Amazon Simple Storage Service (Amazon S3) over an AWS Direct Connect or VPN connection, then run a network throughput test from an on-premises VM to an instance in the VPC.
If your gateway is hosted on-premises and connects to AWS through a VPC endpoint for Storage Gateway over a Direct Connect or VPN connection, then traffic from the gateway to the S3 bucket traverses the public virtual interface or public internet. If the public virtual interface or internet connection is congested, then your gateway's upload performance can be impacted. To allow this traffic to traverse the private virtual interface instead, consider setting up your gateway with an Amazon S3 VPC endpoint. With this configuration, you must create and configure an Amazon Elastic Compute Cloud (Amazon EC2) proxy on your gateway appliance.
Check the size of the files that are being written to the Storage Gateway appliance
Storage Gateway generally has better upload performance with larger files when compared to smaller files. This is because Storage Gateway breaks large files up into multiple parts, and then uploads the parts in parallel to the S3 bucket.
You can benchmark the upload speed from the gateway to AWS by running tests with the file sizes and the number of threads described in Performance guidance for file gateways. Then, review the CloudBytesUploaded metric to determine the upload speed.
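As a minimal sketch of that last step, the following Python converts CloudBytesUploaded datapoints into an average upload speed. It assumes the datapoints were retrieved from CloudWatch with the Sum statistic and a fixed period. The function name and the sample values are hypothetical and are used only for illustration.

```python
def upload_speed_mbps(sum_datapoints, period_seconds):
    """Return the average upload speed in megabits per second.

    sum_datapoints: CloudBytesUploaded values in bytes (Sum statistic),
                    one value per CloudWatch period.
    period_seconds: length of each period in seconds.
    """
    if not sum_datapoints or period_seconds <= 0:
        raise ValueError("need at least one datapoint and a positive period")
    total_bytes = sum(sum_datapoints)
    total_seconds = len(sum_datapoints) * period_seconds
    # Bytes -> bits, then bits per second -> megabits per second.
    return total_bytes * 8 / total_seconds / 1_000_000


# Hypothetical sample: bytes uploaded in three consecutive 5-minute periods.
samples = [3_750_000_000, 3_000_000_000, 4_500_000_000]
print(f"{upload_speed_mbps(samples, 300):.1f} Mbps")  # prints "100.0 Mbps"
```

Comparing this figure with the bandwidth measured in your network test can show whether the gateway is using the available link.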
Review the gateway's cache storage
If you're using a file gateway, then check your CachePercentDirty metric. Any data written to the gateway that hasn't been written to Amazon S3 is considered dirty. A CachePercentDirty metric that's higher than 80% can be an indication of slow uploads from the gateway to Amazon S3.
If the CachePercentDirty metric is high, then check the CloudBytesUploaded metric to confirm whether the upload speed to Amazon S3 is slow. If the upload speed is slow, then consider increasing the internet bandwidth that's available to the gateway.
Additionally, check your gateway's IoWaitPercent metric in Amazon CloudWatch. If the IoWaitPercent metric is higher than 10% during your testing, then your gateway might have a disk that doesn't have enough I/O to handle the workload. You can also review the WriteBytes metric (using the SampleCount statistic) to check your total write I/O to AWS.
If your gateway's cache disk doesn't have enough I/O to handle the workload, then consider changing the cache disk to a faster disk type. For example, consider using an SSD or NVMe-backed SSD disk. Attaching another cache disk to your gateway can also help increase the aggregate I/O available to the gateway.
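The cache checks in this section can be sketched as a small Python helper. The 80% and 10% thresholds come from the guidance above. The function name, the message wording, and the sample values are hypothetical, and the sketch assumes that you already retrieved the metric values from CloudWatch.

```python
DIRTY_THRESHOLD = 80.0   # CachePercentDirty threshold (percent) from the guidance above
IOWAIT_THRESHOLD = 10.0  # IoWaitPercent threshold (percent) from the guidance above


def cache_findings(cache_percent_dirty, io_wait_percent):
    """Return a list of follow-up suggestions based on the two cache metrics."""
    findings = []
    if cache_percent_dirty > DIRTY_THRESHOLD:
        findings.append(
            "CachePercentDirty is above 80%: check CloudBytesUploaded "
            "to confirm whether uploads to Amazon S3 are slow"
        )
    if io_wait_percent > IOWAIT_THRESHOLD:
        findings.append(
            "IoWaitPercent is above 10%: the cache disk might not have enough I/O; "
            "consider a faster disk type or an additional cache disk"
        )
    return findings


# Hypothetical metric values observed during a test run.
for finding in cache_findings(cache_percent_dirty=85.0, io_wait_percent=12.5):
    print(finding)
```

An empty list means that neither threshold was exceeded, so the cache is less likely to be the cause of the slow uploads.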
Check the configuration of your gateway's host VM or Amazon EC2 instance
Confirm that the CPU and RAM of your gateway's host VM or EC2 instance can support your gateway's throughput to AWS. For example, every EC2 instance type has a different baseline throughput. If burst throughput has been exhausted, the instance uses its baseline throughput, which can limit the upload throughput to AWS.
If your gateway is hosted on an EC2 instance, check the NetworkOut metric of the instance. If the NetworkOut metric sits at the baseline throughput during your testing, then consider changing the instance to a larger instance type. A larger instance type can achieve more network throughput.
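The following Python is a rough sketch of the NetworkOut comparison. It assumes that NetworkOut was retrieved with the Sum statistic (bytes per period) and that you looked up the baseline bandwidth for your instance type yourself. The function names, the 10% tolerance, and the sample values are hypothetical.

```python
def network_out_gbps(sum_bytes, period_seconds):
    """Convert a NetworkOut Sum datapoint (bytes per period) to gigabits per second."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return sum_bytes * 8 / period_seconds / 1_000_000_000


def sits_at_baseline(observed_gbps, baseline_gbps, tolerance=0.10):
    """Return True if the observed throughput is within `tolerance` of the baseline.

    The tolerance is an illustrative assumption, not an AWS-defined value.
    """
    return abs(observed_gbps - baseline_gbps) <= tolerance * baseline_gbps


# Hypothetical values: one 5-minute NetworkOut datapoint and an assumed
# baseline of 0.75 Gbps for the instance type.
observed = network_out_gbps(27_750_000_000, 300)
if sits_at_baseline(observed, baseline_gbps=0.75):
    print("NetworkOut is at the baseline; consider a larger instance type")
else:
    print("NetworkOut is not pinned at the baseline")
```

If the check reports that throughput is pinned at the baseline across most of your test window, that supports moving to a larger instance type.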
Consider the geographical distance between your gateway and the dataset
It's a best practice to deploy your gateway in the same network as your dataset, or geographically close to your dataset. Avoid setting up connections over a Wide Area Network (WAN). One example of a WAN connection is a gateway deployed on an EC2 instance with the file share mounted over Direct Connect or VPN. The latency from on-premises traffic to AWS over the WAN connection impacts how fast the data gets to the gateway. This latency eventually affects the upload speed to the S3 bucket. To help reduce upload latency, deploy your gateway in the same AWS Region as the S3 bucket that you're using as the file share.