How can I use Amazon CloudWatch metrics to identify NAT gateway bandwidth issues?
Last updated: 2017-12-21
To identify the source of bandwidth issues with your NAT gateway, follow these steps:
- Benchmark the network throughput of the traffic flowing through the NAT gateway and the bytes per second that your EC2 instances can handle.
- Review the CloudWatch metrics for the NAT gateway that is having issues.
- Check all the instances behind the NAT gateway, and verify their CloudWatch metrics.
- Compare the results between the benchmarking tests and the CloudWatch metrics.
Benchmark the networking throughput
- Set up a test environment to benchmark your network throughput between Amazon EC2 Linux instances in the same VPC.
- Benchmark the traffic (bytes per second) that an instance can handle.
- Repeat these steps for each instance type that you have running behind the NAT gateway. To identify the instance types, see the Check the instances behind the NAT gateway section below.
Review the CloudWatch metrics for NAT gateway bandwidth issues
- Open the Amazon CloudWatch console.
- In the navigation pane, under Metrics, search for the NAT gateway.
- Select the NAT gateway, and then choose the PacketsDropCount metric.
Note: A healthy NAT gateway reports a value of zero for this metric. A non-zero value indicates an ongoing transient issue with the NAT gateway. If the value is not zero, check the AWS Personal Health Dashboard. If there are no notifications on the AWS Personal Health Dashboard, open a case with AWS Support.
- Select the NAT gateway, and confirm that there is a value of zero for the ErrorPortAllocation metric.
Note: A value greater than zero indicates that too many concurrent connections to the same destination are open through the NAT gateway.
- Select BytesOutToDestination, BytesOutToSource, BytesInFromDestination, and BytesInFromSource.
Note: Bandwidth in bits per second is calculated as [(BytesOutToDestination + BytesOutToSource + BytesInFromDestination + BytesInFromSource) * 8 / time period in seconds].
If you require bandwidth bursts of more than 45 Gbps, you can split the resources between multiple subnets and create multiple NAT gateways. For optimal performance, create your EC2 instances in private subnets that are in the same Availability Zone as your NAT gateway.
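The bandwidth formula in the note above can be sketched in Python as follows. This is a minimal illustration, assuming you have already read the Sum statistic of each of the four byte metrics for a single period; the function name and the sample values are hypothetical and are not measurements.

```python
from typing import Mapping

# The four NAT gateway byte metrics named in the steps above.
NAT_BYTE_METRICS = (
    "BytesOutToDestination",
    "BytesOutToSource",
    "BytesInFromDestination",
    "BytesInFromSource",
)


def nat_bandwidth_bps(byte_sums: Mapping[str, float], period_seconds: int) -> float:
    """Return NAT gateway bandwidth in bits per second for one period.

    byte_sums maps each metric name to its Sum (in bytes) over the period.
    Metrics missing from the mapping are treated as zero.
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    total_bytes = sum(byte_sums.get(name, 0.0) for name in NAT_BYTE_METRICS)
    return total_bytes * 8 / period_seconds


# Hypothetical Sum values for one 60-second period (illustrative only).
example_sums = {
    "BytesOutToDestination": 1.5e9,
    "BytesOutToSource": 0.5e9,
    "BytesInFromDestination": 0.5e9,
    "BytesInFromSource": 1.5e9,
}
bandwidth_gbps = nat_bandwidth_bps(example_sums, 60) / 1e9
print(f"NAT gateway bandwidth: {bandwidth_gbps:.3f} Gbps")
```

Use the same period length that you selected for the metrics in the CloudWatch console. Otherwise the result is scaled incorrectly.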
Check the instances behind the NAT gateway
- Open the Amazon VPC console.
- In the navigation pane, under Route Tables, select the route tables that have routes pointing to the NAT gateway.
- Select the Subnet Association view, and note all the subnet IDs.
- Open the Amazon EC2 console.
- In the navigation pane, under Instances, choose the settings icon to open Show/Hide Columns.
- Select Subnet ID and Instance Type.
- Note the IDs of all the instances that are launched in the subnets that you noted from the Subnet Association view.
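The console steps above could also be scripted. The following is a rough sketch, assuming that the `ec2` argument behaves like a boto3 EC2 client (for example, `boto3.client("ec2")`). The helper names are hypothetical. It reads only the first page of each response, and it lists only subnets that are explicitly associated with a route table, so subnets that implicitly use the main route table are not covered.

```python
from typing import Any, Dict, Iterable, List


def subnets_routed_to_nat(ec2: Any, nat_gateway_id: str) -> List[str]:
    """Return IDs of subnets explicitly associated with route tables
    that contain a route pointing to the given NAT gateway."""
    response = ec2.describe_route_tables(
        Filters=[{"Name": "route.nat-gateway-id", "Values": [nat_gateway_id]}]
    )
    subnet_ids = set()
    for table in response.get("RouteTables", []):
        for association in table.get("Associations", []):
            # Main-table associations carry no SubnetId and are skipped.
            if "SubnetId" in association:
                subnet_ids.add(association["SubnetId"])
    return sorted(subnet_ids)


def instances_in_subnets(ec2: Any, subnet_ids: Iterable[str]) -> List[Dict[str, str]]:
    """Return instance ID, instance type, and subnet ID for every
    instance launched in the given subnets (first response page only)."""
    subnet_list = list(subnet_ids)
    if not subnet_list:
        return []
    response = ec2.describe_instances(
        Filters=[{"Name": "subnet-id", "Values": subnet_list}]
    )
    found = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            found.append(
                {
                    "InstanceId": instance["InstanceId"],
                    "InstanceType": instance["InstanceType"],
                    "SubnetId": instance["SubnetId"],
                }
            )
    return found
```

Passing the client in as an argument keeps the helpers independent of how credentials and Regions are configured.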
Verify the CloudWatch metrics for all the instances behind the NAT gateway
- Open the Amazon CloudWatch console.
- In the navigation pane, under Metrics, choose EC2.
- Select the IDs of all the instances behind the NAT gateway that were noted previously.
- Under the Metric Name column, select NetworkIn, NetworkOut, and CPUUtilization for all the instances, and set the time range to the period when you experienced bandwidth issues.
- Confirm that there are no CPU spikes or abnormal increases in traffic at the same time as the bandwidth issue.
- Enable flow logs at the subnet level to review the traffic flowing through the NAT gateway. For more information about enabling flow logs, see Logging IP traffic using VPC Flow Logs.
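Once you have the NetworkIn and NetworkOut values for each instance, the combined throughput and any abnormal increases can be estimated with a short script. The sketch below assumes that the byte values are the Sum statistic over one period. The function names are hypothetical, and the spike rule (a datapoint more than three times the median) is an arbitrary illustrative threshold, not an AWS recommendation.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def combined_throughput_bps(
    network_bytes: Dict[str, Tuple[float, float]], period_seconds: int
) -> float:
    """Return combined throughput in bits per second across instances.

    network_bytes maps an instance ID to its (NetworkIn, NetworkOut)
    Sum values, in bytes, for a single period.
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    total_bytes = sum(n_in + n_out for n_in, n_out in network_bytes.values())
    return total_bytes * 8 / period_seconds


def find_spikes(values: Sequence[float], factor: float = 3.0) -> List[int]:
    """Return indexes of datapoints that exceed factor times the median.

    The factor is an assumed threshold; tune it for your workload.
    """
    if not values:
        return []
    baseline = median(values)
    if baseline <= 0:
        return []
    return [i for i, value in enumerate(values) if value > factor * baseline]


# Hypothetical values for two instances over one 300-second period.
sample = {"i-aaaa": (3.0e9, 1.5e9), "i-bbbb": (1.5e9, 1.5e9)}
print(combined_throughput_bps(sample, 300) / 1e9, "Gbps combined")

# Hypothetical CPUUtilization datapoints (percent), one per period.
print(find_spikes([10, 12, 11, 50, 9]))
```

The same `find_spikes` helper can be applied to a series of NetworkIn or NetworkOut datapoints to look for abnormal increases in traffic.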
Compare the results
- If the combined network throughput across all the instances behind the NAT gateway is equal to or greater than 45 Gbps, then the bandwidth on the NAT gateway should also be at or above 45 Gbps. In that case, consider splitting your traffic across multiple NAT gateways.
- If the combined throughput is less than 45 Gbps, then the bandwidth on the NAT gateway should also be less than 45 Gbps. In that case, the NAT gateway can handle the traffic flowing through it.
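The comparison above can be expressed as a small decision helper. This is a sketch only: the function name and return strings are hypothetical, and the default limit of 45 Gbps is the figure used in this article, so confirm it against the current NAT gateway documentation before relying on it.

```python
def assess_nat_gateway(
    instances_gbps: float, nat_gbps: float, limit_gbps: float = 45.0
) -> str:
    """Compare combined instance throughput and NAT gateway bandwidth
    against the burst limit and return a short recommendation."""
    if instances_gbps < 0 or nat_gbps < 0:
        raise ValueError("throughput values must be non-negative")
    if instances_gbps >= limit_gbps or nat_gbps >= limit_gbps:
        return "split traffic across multiple NAT gateways"
    return "NAT gateway can handle the traffic"


# Hypothetical readings in Gbps (illustrative only).
print(assess_nat_gateway(instances_gbps=12.0, nat_gbps=11.5))
print(assess_nat_gateway(instances_gbps=48.0, nat_gbps=45.0))
```

If the two inputs differ widely, for example high instance throughput but low NAT gateway bandwidth, much of the instance traffic probably does not pass through the NAT gateway, and flow logs can help confirm where it goes.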