How do I troubleshoot network performance issues between Amazon EC2 Linux instances in a VPC and an on-premises host over the internet gateway?
Last updated: 2021-01-21
Packet loss or latency issues exist between my Amazon Elastic Compute Cloud (Amazon EC2) Linux instances and on-premises host over the internet gateway. How can I troubleshoot these issues with network performance?
To diagnose network issues such as packet loss or latency, first test the network to isolate the source of the issue. The following steps can help you determine whether the source of the issue is the network or an application.
Before you begin troubleshooting, check the following:
- Be sure that the network utilities are installed on both endpoints (the EC2 instance and the on-premises host).
- Use an EC2 instance that supports enhanced networking, and be sure that the drivers are up to date. Enhanced networking provides higher I/O with low CPU utilization, which helps avoid instance-level issues when running performance tests. If enhanced networking isn't enabled, see Enabling enhanced networking on your instance.
- Connect to your Linux instance, and be sure that there is end-to-end connectivity between your EC2 instance and your on-premises host.
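As a rough way to confirm the enhanced networking prerequisite, you can inspect the driver reported by ethtool. The following is a minimal sketch, not an official check: the function name check_enhanced_networking and the interface name eth0 are assumptions for illustration. It treats the ena and ixgbevf drivers as indicators of enhanced networking.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "ethtool -i <interface>" output on stdin and
# reports whether the driver is one used for enhanced networking.
check_enhanced_networking() {
  awk -F': *' '
    # The "driver:" line names the kernel driver bound to the interface.
    $1 == "driver" { driver = $2 }
    END {
      if (driver == "ena" || driver == "ixgbevf")
        print "enhanced-networking-driver: " driver
      else
        print "not-detected: " driver
    }'
}

# Example (the interface name eth0 is an assumption):
#   ethtool -i eth0 | check_enhanced_networking
```

If the result is not-detected, review the enhanced networking documentation for your instance type before you run performance tests.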
Install the following tools to help troubleshoot and test your network:
- AWSSupport-SetupIPMonitoringFromVPC to collect network metrics such as packet loss and latency, along with MTR, tcptraceroute, and tracepath output.
- MTR to check for ICMP or TCP packet loss and latency problems.
- Traceroute to determine latency or routing problems.
- Hping3 to determine end-to-end TCP packet loss and latency problems.
- Tcpdump to analyze packet capture samples.
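To confirm that the command-line utilities in this list are present on an endpoint, a quick check such as the following can help. This is a sketch only. The function name missing_tools is hypothetical, and the default tool names assume that each utility is installed under its usual binary name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the name of each tool that is not on the PATH.
# With no arguments, it checks the utilities named in this article.
missing_tools() {
  local tool
  [ "$#" -gt 0 ] || set -- mtr traceroute hping3 tcpdump
  for tool in "$@"; do
    # "command -v" succeeds only if the shell can resolve the name.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: run on both the EC2 instance and the on-premises host.
#   missing_tools
```

Empty output means that every tool was found. Any name that is printed still needs to be installed on that endpoint.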
Review hops on traceroute or MTR reports using a bottom-up approach. For example, check for loss on the last hop (the destination), and then review the preceding hops. If the packet loss or latency continues through to the last hop, there might be a network or routing issue. Packet loss or latency on only one hop in the path might be caused by control plane rate limiting on that node. Check whether the last hop reported is the destination noted in the command. If it isn't, a restrictive security group might be blocking the traffic.
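The bottom-up review can be sketched as a small script that reads a saved MTR report. This is an illustrative simplification, not an authoritative diagnosis: the function name review_mtr_report and its result labels are hypothetical, it assumes the report was produced with the -n and --report options, and it treats any nonzero loss as significant even though a few timeouts are common.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classifies an "mtr -n --report" report read from stdin.
# Usage: review_mtr_report <destination IP> < report.txt
review_mtr_report() {
  local target="$1"
  awk -v target="$target" '
    # Hop lines look like: "  3.|-- 203.0.113.10  0.0%  200  ..."
    $1 ~ /^[0-9]+\.\|--$/ {
      loss = $3
      sub(/%/, "", loss)
      hops++
      host[hops] = $2
      pct[hops] = loss + 0
    }
    END {
      if (hops == 0) { print "no-hops-found"; exit }
      # Start at the bottom: is the last hop the destination?
      if (host[hops] != target) { print "destination-not-reached"; exit }
      # Loss that persists to the destination suggests a real problem.
      if (pct[hops] > 0) { print "loss-at-destination"; exit }
      # Loss only on earlier hops is often ICMP rate limiting.
      for (i = 1; i < hops; i++)
        if (pct[i] > 0) { print "intermediate-loss-only"; exit }
      print "no-loss"
    }'
}
```

A result of intermediate-loss-only usually points to rate limiting on a single node, while loss-at-destination or destination-not-reached calls for further investigation.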
Test performance using AWSSupport-SetupIPMonitoringFromVPC
This built-in tool collects many of the metrics that you need to troubleshoot your network. For more information, see Debugging tool for network connectivity from Amazon VPC.
Test performance using MTR
The Linux MTR command provides continual, updated output, which enables you to analyze network performance. This diagnostic tool combines the functionality of the traceroute and ping utilities. Most Linux distributions come with traceroute and MTR already installed. If they aren't installed, you can install them from your distribution's package manager.
To install MTR, run the command for your distribution. Use yum on Amazon Linux and other RPM-based distributions, and apt-get on Ubuntu and other Debian-based distributions:
sudo yum install mtr
sudo apt-get install mtr
To test your network's performance using MTR, run this test bidirectionally between the public IP address of your EC2 instances and your on-premises host. The path between nodes on a TCP/IP network can change if the direction is reversed. Therefore, it's important to obtain MTR results for both directions. You can use a TCP-based trace instead of ICMP, because most internet devices deprioritize ICMP-based trace requests.
Review your packet loss. Packet loss on a single hop usually doesn't indicate an issue. The loss can be the result of a control plane policy that causes the "ICMP time exceeded" messages to be dropped. If you notice sustained packet loss until the destination hop, or packet loss over several hops, this loss might indicate a problem.
Note: It's common to see a few requests time out.
mtr -n -c 200 <Public IP EC2 instance/on-premises host> --report
mtr -n -T -c 200 <Public IP EC2 instance/on-premises host> --report
The -T argument performs a TCP-based MTR, the -n argument skips DNS resolution, and the --report option puts MTR into report mode. In report mode, MTR runs for the number of cycles specified by the -c option, prints the statistics, and then exits.
Note: There is a known issue with some versions of MTR where the final hop reports an incorrect value if TCP is used.
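Because you need results for both directions and for both ICMP and TCP, it can help to save each report to a timestamped file. The following sketch wraps the two commands shown above. The function name collect_mtr_reports, the output file naming, and the default of 200 cycles are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: saves one ICMP-based and one TCP-based MTR report.
# Usage: collect_mtr_reports <target IP> [output directory] [cycles]
collect_mtr_reports() {
  local target="$1" outdir="${2:-.}" cycles="${3:-200}"
  local stamp
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  mkdir -p "$outdir" || return 1

  # ICMP-based report
  mtr -n -c "$cycles" --report "$target" \
    > "$outdir/mtr-icmp-$target-$stamp.txt" || return 1
  # TCP-based report
  mtr -n -T -c "$cycles" --report "$target" \
    > "$outdir/mtr-tcp-$target-$stamp.txt" || return 1

  # Print the paths of the saved reports.
  printf '%s\n' "$outdir/mtr-icmp-$target-$stamp.txt" \
                "$outdir/mtr-tcp-$target-$stamp.txt"
}
```

Run the function on the EC2 instance with the on-premises host as the target, and then on the on-premises host with the instance's public IP address as the target, so that you can compare the four reports.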
Test performance using traceroute
The Linux traceroute utility identifies the path taken from a client node to the destination node. The utility records the time in milliseconds for each router to respond to the request. The utility also calculates the amount of time each hop takes before reaching its destination.
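To install traceroute, run the command for your distribution (yum on Amazon Linux and other RPM-based distributions, apt-get on Ubuntu and other Debian-based distributions):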
sudo yum install traceroute
sudo apt-get install traceroute
Note: Traceroute isn't necessary if you run an MTR report. MTR provides latency and packet loss statistics to a destination.
Be sure that port 22 or the port that you're testing is open in both directions. To troubleshoot network connectivity using traceroute, run the command from the client to the server, and from the server back to the client. The path between nodes on a TCP/IP network can change if the direction is reversed. Use a TCP-based trace instead of ICMP, because most internet devices deprioritize ICMP-based trace requests.
sudo traceroute <Public IP of EC2 instance/on-premises host>
sudo traceroute -n -T -p 22 <Public IP of EC2 instance/on-premises host>
The arguments -T -p 22 perform a TCP-based trace on port 22, and the -n argument skips DNS resolution of the hop addresses.
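To spot where latency is introduced along the path, you can compare the average response time of consecutive hops. The following is a rough sketch under stated assumptions: the function name largest_latency_jump is hypothetical, the input is saved output from traceroute run with the -n option, and hops that don't respond are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "traceroute -n" output on stdin and prints the
# hop number with the largest increase in average round-trip time over the
# previous responding hop, followed by that increase in milliseconds.
largest_latency_jump() {
  awk '
    # Hop lines start with the hop number, for example:
    # " 3  198.51.100.1  12.1 ms  11.9 ms  12.3 ms"
    $1 ~ /^[0-9]+$/ {
      sum = 0; n = 0
      # Every value that is followed by "ms" is one probe response time.
      for (i = 2; i < NF; i++)
        if ($(i + 1) == "ms") { sum += $i; n++ }
      if (n == 0) next            # hop did not respond ("* * *")
      avg = sum / n
      if (seen && avg - prev > jump) { jump = avg - prev; hop = $1 }
      prev = avg; seen = 1
    }
    END {
      if (hop != "") printf "%d %.1f\n", hop, jump
      else print "none"
    }'
}

# Example:
#   sudo traceroute -n -T -p 22 <Public IP of EC2 instance/on-premises host> | largest_latency_jump
```

A large increase at one hop that persists through the remaining hops is more meaningful than a spike on a single hop, so review the full output before you draw conclusions.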
Test performance using hping3
Hping3 is a command-line oriented TCP/IP packet assembler and analyzer that measures end-to-end packet loss and latency over a TCP connection. In addition to ICMP echo requests, hping3 supports TCP, UDP, and RAW-IP protocols. Hping3 also includes a traceroute mode and can send files over a covert channel. Hping3 is designed to scan hosts, assist with penetration testing, and test intrusion detection systems.
MTR and traceroute capture per-hop latency. However, hping3 yields results that show end-to-end min/avg/max latency over TCP in addition to packet loss. To install hping3, run the command for your distribution (yum on Amazon Linux and other RPM-based distributions, apt-get on Ubuntu and other Debian-based distributions):
sudo yum --enablerepo=epel install hping3
sudo apt-get install hping3
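The following command sends 50 TCP SYN packets over port 0. By default, hping3 sends TCP headers to the target host's port 0, with a window size of 64 and without a TCP flag. The -S option sets the SYN flag, the -c option sets the packet count, and the -V option turns on verbose output: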
hping3 -S -c 50 -V <Public IP of EC2 instance/on-premises host>
The following command sends 50 TCP SYN packets over port 22:
hping3 -S -c 50 -V <Public IP of EC2 instance/on-premises host> -p 22
Note: Be sure that port 22 or the port that you're testing is open.
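To compare runs, you can reduce the hping3 summary to a single line of loss and latency figures. The following sketch assumes that the summary contains a line with the packet loss percentage and a "round-trip min/avg/max" line. The function name parse_hping3_summary is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads hping3 output on stdin and prints the packet
# loss percentage and the min/avg/max round-trip times in milliseconds.
parse_hping3_summary() {
  awk '
    # For example: "50 packets transmitted, 48 packets received, 4% packet loss"
    /packet loss/ {
      for (i = 1; i <= NF; i++)
        if ($i ~ /%$/) { loss = $i; sub(/%/, "", loss) }
    }
    # For example: "round-trip min/avg/max = 0.5/1.2/3.4 ms"
    /round-trip min\/avg\/max/ {
      split($(NF - 1), rtt, "/")
    }
    END {
      printf "loss=%s min=%s avg=%s max=%s\n", loss, rtt[1], rtt[2], rtt[3]
    }'
}

# Example (2>&1 is included in case the summary is written to stderr):
#   sudo hping3 -S -c 50 -V <Public IP of EC2 instance/on-premises host> -p 22 2>&1 | parse_hping3_summary
```

Run the test in both directions and at different times of day, and then compare the one-line summaries to see whether loss or latency is consistent.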
Test packet capture samples using tcpdump
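It's a best practice to perform simultaneous packet captures on your EC2 instance and on-premises host when diagnosing packet loss or latency issues. Doing so can help you determine whether the issue is at the network layer or the application layer. To install tcpdump, run the command for your distribution (yum on Amazon Linux and other RPM-based distributions, apt-get on Ubuntu and other Debian-based distributions):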
sudo yum install tcpdump
sudo apt-get install tcpdump
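A capture that is limited to the traffic between the two endpoints keeps the files small and easy to compare. The following is a minimal sketch: the function name capture_peer_traffic, the default interface eth0, the default port 22, and the default limit of 10,000 packets are all assumptions that you should adjust for your environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: captures traffic to and from one peer into a pcap file.
# Usage: capture_peer_traffic <peer IP> [port] [interface] [packet count]
capture_peer_traffic() {
  local peer="$1" port="${2:-22}" iface="${3:-eth0}" count="${4:-10000}"
  local outfile
  outfile="capture-$(uname -n)-$(date -u +%Y%m%dT%H%M%SZ).pcap"

  # -n skips name resolution, -s 0 captures full packets,
  # -c stops after the given number of packets, -w writes a pcap file.
  sudo tcpdump -i "$iface" -n -s 0 -c "$count" -w "$outfile" \
    "host $peer and port $port" || return 1

  # Print the name of the capture file.
  printf '%s\n' "$outfile"
}
```

Start the capture on both endpoints at about the same time, reproduce the issue, and then compare the two files. Packets that appear in one capture but not the other were lost in transit.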
If you find packet loss in your network, refer to your vendor documentation for instructions on how to check your network devices. If your network is multi-homed, repeat these tests through a different internet service provider (ISP), and then compare the results.