How do I troubleshoot network performance issues between EC2 Linux or Windows instances in a VPC and an on-premises host over the internet gateway?
Last updated: 2022-03-21
Packet loss or latency issues exist between my Amazon Elastic Compute Cloud (Amazon EC2) instances and my on-premises host over the internet gateway. How can I troubleshoot these network performance issues?
To diagnose network issues such as packet loss or latency, first test the network to isolate the source of the issue. The following resolution can help you determine whether the issue originates in the network or in an application. It's a best practice to benchmark network performance while the network is healthy, so that you have baseline results to compare against when you observe performance issues.
Before you begin troubleshooting, check the following:
- Be sure that the network utilities are installed on both endpoints (on the EC2 instance and the on-premises host).
- Use an EC2 instance type that supports enhanced networking, and be sure that the drivers are up to date. Enhanced networking provides higher I/O performance and lower CPU utilization, which helps avoid instance-level bottlenecks when you run performance tests. If enhanced networking isn't turned on, see Enhanced networking on Linux or Enhanced networking on Windows.
- Connect to your EC2 instance, and confirm that there is end-to-end connectivity between the instance and your on-premises host.
Install the following tools to help troubleshoot and test your network:
- AWSSupport-SetupIPMonitoringFromVPC to collect network metrics such as packet loss and latency, along with MTR, tcptraceroute, and tracepath results.
- MTR to check for ICMP or TCP packet loss and latency problems.
- Traceroute to determine latency or routing problems.
- Hping3 to determine end-to-end TCP packet loss and latency problems.
- Tcpdump to analyze packet capture samples.
Review hops in traceroute or MTR reports using a bottom-up approach: check for loss on the last hop (the destination) first, and then review the preceding hops. If packet loss or latency continues through to the last hop, there might be a network or routing issue. Packet loss or latency on only one hop in the path usually results from control plane rate limiting on that node, which deprioritizes responses to probes addressed to it, and doesn't necessarily affect traffic passing through. Also check that the last hop reported is the destination specified in the command. If it isn't, a restrictive security group might be blocking the probes.
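The bottom-up review can be sketched as a small Python heuristic. This is not an AWS tool: it assumes that you have already reduced an MTR or traceroute report to (host, loss percent) pairs ordered from the first hop to the destination, it treats "???" as a hop that never answered, and the 1% loss threshold is a hypothetical default.

```python
from typing import List, Tuple


def review_hops_bottom_up(hops: List[Tuple[str, float]], destination: str,
                          threshold: float = 1.0) -> str:
    """Classify per-hop loss by starting at the destination and walking backward.

    threshold is a hypothetical loss percentage that counts as "real" loss.
    """
    # Hops that never answered only show that a router ignores probes.
    answered = [(host, loss) for host, loss in hops if host != "???"]
    if not answered or answered[-1][0] != destination:
        # The trace never reached the target, for example because a
        # security group or firewall drops the probes.
        return "destination not reached"
    if answered[-1][1] < threshold:
        # The destination answers cleanly, so loss on earlier hops is most
        # likely control plane rate limiting on those routers.
        if any(loss >= threshold for _, loss in answered[:-1]):
            return "isolated hop loss"
        return "no loss"
    # Loss reaches the destination: find the first hop where it begins.
    start = len(answered) - 1
    while start > 0 and answered[start - 1][1] >= threshold:
        start -= 1
    return "sustained loss from " + answered[start][0]
```

For example, review_hops_bottom_up([("10.0.0.1", 0.0), ("198.51.100.1", 40.0), ("203.0.113.10", 0.0)], "203.0.113.10") reports "isolated hop loss", because the 40% loss on the middle hop doesn't carry through to the destination.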
Test performance using AWSSupport-SetupIPMonitoringFromVPC
This AWS Systems Manager Automation document collects many of the metrics that you need to troubleshoot your network. For more information, see Debugging tool for network connectivity from Amazon VPC.
Performance troubleshooting for Linux instances
The Linux MTR command combines the functionality of the traceroute and ping utilities and provides continually updated output that you can use to analyze network performance. Most Linux distributions come with traceroute and MTR already installed. If they aren't installed, you can install them from your distribution's package manager.
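To install MTR, run the command for your distribution's package manager (yum on Amazon Linux and other RHEL-based distributions, apt-get on Debian and Ubuntu):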
sudo yum install mtr
sudo apt-get install mtr
To test your network's performance using MTR, run the test in both directions between the public IP address of your EC2 instance and your on-premises host. The path between nodes on a TCP/IP network can change when the direction is reversed, so it's important to obtain MTR results for both directions. You can use a TCP-based trace instead of ICMP, because most internet devices deprioritize ICMP-based trace requests.
Review your packet loss. Packet loss on a single hop usually doesn't indicate an issue. The loss can result from a control plane policy on that router that rate-limits or drops the "ICMP time exceeded" messages it generates. If you notice sustained packet loss that continues through to the destination hop, or packet loss across several consecutive hops, the loss might indicate a problem.
Note: It's common to see a few requests time out.
mtr -n -c 200 <Public IP EC2 instance/on-premises host> --report
mtr -n -T -c 200 <Public IP EC2 instance/on-premises host> --report
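The -n option displays numeric IP addresses instead of resolving host names, -T runs a TCP-based MTR (TCP SYN probes instead of ICMP echo requests), and --report puts MTR into report mode. In report mode, MTR runs for the number of cycles specified by the -c option, prints the statistics, and then exits. To probe a specific TCP port, add the -P option (for example, -P 22).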
Note: There is a known issue with some versions of MTR where the final hop reports an incorrect value if TCP is used.
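To compare MTR runs over time, you can parse the report into per-hop records. The following is a minimal sketch that assumes the default mtr --report column layout (Loss%, Snt, Last, Avg, Best, Wrst, StDev); parse_mtr_report is a hypothetical helper, and lines for extra ECMP addresses are skipped.

```python
import re
from typing import Dict, List, Union

# One hop line of `mtr --report`, for example:
#   "  3.|-- 203.0.113.10    2.5%   200   12.1  12.4  11.9  20.3   0.8"
# Hops that never answer print "???" and usually omit the % sign.
HOP_LINE = re.compile(
    r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%?\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


def parse_mtr_report(text: str) -> List[Dict[str, Union[int, float, str]]]:
    hops = []
    for line in text.splitlines():
        m = HOP_LINE.match(line)
        if not m:
            continue  # skip the "Start:" and "HOST:" header lines
        hops.append({
            "hop": int(m.group(1)),
            "host": m.group(2),
            "loss_pct": float(m.group(3)),
            "sent": int(m.group(4)),
            "last_ms": float(m.group(5)),
            "avg_ms": float(m.group(6)),
            "best_ms": float(m.group(7)),
            "worst_ms": float(m.group(8)),
            "stdev_ms": float(m.group(9)),
        })
    return hops
```

Save the output of each direction (for example, mtr -n -T -c 200 <IP> --report > mtr.txt) and parse both files, so that you can compare loss and average latency hop by hop.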
Test performance using traceroute
The Linux traceroute utility identifies the path that traffic takes from a client node to the destination node. Traceroute sends probes with increasing time-to-live (TTL) values, and each router that discards a probe returns an ICMP time exceeded message. The utility records the round-trip time in milliseconds for each router to respond, so that you can see how much latency each hop adds on the way to the destination.
To install traceroute, run the command for your distribution's package manager:
sudo yum install traceroute
sudo apt-get install traceroute
Note: Traceroute is not necessary if you run an MTR report. MTR provides latency and packet loss statistics to a destination.
Be sure that port 22, or the port that you're testing, is open in both directions. To troubleshoot network connectivity using traceroute, run the command from the client to the server, and from the server back to the client, because the path between nodes on a TCP/IP network can change when the direction is reversed. Use a TCP-based trace on your application port instead of ICMP, because most internet devices deprioritize ICMP-based trace requests.
sudo traceroute <Public IP of EC2 instance/on-premises host>
sudo traceroute -n -T -p 22 <Public IP of EC2 instance/on-premises host>
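By default, Linux traceroute sends UDP probes. In the second command, -T sends TCP SYN probes, -p 22 sets the destination port to 22, and -n displays numeric IP addresses instead of resolving host names. TCP-based tracing requires root privileges, so run it with sudo.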
Note: You can use your application-specific port for testing. Testing on that port shows whether any intermediate devices in the path are dropping your application traffic.
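To see where a TCP trace stops being answered, which can point to an intermediate device that drops traffic on your application port, you can parse the traceroute -n output. This is a minimal sketch: it assumes the default Linux traceroute layout (hop number, address, RTT values followed by "ms", and * for unanswered probes), and parse_traceroute and summarize_trace are hypothetical helpers.

```python
from typing import List, Tuple


def _is_number(token: str) -> bool:
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_traceroute(text: str) -> List[Tuple[int, List[str], List[float]]]:
    """Parse `traceroute -n` output into (hop, addresses, rtts_ms) tuples."""
    hops = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or not tokens[0].isdigit():
            continue  # "traceroute to ..." header or blank line
        addrs, rtts = [], []
        for i, tok in enumerate(tokens[1:], start=1):
            if tok == "ms":
                rtts.append(float(tokens[i - 1]))  # the value before "ms"
            elif tok != "*" and not tok.startswith("!") and not _is_number(tok):
                addrs.append(tok)  # an IP address (annotations like !H skipped)
        hops.append((int(tokens[0]), addrs, rtts))
    return hops


def summarize_trace(text: str, destination: str) -> dict:
    """Report whether the destination answered and which hop answered last."""
    responding = [(n, addrs) for n, addrs, _ in parse_traceroute(text) if addrs]
    if not responding:
        return {"reached": False, "last_hop": None, "last_addr": None}
    n, addrs = responding[-1]
    return {"reached": destination in addrs, "last_hop": n, "last_addr": addrs[-1]}
```

If "reached" is False, the last responding address is the last device that answered the TCP probes. Repeat the trace from the other endpoint before you conclude which device is dropping traffic.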
Test performance using hping3
Hping3 is a command-line oriented TCP/IP packet assembler and analyzer that measures end-to-end packet loss and latency over a TCP connection. In addition to ICMP echo requests, hping3 supports the TCP, UDP, and RAW-IP protocols. It also includes a traceroute mode and can send files over a covert channel. Hping3 is designed for tasks such as scanning hosts, assisting with penetration testing, and testing intrusion detection systems.
MTR and traceroute capture per-hop latency, whereas hping3 reports end-to-end min/avg/max latency over TCP in addition to packet loss. To install hping3, run the command for your distribution's package manager (on Amazon Linux and other RHEL-based distributions, hping3 is installed from the EPEL repository):
sudo yum --enablerepo=epel install hping3
sudo apt-get install hping3
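If you can't install hping3, a rough substitute for end-to-end TCP latency is to time complete TCP handshakes from Python. This is a different technique from hping3: it opens real connections (SYN, SYN-ACK, ACK) through the operating system's TCP stack instead of crafting raw SYN packets, so it needs no root privileges but includes a little more local overhead. tcp_connect_probe is a hypothetical helper, and the timeout and interval defaults are assumptions.

```python
import socket
import statistics
import time


def tcp_connect_probe(host: str, port: int, count: int = 50,
                      timeout: float = 2.0, interval: float = 0.2) -> dict:
    """Time `count` TCP handshakes to host:port and report loss and min/avg/max.

    Pass an IP address rather than a host name so that DNS lookups
    aren't included in the timings.
    """
    rtts_ms = []
    refused = 0
    for i in range(count):
        start = time.perf_counter()
        try:
            # create_connection returns once the three-way handshake completes.
            with socket.create_connection((host, port), timeout=timeout):
                rtts_ms.append((time.perf_counter() - start) * 1000.0)
        except ConnectionRefusedError:
            refused += 1  # an RST came back: the path works, but the port is closed
        except OSError:
            pass  # timed out or unreachable: counted as lost
        if i < count - 1:
            time.sleep(interval)
    lost = count - len(rtts_ms) - refused
    return {
        "sent": count,
        "received": len(rtts_ms),
        "refused": refused,
        "loss_pct": 100.0 * lost / count,
        "min_ms": min(rtts_ms) if rtts_ms else None,
        "avg_ms": statistics.mean(rtts_ms) if rtts_ms else None,
        "max_ms": max(rtts_ms) if rtts_ms else None,
    }
```

For example, tcp_connect_probe("203.0.113.10", 22) sends 50 handshakes to port 22 of a placeholder address. As with the other tests, run it from both endpoints.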
The following command sends 50 TCP SYN packets to port 0. When you don't specify a port, hping3 sends TCP headers to the target host's port 0 with a window size of 64 and no TCP flags set. The -S option sets the SYN flag, and -V turns on verbose output. Hping3 needs raw socket access, so run it with sudo:
sudo hping3 -S -c 50 -V <Public IP of EC2 instance/on-premises host>
The following command sends 50 TCP SYN packets to port 22:
sudo hping3 -S -c 50 -V -p 22 <Public IP of EC2 instance/on-premises host>
Note: Be sure that port 22 or the port that you're testing is open.
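To record hping3 results next to your benchmark, you can extract the summary that hping3 prints when it finishes. This sketch assumes the summary format shown in the comments; the exact wording can vary between builds (some print "tramitted"), so the pattern accepts either spelling. parse_hping3_summary is a hypothetical helper.

```python
import re

# Summary lines that hping3 prints when it finishes, for example:
#   50 packets tramitted, 49 packets received, 2% packet loss
#   round-trip min/avg/max = 11.9/12.4/20.3 ms
COUNTS = re.compile(r"(\d+) packets tra\w*, (\d+) packets received, ([\d.]+)% packet loss")
RTT = re.compile(r"round-trip min/avg/max = ([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_hping3_summary(text: str) -> dict:
    counts = COUNTS.search(text)
    if counts is None:
        raise ValueError("no hping3 statistics line found")
    result = {
        "sent": int(counts.group(1)),
        "received": int(counts.group(2)),
        "loss_pct": float(counts.group(3)),
    }
    rtt = RTT.search(text)
    if rtt is not None:  # only add latency fields when the line is present
        result["min_ms"], result["avg_ms"], result["max_ms"] = map(float, rtt.groups())
    return result
```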
Test packet capture samples using tcpdump
It's a best practice to run simultaneous packet captures on your EC2 instance and on-premises host when you diagnose packet loss or latency issues. Simultaneous captures help you match request and response packets, so that you can isolate the issue at the network or application layer. To install tcpdump, run the command for your distribution's package manager:
sudo yum install tcpdump
sudo apt-get install tcpdump
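After tcpdump is installed, run the following command to capture TCP port 22 traffic on interface eth0 and save it to a pcap file. The -s0 option captures full packets. If your instance's network interface has a different name (for example, ens5), replace eth0 with that name.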
sudo tcpdump -i eth0 port 22 -s0 -w samplecapture.pcap
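For a quick first look at a capture before you open it in Wireshark, the following sketch counts TCP segments and suspected retransmissions. It assumes a classic pcap file (the format tcpdump -w writes by default) captured on an Ethernet interface with IPv4 traffic, and it treats a segment as a suspected retransmission when the same flow, sequence number, and payload length appear again. This is a simplification of Wireshark's retransmission analysis, and count_tcp_retransmissions is a hypothetical helper.

```python
import struct
from typing import Tuple


def count_tcp_retransmissions(path: str) -> Tuple[int, int]:
    """Return (tcp_segments, suspected_retransmissions) for a classic pcap file."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) < 24:
        raise ValueError("file too short to be a pcap file")
    magic = struct.unpack("<I", data[:4])[0]
    if magic in (0xA1B2C3D4, 0xA1B23C4D):  # microsecond / nanosecond, little-endian
        endian = "<"
    elif magic in (0xD4C3B2A1, 0x4D3CB2A1):  # same formats written big-endian
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng isn't handled here)")
    if struct.unpack(endian + "I", data[20:24])[0] != 1:
        raise ValueError("only Ethernet captures (link type 1) are handled")
    seen = set()
    segments = retrans = 0
    offset = 24
    while offset + 16 <= len(data):
        incl_len = struct.unpack(endian + "I", data[offset + 8:offset + 12])[0]
        pkt = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(pkt) < 34 or struct.unpack("!H", pkt[12:14])[0] != 0x0800:
            continue  # not IPv4 over Ethernet
        ip = pkt[14:]
        ihl = (ip[0] & 0x0F) * 4
        total_len = struct.unpack("!H", ip[2:4])[0]
        if ip[9] != 6 or len(ip) < ihl + 20:
            continue  # not TCP, or truncated
        tcp = ip[ihl:]
        sport, dport, seq = struct.unpack("!HHI", tcp[:8])
        data_off = (tcp[12] >> 4) * 4
        syn = bool(tcp[13] & 0x02)
        payload_len = total_len - ihl - data_off
        segments += 1
        if payload_len <= 0 and not syn:
            continue  # pure ACKs aren't tracked
        key = (ip[12:16], sport, ip[16:20], dport, seq, payload_len, syn)
        if key in seen:
            retrans += 1
        else:
            seen.add(key)
    return segments, retrans
```

Duplicate packets in the capture itself are also counted, so confirm any suspected retransmissions in Wireshark.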
Performance troubleshooting for Windows instances
Check for ECN capability
1. Run the following command to determine if Explicit Congestion Notification (ECN) capability is enabled:
netsh interface tcp show global
2. If ECN capability is activated, run the following command from an elevated Command Prompt (Run as administrator) to deactivate it:
netsh interface tcp set global ecncapability=disabled
3. If you don't see an improvement in performance, you can re-activate ECN capability using the following command:
netsh interface tcp set global ecncapability=enabled
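If you check ECN on several Windows instances, a small helper can read the netsh output for you. This sketch assumes the "Name : value" layout that netsh interface tcp show global prints and a setting labeled ECN Capability; the label and its possible values can differ between Windows versions, so compare them against your own output first. The function names are hypothetical.

```python
import subprocess
from typing import Dict


def parse_netsh_tcp_global(text: str) -> Dict[str, str]:
    """Map each "Name : value" line to lowercase key/value strings."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():  # skips headers and the dashed separator line
            settings[key.strip().lower()] = value.strip().lower()
    return settings


def ecn_enabled(text: str) -> bool:
    return parse_netsh_tcp_global(text).get("ecn capability") == "enabled"


def read_netsh_tcp_global() -> str:
    # Windows only: runs the same command as step 1 and returns its output.
    return subprocess.run(["netsh", "interface", "tcp", "show", "global"],
                          capture_output=True, text=True, check=True).stdout
```

On a Windows instance, ecn_enabled(read_netsh_tcp_global()) returns True when ECN capability is reported as enabled.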
Review hops and troubleshoot TCP port connectivity
First, use WinMTR or tracert to review hops:
1. Download and install WinMTR.
2. Enter the destination IP in the Host section, and then choose Start.
3. Let the test run for a minute, and then choose Stop.
4. Choose Copy text to clipboard and paste the output in a text file.
5. Look for any loss in the % column that persists through to the destination.
Note: Ignore any hops with the No response from host message. This message indicates that those particular hops aren't responding to the ICMP probes.
6. Review hops on the MTR reports using a bottom-up approach. For example, check for loss on the last hop or destination, and then review the preceding hops.
If you don't want to install WinMTR, you can use the tracert command-line utility instead.
1. Run tracert to the destination host name or IP address.
2. Look for any hop that shows an abrupt spike in round-trip time (RTT). An abrupt spike in RTT might indicate that there's a node under high load, which in turn induces latency or packet drops in your traffic.
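The RTT spike check can be sketched in Python. This sketch assumes the default tracert layout (three RTT columns, then the address, with * for probes that time out), uses the median of each hop's RTTs, and applies hypothetical thresholds: a jump of at least 50 ms that also at least doubles the RTT. Following the bottom-up guidance earlier in this article, it reports a spike only when later hops stay elevated, because a spike on a single hop often reflects ICMP deprioritization rather than load.

```python
import re
import statistics
from typing import List, Optional, Tuple

RTT_TOKEN = re.compile(r"<?(\d+) ms")  # matches "12 ms" and "<1 ms"


def parse_tracert(text: str) -> List[Tuple[int, Optional[str], Optional[float]]]:
    """Return (hop, address, median_rtt_ms) for each hop line of tracert output."""
    hops = []
    for line in text.splitlines():
        m = re.match(r"^\s*(\d+)\s+(.*)$", line)
        if not m:
            continue  # "Tracing route to ..." and "Trace complete." lines
        rest = m.group(2)
        rtts = [float(v) for v in RTT_TOKEN.findall(rest)]
        addr = rest.split()[-1].strip("[]") if rtts else None
        hops.append((int(m.group(1)), addr, statistics.median(rtts) if rtts else None))
    return hops


def find_rtt_spikes(hops, min_jump_ms: float = 50.0, factor: float = 2.0):
    """Return (hop, address, previous_rtt, rtt) for abrupt, persistent RTT jumps."""
    responding = [h for h in hops if h[2] is not None]
    spikes = []
    for i in range(1, len(responding)):
        prev = responding[i - 1][2]
        n, addr, rtt = responding[i]
        if rtt - prev >= min_jump_ms and rtt >= factor * max(prev, 1.0):
            # Keep the spike only if every later hop stays elevated as well.
            later = [r for _, _, r in responding[i + 1:]]
            if all(r - prev >= min_jump_ms for r in later):
                spikes.append((n, addr, prev, rtt))
    return spikes
```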
Then, check TCP port connectivity:
Note: Because WinMTR and tracert both use ICMP, use tracetcp to troubleshoot TCP port connectivity.
1. Download tracetcp.
2. Extract the tracetcp ZIP file.
3. Copy tracetcp.exe to your C drive.
4. Install WinPcap.
5. Open a Command Prompt, and then change to the root of your C drive by running the following command: cd \
6. Run tracetcp using one of the following commands: tracetcp.exe hostname:port or tracetcp.exe ip:port.
Check the Windows Task Manager
If you have access to the source instance or destination instance, check the Windows Task Manager for high CPU, memory, or network utilization.
Take a packet capture
Note: It's a best practice to first start the packet capture and then initiate the traffic. This approach helps you capture all packets for the flow.
1. Install Wireshark and take a packet capture.
2. Use the following filter to isolate the connections initiated by a particular source in the packet capture: ip.src == source_IP && tcp.flags.syn == 1 && tcp.flags.ack == 0. The output shows all the TCP streams initiated by that source IP address.
3. Select the row with the relevant source IP and destination IP.
4. Open the context (right-click) menu, and then choose Follow, TCP Stream. Wireshark displays the TCP flow between the source IP address and destination IP address that you want to investigate.
5. Look for retransmissions, duplicate packets, or TCP window notifications such as TCP Window Full or TCP ZeroWindow. These notifications might indicate that a receiver's TCP buffer is running out of space.
If you find packet loss in your network, or a change in the number of hops (compared with your benchmark results) that introduces additional latency, see your vendor documentation for instructions on how to check and troubleshoot your network devices. If your network is multi-homed (connected to more than one internet service provider), run these tests over a different ISP, and then compare the results.
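To make the comparison with your benchmark results repeatable, you can store a small summary of each test run. The following sketch uses a hypothetical TraceSummary record (responding hop addresses, plus average RTT and loss at the destination) and hypothetical thresholds (latency up by 50%, or loss up by one percentage point). Adjust both to your own baseline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TraceSummary:
    hops: List[str]       # responding hop addresses, first hop to destination
    dest_avg_ms: float    # average RTT to the destination
    dest_loss_pct: float  # packet loss at the destination


def compare_to_baseline(baseline: TraceSummary, current: TraceSummary,
                        latency_factor: float = 1.5,
                        loss_points: float = 1.0) -> List[str]:
    """List the differences between a benchmark run and the current run."""
    findings = []
    if len(current.hops) != len(baseline.hops):
        findings.append(f"hop count changed: {len(baseline.hops)} -> {len(current.hops)}")
    elif current.hops != baseline.hops:
        findings.append("same hop count, different routers (path changed)")
    if current.dest_avg_ms > latency_factor * baseline.dest_avg_ms:
        findings.append(f"latency rose: {baseline.dest_avg_ms:.1f} ms -> {current.dest_avg_ms:.1f} ms")
    if current.dest_loss_pct - baseline.dest_loss_pct >= loss_points:
        findings.append(f"loss rose: {baseline.dest_loss_pct:.1f}% -> {current.dest_loss_pct:.1f}%")
    return findings
```

An empty list means that the current run matches the benchmark within the chosen thresholds.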