How do I troubleshoot network performance issues between EC2 Linux or Windows instances in a VPC and an on-premises host over the internet gateway?

Packet loss or latency issues exist between my Amazon Elastic Compute Cloud (Amazon EC2) instances and my on-premises host over the internet gateway. How can I troubleshoot these network performance issues?

Short description

To diagnose network issues such as packet loss or latency, first test the network to isolate the source of the issue. The following resolution can help determine whether the source of the issue is the network or an application. It's a best practice to benchmark performance while the network is healthy so that you have a baseline to compare against when you observe performance issues.

Before you begin troubleshooting, check the following:

  • Be sure that the network utilities are installed on both endpoints (on the EC2 instance and the on-premises host).
  • Use an EC2 instance that supports enhanced networking, and be sure that the drivers are up to date. Enhanced networking provides higher I/O with low CPU utilization, which helps avoid instance-level issues when running performance tests. If enhanced networking isn't turned on, see Enhanced networking on Linux or Enhanced networking on Windows.
  • Connect to your EC2 instance, and be sure that there's end-to-end connectivity between your EC2 instance and your on-premises host.

Resolution

Install the following tools to help troubleshoot and test your network:

  • AWSSupport-SetupIPMonitoringFromVPC to collect network metrics such as packet loss and latency, along with MTR, tcptraceroute, and tracepath results.
  • MTR to check for ICMP or TCP packet loss and latency problems.
  • Traceroute to determine latency or routing problems.
  • Hping3 to determine end-to-end TCP packet loss and latency problems.
  • Tcpdump to analyze packet capture samples.

Review hops on traceroute or MTR reports using a bottom-up approach. For example, check for loss on the last hop or destination, and then review the preceding hops. If the packet loss or latency issues continue through the last hop, there might be a network or routing issue. Packet loss or latency on only one hop in the path might occur because of control plane rate limiting on that node. Check whether the last hop reported is the destination noted in the command. If it isn't, then there might be an issue caused by a restrictive security group.

Test performance using AWSSupport-SetupIPMonitoringFromVPC

This built-in tool collects many of the metrics that you need to troubleshoot your network. For more information, see Debugging tool for network connectivity from Amazon VPC.

Performance troubleshooting for Linux instances

Check the Linux performance statistics

If you have access to the source instance or destination instance, check for issues with the CPU, memory utilization, and load average.
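
As a quick first check, the following is a minimal sketch that compares the 1-minute load average with the number of CPUs. The helper name high_load and the threshold of 1.0 per CPU are illustrative assumptions, not AWS guidance.

```shell
# high_load LOAD CPUS: succeeds (exit 0) when LOAD / CPUS is above 1.0.
# The 1.0-per-CPU threshold is an illustrative assumption; tune it for your workload.
high_load() {
  awk -v load="$1" -v cpus="$2" 'BEGIN { exit !(load / cpus > 1.0) }'
}

# On a Linux host, read the 1-minute load average and the CPU count.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
  load1=$(cut -d ' ' -f 1 /proc/loadavg)
  if high_load "$load1" "$(nproc)"; then
    echo "Load average $load1 is high for $(nproc) CPUs"
  else
    echo "Load average $load1 looks normal for $(nproc) CPUs"
  fi
fi
```

For memory and per-process CPU usage, commands such as free -m and top give a similar snapshot.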

Test performance using MTR

The Linux MTR command provides continual, updated output that you can use to analyze network performance. This diagnostic tool combines the functionality of the traceroute and ping utilities. Most Linux distributions come with traceroute and MTR preinstalled. You can also install them from your distribution’s software package manager.

To install MTR, run the following commands:

Amazon Linux:

sudo yum install mtr

Ubuntu:

sudo apt-get install mtr-tiny

To test your network's performance using MTR, run this test bidirectionally between the public IP address of your EC2 instances and your on-premises host. The path between nodes on a TCP/IP network can change if the direction is reversed. Therefore, it's important to obtain MTR results for both directions. You can use a TCP-based trace instead of ICMP, because most internet devices deprioritize ICMP-based trace requests.

Review your packet loss. Packet loss on a single hop usually doesn't indicate an issue. The loss can be the result of a control plane policy that causes the "ICMP time exceeded" messages to be dropped. If you notice sustained packet loss until the destination hop, or packet loss over several hops, this loss might indicate a problem.

Note: It's common to see a few requests time out.

ICMP-based MTR:

mtr -n -c 200 <Public IP EC2 instance/on-premises host> --report

TCP-based MTR:

mtr -n -T -c 200 <Public IP EC2 instance/on-premises host> --report

The argument -T performs a TCP-based MTR, and the --report option puts MTR into report mode. In report mode, MTR runs for the number of cycles specified by the -c option, prints the statistics, and then exits.

Note: By default, the TCP-based MTR tests destination TCP port 80. To test a specific destination TCP port, append -P followed by the port number. The following example runs MTR against destination TCP port 443:

mtr -n -T -c 200 <Public IP EC2 instance/on-premises host> -P 443 --report
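
To apply the guidance above about loss that persists to the destination hop, the following sketch reads a saved MTR report and prints the loss value of the last hop listed. The function name mtr_dest_loss is hypothetical, the sample report is made up, and the parsing assumes the default --report column layout.

```shell
# mtr_dest_loss: read an "mtr --report" on stdin and print the Loss% value
# of the last hop listed (the destination when the trace completes).
# Assumes the default report layout, where hop lines look like
#   "  3.|-- 203.0.113.10   2.5%   200   12.3 ..."
mtr_dest_loss() {
  awk '$1 ~ /^[0-9]+\.\|--$/ { loss = $3; sub(/%/, "", loss) }
       END { if (loss != "") print loss }'
}

# Example with a made-up three-hop report (addresses are documentation ranges).
mtr_dest_loss <<'EOF'
HOST: ip-10-0-0-5            Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.0.0.1              0.0%   200    0.3   0.4   0.2   1.1   0.1
  2.|-- ???                  100.0   200    0.0   0.0   0.0   0.0   0.0
  3.|-- 203.0.113.10          2.5%   200   12.3  12.5  12.1  15.0   0.4
EOF
```

In this sample, hop 2 doesn't answer at all, but the loss isn't carried through to the destination, so only the 2.5% on the last hop is worth investigating. To use the helper on a live test, pipe the output of one of the MTR commands above into mtr_dest_loss.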

Test performance using traceroute

The Linux traceroute utility identifies the path taken from a client node to the destination node. The utility records the time in milliseconds for each router to respond to the request. The utility also calculates the amount of time that each hop takes before reaching its destination.

To install traceroute, run the following commands:

Amazon Linux:

sudo yum install traceroute

Ubuntu:

sudo apt-get update
sudo apt-get install traceroute

Note: Traceroute isn't necessary if you run an MTR report. MTR provides latency and packet loss statistics to a destination.

Be sure that port 22 or the port that you're testing is open in both directions. To troubleshoot network connectivity using traceroute, run the command from the client to the server, and from the server back to the client. The path between nodes on a TCP/IP network can change if the direction is reversed. Use a TCP-based trace (on your application port) instead of ICMP, because most internet devices deprioritize ICMP-based trace requests.

ICMP-based traceroute:

sudo traceroute -I <Public IP of EC2 instance/on-premises host>

TCP-based traceroute:

sudo traceroute -n -T -p 22 <Public IP of EC2 instance/on-premises host>

The arguments -T -p 22 perform a TCP-based trace on port 22, and -n skips the lookup of hostnames for the IP addresses in the output.

Note: You can use your application-specific port for testing. Use the specific port to understand if there are any intermediate devices in the path dropping your application traffic.
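
To check whether the last hop reported is the destination that you traced, the following sketch inspects saved traceroute output. The function name trace_reached is hypothetical, the sample output is made up, and the parsing assumes numeric (-n) output with one line per hop.

```shell
# trace_reached DEST: read "traceroute -n" output on stdin and succeed (exit 0)
# only if the last hop line starts with the address DEST.
# Assumes numeric output, where each hop line is "<hop number> <address> <times...>".
trace_reached() {
  awk -v dest="$1" 'NR > 1 && NF > 0 { last = $2 }
                    END { exit !(last == dest) }'
}

# Example with made-up output in which the trace reaches 203.0.113.10.
if trace_reached 203.0.113.10 <<'EOF'
traceroute to 203.0.113.10 (203.0.113.10), 30 hops max, 60 byte packets
 1  10.0.0.1  0.345 ms  0.312 ms  0.298 ms
 2  * * *
 3  203.0.113.10  12.101 ms  12.310 ms  12.042 ms
EOF
then
  echo "destination reached"
else
  echo "destination not reached"
fi
```

If the check fails for a TCP-based trace on your application port, review security groups, network ACLs, and on-premises firewalls for rules that drop that port.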

Test performance using hping3

Hping3 is a command-line oriented TCP/IP packet assembler and analyzer that measures end-to-end packet loss and latency over a TCP connection. In addition to ICMP echo requests, hping3 supports TCP, UDP, and RAW-IP protocols. Hping3 also includes a traceroute mode and can send files over a covert channel. Hping3 is designed to scan hosts, assist with penetration testing, test intrusion detection systems, and send files between hosts.

MTRs and traceroute capture per-hop latency. However, hping3 yields results that show end-to-end min/avg/max latency over TCP in addition to packet loss. To install hping3, run the following commands:

Amazon Linux 2:

First, install the EPEL release package for RHEL 7 and activate the EPEL repository:

sudo amazon-linux-extras install epel -y

Then, install hping3 from the EPEL repository:

sudo yum --enablerepo=epel install hping3

Ubuntu:

sudo apt-get install hping3

The following command sends 50 TCP SYN packets to port 0. By default, hping3 sends TCP headers to the target host's port 0, with a window size of 64 and without a TCP flag. The -S option sets the SYN flag, and -c 50 limits the test to 50 packets:

sudo hping3 -S -c 50 -V <Public IP of EC2 instance/on-premises host>

The following command sends 50 TCP SYN packets over port 22:

sudo hping3 -S -c 50 -V <Public IP of EC2 instance/on-premises host> -p 22

Note: Be sure that port 22 or the port that you're testing is open.
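
To record the end-to-end numbers for later comparison with your benchmarks, the following sketch pulls the packet loss and average round-trip time out of the hping3 summary. The function name hping_summary is hypothetical, the sample text is made up, and the parsing assumes a summary with a "packet loss" line and a "round-trip min/avg/max" line.

```shell
# hping_summary: read hping3 output on stdin and print the packet loss and
# the average round-trip time taken from the closing statistics.
# Assumes summary lines shaped like the made-up sample below.
hping_summary() {
  awk '/packet loss/ { for (i = 1; i <= NF; i++) if ($i ~ /%$/) loss = $i }
       /^round-trip/ { split($(NF - 1), rtt, "/"); avg = rtt[2] }
       END { print "loss=" loss, "avg_ms=" avg }'
}

# Example with a made-up summary for a 50-packet test.
hping_summary <<'EOF'
--- 203.0.113.10 hping statistic ---
50 packets transmitted, 49 packets received, 2% packet loss
round-trip min/avg/max = 11.9/12.4/15.2 ms
EOF
```

When you pipe a live test into the helper, consider adding 2>&1 after the hping3 command in case your build writes the statistics to standard error.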

Test packet capture samples using tcpdump

It's a best practice to perform simultaneous packet captures on your EC2 instance and on-premises host when diagnosing packet loss or latency issues. Doing so can help to identify the request and response packets so that you can isolate the issue at the networking and application layers. It's also a best practice to first start the packet capture, and then initiate the traffic. This helps capture all packets for the flow. To install tcpdump, run the following commands:

Amazon Linux:

sudo yum install tcpdump

Ubuntu:

sudo apt-get install tcpdump

After tcpdump is installed, run the following command to capture the TCP port 22 traffic and save it in a pcap file:

sudo tcpdump -i eth0 port 22 -s0 -w samplecapture.pcap

Note: The tcpdump flag -i specifies the interface on the instance where tcpdump captures the traffic. You might need to change the interface from eth0 to the configured interface in your environment.
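
Because captures are taken on both endpoints at the same time, it can help to give each file a name that identifies the host and the start time. The following is a minimal sketch; the function name capture_name and the naming scheme are illustrative assumptions.

```shell
# capture_name LABEL PORT: print a pcap file name that embeds a label for the
# endpoint, the port, and a UTC timestamp, so that captures taken at the same
# time on both hosts are easy to pair up. The naming scheme is an assumption.
capture_name() {
  printf '%s_port%s_%s.pcap\n' "$1" "$2" "$(date -u +%Y%m%dT%H%M%SZ)"
}

capture_name ec2 22

# Example capture (not run here): start it on both endpoints before sending traffic.
# sudo tcpdump -i eth0 -s0 -w "$(capture_name ec2 22)" port 22
```

To review a saved file later, tcpdump -nn -r followed by the file name reads the capture back without resolving names.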

Performance troubleshooting for Windows

Check for ECN capability

1.    Run the following command to determine if Explicit Congestion Notification (ECN) capability is turned on:

netsh interface tcp show global

2.    If ECN capability is activated, run the following command to deactivate it:

netsh interface tcp set global ecncapability=disabled

3.    If you don't see an improvement in performance, you can re-activate ECN capability using the following command:

netsh interface tcp set global ecncapability=enabled

Review hops and troubleshoot TCP port connectivity

First, use MTR or tracert to review hops:

MTR method:

1.    Download and install WinMTR.

2.    Enter the destination IP in the Host section, and then choose Start.

3.    Let the test run for a minute, and then choose Stop.

4.    Choose Copy text to clipboard and paste the output in a text file.

5.    Look for any losses in the % column that are propagated to the destination.

Note: Ignore any hops with the No response from host message. This message indicates that those particular hops aren't responding to the ICMP probes.

6.    Review hops on the MTR reports using a bottom-up approach. For example, check for loss on the last hop or destination, and then review the preceding hops.

Tracert method:

If you don't want to install MTR, you can use the tracert command utility tool.

1.    Perform a tracert to the destination URL or IP address.

2.    Look for any hop that shows an abrupt spike in round-trip time (RTT). An abrupt spike in RTT might indicate that there's a node under high load, which in turn induces latency or packet drops in your traffic.

Note: The -d option doesn't resolve IP addresses to hostnames. Remove -d if IP to hostname resolution is required.

tracert -d <Public IP of EC2 instance/on-premises host>

Then, check TCP port connectivity.

Note: Because WinMTR and tracert are both ICMP-based, you can use tracetcp to troubleshoot TCP port connectivity.

1.    Download WinPcap and tracetcp.

2.    Extract the tracetcp ZIP file.

3.    Copy tracetcp.exe to your C drive.

4.    Install WinPcap.

5.    Open the command prompt and change to the root of your C drive using the cd \ command, for example C:\Users\username>cd \.

6.    Run tracetcp using one of the following commands: tracetcp.exe hostname:port or tracetcp.exe ip:port.

Check the Windows Task Manager

If you have access to the source instance or destination instance, check the Windows Task Manager. Look for issues with CPU and memory utilization, or load average.

Take a packet capture

Note: It's a best practice to perform simultaneous packet captures on your EC2 instance and your on-premises host when diagnosing packet loss or latency issues. This helps to identify the request and response packets to isolate the issue at the networking and application layers. It's also a best practice to first start the packet capture and then initiate the traffic. This helps capture all packets for the flow.

1.    Install Wireshark and take a packet capture.

2.    Use the following filter to isolate the traffic from a particular source in the packet capture: (ip.addr eq source_IP) && (tcp.flags.syn == 1). The output shows all the TCP streams initiated by that source IP.

3.    Select the row with the relevant source IP and destination IP.

4.    Open the context (right-click) menu, and then choose Follow, TCP Stream. This shows the TCP flow between the source IP and destination IP that you want to investigate.

5.    Look for retransmissions, duplicate packets, or TCP window size notifications like TCP window full or Window size zero. These notifications might indicate that the TCP buffers are running out of space.

If you find packet loss, or if the number of hops changes significantly from your benchmarks, refer to your networking equipment vendor documentation. If working within a multi-homed network environment, perform these tests using a different ISP.


Related information

Enhanced networking on Linux

Enhanced networking on Windows

AWS OFFICIAL
Updated a year ago