How can I troubleshoot packet loss for my Direct Connect connection?

Last updated: 2021-12-23

I'm using AWS Direct Connect to transfer data. I'm experiencing packet loss when transferring data to my Amazon Elastic Compute Cloud (Amazon EC2) instance. How can I isolate the packet loss?

Short description

Packet loss occurs when transmitted data packets fail to arrive at their destination, which results in network performance issues. Common causes include low signal strength at the destination, excessive system utilization, network congestion, and network route misconfigurations.

Resolution

Run the following checks for your network devices and Direct Connect connection.

Check the AWS Personal Health Dashboard for scheduled maintenance or events

The AWS Personal Health Dashboard displays relevant information and also provides notifications for activities. For more information, see How can I get notifications for Direct Connect scheduled maintenance or events?

Check metrics for the Direct Connect endpoint, customer gateway (CGW), and intermediate device (layer 1)

With the CGW and intermediate devices, the issue can be local to the on-premises network or to the transit path towards AWS. Check the following on the on-premises node and intermediate devices:

  • The CGW logs for interface flaps.
  • CPU utilization for the CGW when the issue occurred.
  • The light signal reading on the device where the Direct Connect connection terminates.
  • The device where the Direct Connect connection terminates for input errors, incrementing framing errors, cyclic redundancy check (CRC) errors, runts, giants, and throttles.

Check Direct Connect connection metrics (layer 1)

Check the following Direct Connect metrics:

  • ConnectionErrorCount: Apply the Sum statistic. Non-zero values indicate MAC-level errors on the AWS device.
  • ConnectionLightLevelTX and ConnectionLightLevelRX: Check the light signal recorded on the Direct Connect connection when the issue occurred. The acceptable range is between -14.4 and 2.50 dBm.
  • ConnectionBpsEgress and ConnectionBpsIngress: Check the amount of traffic on the Direct Connect connection when the packet loss occurred for congestion on the link.

For more information, see Direct Connect Connection metrics.
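
The metric checks above can be sketched with the AWS CLI and two small helpers. This is a minimal sketch, not an official procedure: the function names (dx_metric, light_level_status, link_utilization_pct), the 300-second period, and the connection ID dxcon-EXAMPLE are illustrative assumptions. The AWS/DX namespace and the ConnectionId dimension are the ones that CloudWatch uses for Direct Connect connection metrics.

```shell
#!/usr/bin/env bash
# Sketch only: the function names and the 300-second period are assumptions.

# Query one Direct Connect connection metric from CloudWatch.
# Usage: dx_metric <connection-id> <metric-name> <statistic> <start-time> <end-time>
dx_metric() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/DX \
    --metric-name "$2" \
    --dimensions "Name=ConnectionId,Value=$1" \
    --statistics "$3" \
    --start-time "$4" \
    --end-time "$5" \
    --period 300
}

# Report whether a light level in dBm is inside the -14.4 to 2.50 dBm range.
light_level_status() {
  awk -v v="$1" 'BEGIN {
    if ((v + 0) >= -14.4 && (v + 0) <= 2.5) print "in range"
    else print "out of range"
  }'
}

# Express a ConnectionBpsEgress or ConnectionBpsIngress value as a
# percentage of the port speed (both arguments in bits per second).
link_utilization_pct() {
  awk -v b="$1" -v c="$2" 'BEGIN { printf "%.1f\n", (b / c) * 100 }'
}
```

For example, dx_metric dxcon-EXAMPLE ConnectionErrorCount Sum 2021-12-23T00:00:00Z 2021-12-23T06:00:00Z returns the error sums for that window, and link_utilization_pct 450000000 1000000000 expresses 450 Mbps of traffic on a 1 Gbps port as a percentage.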

Check for asymmetric or suboptimal routing (layer 3)

Asymmetric routing can result in packet loss if the on-premises firewall performs unicast reverse path forwarding (uRPF) checks, which cause return traffic that arrives on an unexpected interface to be dropped. Suboptimal routing within the on-premises network can also cause packet loss.

For more information, see How can I resolve asymmetric routing issues when I create a VPN as a backup to Direct Connect in a transit gateway?
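
If the on-premises firewall or host happens to run Linux (an assumption; other platforms configure reverse path forwarding differently), the kernel's rp_filter setting shows whether reverse path checks are active. The following sketch only interprets the value; the function name describe_rp_filter is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: applies to Linux hosts, where the kernel exposes
# reverse path filtering through the net.ipv4.conf.*.rp_filter sysctls.

# Translate an rp_filter value into a short description.
describe_rp_filter() {
  case "$1" in
    0) echo "off: no source validation" ;;
    1) echo "strict: drops packets whose return route uses a different interface" ;;
    2) echo "loose: drops packets only when the source is unreachable through any interface" ;;
    *) echo "unknown value: $1" ;;
  esac
}

# To read the live setting on a Linux host:
#   describe_rp_filter "$(sysctl -n net.ipv4.conf.all.rp_filter)"
```

Strict mode is the setting most likely to drop traffic when the forward and return paths differ. The kernel applies the higher of the "all" value and the per-interface value, so also check the interface that faces the Direct Connect link.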

End-to-end bidirectional traceroute between the on-premises host and the AWS host (layer 3)

Running traceroute between the hosts shows the network path taken in each direction. The results also show whether the routing is asymmetric or load balanced.

1.    Run the following command to install traceroute:

Amazon Linux/RHEL:

sudo yum install traceroute

Ubuntu:

sudo apt-get install traceroute

2.    Then, run a command similar to the following for a TCP-based traceroute (the -T option sends TCP SYN probes instead of ICMP):

sudo traceroute -T -p <destination Port> <IP of destination host>

Windows OS:

1.    Download WinPcap and tracetcp.

2.    Extract the Tracetcp ZIP file.

3.    Copy tracetcp.exe to your C drive.

4.    Install WinPcap.

5.    Open the command prompt and change to the root of your C drive by running the cd \ command (for example, C:\Users\username>cd \).

6.    Run tracetcp using the following commands: tracetcp.exe hostname:port or tracetcp.exe ip:port.

End-to-end bidirectional MTR test between the on-premises host and the AWS host (layer 3)

Like traceroute, an MTR test discovers each router in the network path between the hosts. MTR also reports per-hop statistics for each node in the path, such as packet loss and latency.

Check the MTR results for packet loss and network latency. A nonzero loss percentage at a hop can indicate an issue with that router. However, some service providers rate limit the ICMP traffic that MTR uses. To determine whether the loss is due to rate limiting, review the subsequent hops. If a subsequent hop shows a loss of 0.0%, the loss at the earlier hop likely reflects ICMP rate limiting rather than dropped traffic.

1.  Run the following command to install MTR:

Amazon Linux/RHEL:

$ sudo yum install mtr -y

Ubuntu:

sudo apt install mtr -y

Windows OS:

Download and install WinMTR.

Note: For Windows OS, WinMTR doesn't support TCP-based MTR.

2.    For the on-premises --> AWS direction, run MTR on the on-premises host (ICMP and TCP based):

$ mtr -n -c 100 <private IP of EC2> --report
$ mtr -n -T -P <EC2 instance open TCP port> -c 100 <private IP of EC2> --report

3.    For the AWS --> on-premises direction, run MTR on the EC2 instance (ICMP and TCP based):

$ mtr -n -c 100 <private IP of the local host> --report
$ mtr -n -T -P <local host open TCP port> -c 100 <private IP of the local host> --report
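
The rate-limiting check described above can be applied to a saved MTR report with a short awk filter. This is a sketch under assumptions: the function name summarize_mtr_loss is hypothetical, and it expects the plain text layout that mtr -n --report prints, with hop lines such as "2.|-- 10.0.0.2 20.0% 100 ...". Its verdicts are hints for further investigation, not a diagnosis.

```shell
#!/usr/bin/env bash
# Sketch only: reads "mtr -n --report" text on stdin and classifies lossy hops.

summarize_mtr_loss() {
  awk '
    # Hop lines look like: "  2.|-- 10.0.0.2  20.0%  100  1.2 ..."
    /^ *[0-9]+\.[|]--/ {
      n++
      host[n] = $2
      loss[n] = $3 + 0          # "20.0%" converts to the number 20
    }
    END {
      if (n == 0) { print "no hop lines found"; exit 1 }
      for (i = 1; i <= n; i++) {
        if (loss[i] == 0) continue
        # Does any later hop report 0.0% loss?
        clean_later = 0
        for (j = i + 1; j <= n; j++)
          if (loss[j] == 0) clean_later = 1
        if (i == n)
          printf "hop %d (%s): %.1f%% loss at the destination\n", i, host[i], loss[i]
        else if (clean_later)
          printf "hop %d (%s): %.1f%% loss, later hop at 0.0%%: possible ICMP rate limiting\n", i, host[i], loss[i]
        else
          printf "hop %d (%s): %.1f%% loss that continues on later hops\n", i, host[i], loss[i]
      }
    }
  '
}
```

For example, mtr -n -c 100 <private IP of EC2> --report | summarize_mtr_loss prints one line per lossy hop. Loss that continues through to the destination is the pattern to investigate further.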

Review the path MTU between the on-premises host and the AWS host (layer 3)

The maximum transmission unit (MTU) is the size of the largest permissible packet that can be passed over a network connection. Path MTU Discovery (PMTUD) determines the path MTU, which is the smallest MTU of any link between the two hosts. Packet loss can occur if a packet is larger than the path MTU. For more information, see Path MTU Discovery.

You can check the path MTU between two hosts using tracepath.

1.  For the on-premises --> AWS direction, run tracepath on port 80 from the local host:

$ tracepath -n -p 80 <EC2 private instance IP>

2.    For the AWS --> on-premises direction, run tracepath on port 80 from the EC2 instance:

$ tracepath -n -p 80 <private IP of local host>
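
To compare the two directions, it can help to pull the discovered path MTU out of the tracepath output. The following is a minimal sketch: the function names path_mtu_from_tracepath and icmp_payload_for_mtu are hypothetical, and the parser assumes the "Resume: pmtu <value> hops <n> back <n>" summary line that Linux tracepath prints at the end of a run.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative.

# Print the path MTU from the "Resume:" summary line of tracepath output.
# Exits non-zero if no summary line with a pmtu value is found.
path_mtu_from_tracepath() {
  awk '
    /Resume:/ {
      for (i = 1; i < NF; i++)
        if ($i == "pmtu") { print $(i + 1); found = 1 }
    }
    END { exit !found }
  '
}

# Largest ICMP echo payload that fits in one IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}
```

For example, tracepath -n -p 80 <EC2 private instance IP> | path_mtu_from_tracepath prints the path MTU for that direction. As a cross-check on Linux, ping -M do -s "$(icmp_payload_for_mtu 1500)" -c 3 <EC2 private instance IP> sends unfragmentable packets that fit only if the path MTU is at least 1500 bytes.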