Lower access latency for your apps with AWS Wavelength and our telco partners

At the time of writing, the AWS Global Infrastructure spans 108 Availability Zones (AZs) within 34 geographic Regions around the world. To provide mobile apps with ultra-low latency access to the AWS cloud, we collaborate with telecom operators globally to offer AWS Wavelength. AWS Wavelength Zones embed AWS compute and storage services within mobile networks. This post shows how to assess the latency difference between accessing cloud resources in a Wavelength Zone and accessing them in a Region.

What is latency, and how can a Wavelength Zone help?

Latency is the delay between two events, commonly a request and a response. It has many causes, including the time taken for the request to be processed, any time the request spends in a queue before being processed, and, in distributed systems, the time taken for the messages to be sent across the network. The latter, network latency, arises partly from the packet processing that occurs as switches and routers receive and forward packets, and partly from the physical distance that packets travel.

Network latency is typically measured as round-trip, or two-way, latency. This includes both the time for the request message to travel through the network and the time for the response message to travel back. Excessive network latency can result in video freezing, dropped or garbled audio, or lag in online multiplayer games, all of which users can perceive as poor quality.

The following figure shows a simple schematic of a mobile device, or user equipment (UE), connected to an operator’s radio access network (RAN). The RAN, in turn, connects to a packet core via the backhaul and transmission network. The packet core is responsible for authenticating and authorizing the UE, managing its mobility, and connecting to the wider internet.

Figure 1: Mobile device connectivity to AWS Region and Wavelength Zone

The picture is simplified, especially in terms of the packet core's components, its distribution across physical locations, and the variants it comes in (EPC for 4G or 5GC for 5G). Operators distribute the packet core across sites for two main reasons. The first is resiliency, so that problems at one individual site do not affect overall system availability, much as AWS creates fault isolation boundaries between AZs. The second is performance, so that users get a good experience regardless of where they are in the operator's territory.

When AWS and an operator partner to build a Wavelength Zone, AWS compute and storage services are embedded within telco providers’ data centers at the edge of the 5G network. This placement aims to minimize network latency by reducing the physical distance and number of network hops that the packets traverse.

Evaluating Wavelength Zones for your needs

There are many factors that can affect the round-trip latency between the app on the UE and its AWS resources in the Region or Wavelength Zone. Some examples of these are: the UE’s physical location, the RAN and backhaul network design, the mobile core site locations, the packet core technology (4G or 5G), and the locations of the AZs that make up the closest Region.

One of the strongest benefits of the AWS cloud is that common APIs and services are available whether running in a Region, a Wavelength Zone, or other types of AWS compute infrastructure. This gives developers a "write once, run anywhere" experience. However, even taking this into account, moving resources from a Region to a Wavelength Zone takes some development and operational effort. In addition, prices for Amazon Elastic Compute Cloud (Amazon EC2) instances and other AWS resources in Wavelength Zones may differ from those in the parent Region. Therefore, it makes sense to evaluate whether the reduction in round-trip latency justifies these incremental investments.

We can estimate the contribution that physical distance makes to network latency. Light in a fiber-optic cable, or an electrical signal in copper, travels at about 70% of the speed of light in a vacuum. This gives a rough estimate of one millisecond of round-trip latency for every 100km (~62 miles) of physical distance between two points. However, the network path between two locations does not necessarily follow a straight line, and may be much more indirect. Moreover, packet handling delay typically makes up the greater share of overall network latency, and it cannot be estimated as easily. Therefore, we need the ability to measure the round-trip latency between a UE and a Region or Wavelength Zone.
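The one-millisecond rule of thumb follows from a short calculation. Taking the speed of light in a vacuum, c, as roughly 300,000 km/s, and a straight-line distance d of 100km between two points:

```latex
\begin{aligned}
v &\approx 0.7\,c \approx 0.7 \times 3 \times 10^{5}\ \text{km/s} = 2.1 \times 10^{5}\ \text{km/s} \\
t_{\text{RTT}} &\approx \frac{2d}{v} = \frac{2 \times 100\ \text{km}}{2.1 \times 10^{5}\ \text{km/s}} \approx 9.5 \times 10^{-4}\ \text{s} \approx 1\ \text{ms}
\end{aligned}
```

Rounding gives about 1ms per 100km. This is a floor set by propagation alone: indirect routing and packet handling only add to it.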

There are several ways to measure round-trip latency between two network devices, from an ICMP ping to benchmarking tools such as iperf, netperf, and others. In this case, we are most interested in the difference between the latency from the UE to the Region and the latency from the UE to the Wavelength Zone. This means that the exact method is not all that important, because systematic biases due to the measurement technique should largely cancel out when we subtract the two values. However, it is generally accepted that a user space-to-user space test using Transmission Control Protocol (TCP) is most representative of real-world application behavior, so here we use netperf's TCP Request/Response (TCP_RR) test.

As documented in the netperf manual, a TCP_RR test works as follows. After a TCP connection is opened, a timer is started and a single byte is sent over TCP from the client to the server. On receiving the single byte, the server responds immediately, also with a single byte. When the client receives this response, it counts that as one complete transaction, and repeats. At the end of the test interval, the total elapsed time is divided by the number of transactions to give an average round-trip time for each transaction.
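As a minimal sketch of the arithmetic behind that average (not netperf's actual code), the following Bash functions compute the mean round-trip time and the transaction rate from an elapsed time and a transaction count. The function names are hypothetical helpers of our own.

```bash
#!/usr/bin/env bash
# Arithmetic behind a TCP_RR result: the average round-trip time is the
# total elapsed time divided by the number of completed transactions.

# avg_rtt_us ELAPSED_SECONDS TRANSACTIONS -> average RTT in microseconds
avg_rtt_us() {
  awk -v t="$1" -v n="$2" \
    'BEGIN { if (n <= 0) exit 1; printf "%.3f\n", t * 1e6 / n }'
}

# trans_rate_per_s ELAPSED_SECONDS TRANSACTIONS -> transactions per second,
# which is the reciprocal of the average RTT
trans_rate_per_s() {
  awk -v t="$1" -v n="$2" \
    'BEGIN { if (t <= 0) exit 1; printf "%.3f\n", n / t }'
}
```

For instance, 1,200 transactions completed in 60 seconds correspond to an average of 50,000µs (50ms) per transaction, or 20 transactions per second. Because the rate is the reciprocal of the average latency, netperf can report both from the same measurement.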

Overall test setup

As described previously, to run a test we will need netperf clients and servers. Our scenario involves running the netperf client on a UE, along with two netperf servers. One server will be located on an EC2 instance in a public subnet within a Region, while the other server will be on an EC2 instance in a Wavelength Zone with a carrier IP address.

To build the infrastructure, you need an AWS account that is opted in to the Wavelength Zones you want to test, a process described in the Wavelength documentation. You can download the AWS CloudFormation template from the aws-samples GitHub repository. You must launch the stack in the parent Region of the Wavelength Zones you want to test. You can find the full list of Wavelength Zone locations and their associated parent Regions in the Wavelength documentation.
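If you prefer the command line, the following sketch uses the AWS CLI to list the Wavelength Zones that a parent Region offers, including those you have not yet opted in to, and to opt in to a zone group. It assumes the AWS CLI is installed and configured with credentials for your account; the function names are hypothetical helpers, not part of the sample repository.

```bash
#!/usr/bin/env bash
# List every Wavelength Zone offered by a parent Region, whether or not the
# account has opted in yet. Each output row is:
#   ZoneId  ZoneName  GroupName  OptInStatus
# Assumes a configured AWS CLI.

# list_wavelength_zones REGION
list_wavelength_zones() {
  aws ec2 describe-availability-zones \
    --region "$1" \
    --all-availability-zones \
    --filters Name=zone-type,Values=wavelength-zone \
    --query 'AvailabilityZones[].[ZoneId,ZoneName,GroupName,OptInStatus]' \
    --output text
}

# opt_in_zone_group REGION GROUP_NAME   (GroupName as printed above)
opt_in_zone_group() {
  aws ec2 modify-availability-zone-group \
    --region "$1" \
    --group-name "$2" \
    --opt-in-status opted-in
}

# Example: list_wavelength_zones eu-west-2
```

The ZoneId column is the value the stack's NPCarrierZone parameter expects.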

In addition, you need a UE that can run a netperf client, and that has a SIM issued by the operator running the Wavelength Zone that you want to test (or an MVNO that uses this underlying network). This is because the EC2 instance running the netperf server in the Wavelength Zone is only accessible to devices on the carrier’s mobile network, and not to the wider internet.

Infrastructure creation

First, create the network and compute resources by launching a CloudFormation stack.

1. Open the AWS CloudFormation console in the Region that is the parent to the Wavelength Zones you want to test.
2. Choose Create Stack > With new resources (standard).
3. Under Specify Template, select Upload a template file.
4. Select Choose file and locate the CloudFormation template that you downloaded from the aws-samples GitHub.
5. Select Next.

Figure 2: CloudFormation stack details page

6. On the stack details page, enter a name for the stack.
7. In the NPCarrierZone field, enter the Wavelength Zone ID of the Wavelength Zone you want to test. You can find the IDs on the Amazon EC2 dashboard as shown in the following figure. Be sure to use the Zone ID, and not the Zone name.

Figure 3: Zone IDs for Regional AZs and Wavelength Zones in the Europe (London), or eu-west-2, Region

8. No other fields on the stack details page need to be changed, so choose Next.
9. Accept the default stack options on the next page by choosing Next.
10. On the Review page, acknowledge the capabilities that the stack needs and choose Submit.

Creation of the resources now begins, and it typically takes two to three minutes. Once the stack’s status reaches CREATE_COMPLETE, you can continue to the next step.
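The console steps above can also be sketched with the AWS CLI. This assumes a configured AWS CLI and that you pass the Region, a stack name of your choosing, the path of the downloaded template, and the Wavelength Zone ID. The capabilities acknowledged here are an assumption; pass whichever ones the console's Review page asks you to acknowledge. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Create the netperf stack from the downloaded template and block until
# CloudFormation reports CREATE_COMPLETE (the wait fails if creation fails).
# The capability list is an assumption; match it to what the template needs.

# create_netperf_stack REGION STACK_NAME TEMPLATE_FILE WAVELENGTH_ZONE_ID
create_netperf_stack() {
  local region=$1 stack=$2 template=$3 zone_id=$4
  aws cloudformation create-stack \
    --region "$region" \
    --stack-name "$stack" \
    --template-body "file://${template}" \
    --parameters "ParameterKey=NPCarrierZone,ParameterValue=${zone_id}" \
    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM &&
  aws cloudformation wait stack-create-complete \
    --region "$region" --stack-name "$stack"
}

# Example: create_netperf_stack eu-west-2 netperf-test ./template.yaml <wavelength-zone-id>
```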

Running a test

In the aws-samples/wavelength-latency-benchmarking repository there is source code for an iOS app that is a lightweight wrapper around the traditional netperf client. We use that to run our tests in this section.

To build the iOS app locally, so that you can run it on an iPhone or iPad with the appropriate operator SIM, use Apple’s Xcode IDE. Start by cloning the repository https://github.com/aws-samples/wavelength-latency-benchmarking (Source Control > Clone …). Select a location to save the local copy. Now you can select a device from the IDE top bar, and choose Product > Run to build, install and run the app.

Upon starting up, the app looks like Figure 4(a).

Figure 4: Netperf client output at various stages of testing

We need the IPv4 addresses of the two servers that were created in the previous step. If you navigate to the Amazon EC2 console, and choose Instances > Instances from the left panel, then you should see the instances named Netperf instance and Netperf WLZ instance. The IP address of the former is found in the regular Public IPv4 Address field. The IP address of the latter, which is only accessible from the carrier network, is found in the Auto-assigned IP address field, labelled as a Carrier IP address. Enter these addresses in the two fields in the app (it doesn’t matter which way round). At this point, also make sure that your UE is only connected to the mobile network, and not to any local Wi-Fi network, because the Carrier IP address is not accessible otherwise.
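The same addresses can be looked up with the AWS CLI. This sketch assumes that the instance names shown in the console come from each instance's Name tag and that the AWS CLI is configured; the function names are hypothetical. For the Wavelength Zone instance, it reads the carrier IP address from the network interface's association data.

```bash
#!/usr/bin/env bash
# Look up the two netperf server addresses by Name tag.
# Assumes the console names come from the instances' Name tags.

# instance_query REGION NAME_TAG JMESPATH_QUERY
instance_query() {
  aws ec2 describe-instances \
    --region "$1" \
    --filters "Name=tag:Name,Values=$2" "Name=instance-state-name,Values=running" \
    --query "$3" \
    --output text
}

# region_server_ip REGION -> public IPv4 address of the in-Region server
region_server_ip() {
  instance_query "$1" "Netperf instance" \
    'Reservations[].Instances[].PublicIpAddress'
}

# wlz_server_ip REGION -> carrier IP address of the Wavelength Zone server
wlz_server_ip() {
  instance_query "$1" "Netperf WLZ instance" \
    'Reservations[].Instances[].NetworkInterfaces[].Association.CarrierIp'
}

# Example: region_server_ip eu-west-2; wlz_server_ip eu-west-2
```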

Now we can choose Test server 1 and Test server 2 to check that we can contact the netperf servers running on our EC2 instances. The initial, or control, connection between the netperf client and server is made on TCP port 12865, which is already open in the instances' security groups. This step confirms that the IP addresses have been entered correctly and that basic connectivity is working. If everything is working correctly, the app should then look like Figure 4(b).
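When running the netperf client from a shell rather than the app, a quick reachability check of the control port can save a failed test run. This Bash sketch is our own alternative to the app's Test server buttons, not what the app does internally. It relies on Bash's /dev/tcp redirection and the coreutils timeout command, and it only shows that a TCP connection opens, not that netserver responded correctly. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Check whether a TCP connection to the netperf control port can be opened.

# can_connect HOST [PORT] [TIMEOUT_SECONDS]
# Exit status 0 if the connection opened within the timeout, non-zero otherwise.
can_connect() {
  local host=$1 port=${2:-12865} secs=${3:-5}
  # The inner shell receives the host as $0 and the port as $1.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example: can_connect <ipv4-address> && echo "control port reachable"
```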

Finally, we can run our tests by selecting Start tests. This runs a 60-second test, first to server 1 and then to server 2. If the Continuous toggle is on when the test to server 2 completes, the cycle repeats. The output looks like Figure 4(c).

The average round-trip latency for the most recent test to each server is shown immediately beneath the Test server button, along with the number of request/response transactions over which the average was calculated. In addition, the records for each test are output in the Console output control in a comma-separated value format as follows:

IP address of server, Zone ID reported by server, Average round-trip latency, Number of transactions over which average calculated, Latitude of UE, Longitude of UE, Date and time in ISO8601 format

Lines beginning with a '#' character are informational and do not follow this format. If you long-press the Console output control, its contents are copied to the iOS clipboard and can then be processed further, for example for graphing, as in the following case studies.
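Before graphing, the copied text can be turned into a plain CSV file. The following is a minimal sketch, assuming the copied text is saved to a file and that records use the seven-field order listed above. The column names in the header row are our own choice, not something the app emits, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Convert copied Console output into a CSV with a header row:
#  - drops informational lines (starting with '#') and blank lines
#  - trims spaces, tabs, and carriage returns around each field
#  - keeps only records with exactly seven fields

# clean_console_csv < console.txt > results.csv
clean_console_csv() {
  awk -F',' '
    BEGIN {
      OFS = ","
      print "server_ip,zone_id,avg_rtt,transactions,latitude,longitude,timestamp"
    }
    /^[ \t]*#/ || /^[ \t\r]*$/ { next }
    NF == 7 {
      for (i = 1; i <= NF; i++) gsub(/^[ \t\r]+|[ \t\r]+$/, "", $i)
      print $1, $2, $3, $4, $5, $6, $7
    }
  '
}
```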

If you would like to run a test independently of the iOS app, then you can use the following command-line options for a netperf client to run the same type of test:

netperf -v 2 -4 -H <ipv4-address> -l 60 -t TCP_RR -- -P 12866

This can be run directly on the UE, if your UE supports it. Alternatively, you can enable the ‘Personal Hotspot’ functionality of the UE to share the cellular connection over Wi-Fi, and run the netperf client on an attached system.

In turn, the command line options tell the client to do the following:

1. Give verbose output, which includes the average round-trip latency per transaction.

2. Only use IPv4 to connect to the server (only IPv4 is supported by the carrier gateway between the operator network and the Wavelength Zone).

3. Connect to this specific host (make sure to substitute the appropriate value for the IPv4 address here).

4. Run the test for sixty seconds.

5. Run a TCP_RR test.

6. Tell the TCP_RR test, specifically, to use TCP port 12866 for the data connection.

You should see output of the following form:

[...]
Alignment      Offset         RoundTrip  Trans    Throughput
Local  Remote  Local  Remote  Latency    Rate     10^6bits/s
Send   Recv    Send   Recv    usec/Tran  per sec  Outbound   Inbound
    8      0       0      0   47610.166    21.004 0.000      0.000

Here, the average round-trip latency per transaction is given in microseconds. The figure here is 47610.166µs, or 47.61ms.
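To script this step, the latency can be pulled out of netperf's verbose output and converted to milliseconds. This minimal sketch assumes the column layout shown above: it finds the header line containing usec/Tran and reads the fifth field of the line that follows. The function name is hypothetical. As a cross-check, the average latency is the reciprocal of the transaction rate: 1/21.004 transactions per second is about 0.0476s, or 47.6ms, matching the RoundTrip Latency column.

```bash
#!/usr/bin/env bash
# Extract the average round-trip latency (usec/Tran column) from
# `netperf -v 2 ... -t TCP_RR` output and print it in milliseconds.
# Exits non-zero if no data line is found.

# rr_latency_ms < netperf_output.txt
rr_latency_ms() {
  awk '
    /usec\/Tran/ { want = 1; next }      # header line naming the latency column
    want && NF >= 6 { printf "%.3f\n", $5 / 1000; found = 1; exit }
    END { if (!found) exit 1 }
  '
}

# Example:
#   netperf -v 2 -4 -H <ipv4-address> -l 60 -t TCP_RR -- -P 12866 | rr_latency_ms
```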

Note that sharing the cellular connection through Wi-Fi can add significant and potentially variable latency. Therefore, do not compare tests using this methodology against tests running the netperf client directly on a UE.

Case studies

Wavelength Zones in Manchester, UK
At the time of writing, Manchester is the only city in the world with two Wavelength Zones from different mobile operators, which makes for an interesting case study. The parent Region for the Manchester Wavelength Zones is London (eu-west-2).

Manchester is around 250km from London as the crow flies, which, using our rule of thumb, implies a minimum of about 2.5ms of round-trip latency purely due to signal propagation. Therefore, for UEs physically located in or near Manchester, we would expect to be able to measure a meaningful difference in latency when we compare the nearby Wavelength Zones to the London Region.
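The rule of thumb is easy to apply in a script. This sketch, with a hypothetical function name, prints the propagation-only floor for the straight-line distances quoted in the case studies. Measured round-trip times will be higher, because real network paths are longer and packet handling adds delay.

```bash
#!/usr/bin/env bash
# Propagation-only lower bound on round-trip time, using the rule of thumb
# of about 1 ms of round trip per 100 km of straight-line distance.

# min_rtt_ms DISTANCE_KM
min_rtt_ms() {
  awk -v km="$1" 'BEGIN { printf "%.1f\n", km / 100 }'
}

# Straight-line distances to the parent Region quoted in the case studies.
for km in 250 400 1800; do
  printf '%5d km -> at least %s ms round trip\n' "$km" "$(min_rtt_ms "$km")"
done
```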

Figure 5

For operator 1, we see on average a reduction of roughly 8ms in round-trip latency to the Wavelength Zone compared with the Region. This matches our expectation of some reduction from a shorter signal path and some from less packet handling. We also observe some correlated variation in latency to both servers, which may be due to changing cell load in the RAN or a similar cause. For operator 2, the round-trip times to the Region and to the Wavelength Zone exhibit less variability, but the difference in round-trip latency between the two is also smaller. Which of these is preferable depends on the nature of a specific application.
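To quantify the difference and the variability for your own runs, you can summarize repeated records per server. This minimal sketch assumes a CSV file with one header row followed by records in the app's field order (server IP first, average latency third). It prints the server IP, the number of records, the mean latency, and the sample standard deviation. The difference between the two servers' means estimates the average reduction, and the standard deviation gives one measure of variability. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Per-server count, mean, and sample standard deviation of the average
# round-trip latency (third CSV field). Output: ip,count,mean,stddev

# summarize_by_server < results.csv
summarize_by_server() {
  awk -F',' '
    NR == 1 { next }                     # skip the header row
    NF >= 3 {
      n[$1]++
      sum[$1] += $3
      sumsq[$1] += $3 * $3
    }
    END {
      for (ip in n) {
        mean = sum[ip] / n[ip]
        # sample variance; a server with a single record gets 0
        var = (n[ip] > 1) ? (sumsq[ip] - n[ip] * mean * mean) / (n[ip] - 1) : 0
        if (var < 0) var = 0             # guard against rounding error
        printf "%s,%d,%.3f,%.3f\n", ip, n[ip], mean, sqrt(var)
      }
    }
  ' | LC_ALL=C sort                      # awk array order is unspecified
}
```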

Wavelength Zone in Berlin, Germany
The Wavelength Zone in Berlin is over 400km from its parent Region of Frankfurt (eu-central-1), and so we expect to see a larger difference in latency in this case.

Figure 6

Again, our expectation is confirmed, with round-trip latencies to the local Wavelength Zone that are, on average, about 12ms less than the round-trip to the Region.

Wavelength Zone in Dallas, Texas, USA
Our final case study looks at the Wavelength Zone in Dallas, Texas, which is around 1800km from its parent Region of US East (N. Virginia).

Figure 7

Here, we can observe that the round-trip latency to the local Wavelength Zone is around half that to the Region.

Each data point on the preceding graphs is an average over more than 1000 request/response transactions. Even so, to make proper comparisons, tests should be run multiple times under varying conditions that reproduce those expected to be found by real users.
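Repeated runs are straightforward to script when the netperf client runs on a system with a shell. The following sketch alternates between the servers for a given number of rounds, in the spirit of the app's Continuous toggle, and prints one CSV line per test with the server address, the average round-trip latency in milliseconds, and a UTC timestamp. It assumes netperf is on the PATH and reuses the options of the netperf command shown earlier, including data port 12866; the function name and output format are our own. If you run it over a shared hotspot connection, the earlier caveat about comparing such results still applies.

```bash
#!/usr/bin/env bash
# Run 60-second TCP_RR tests against each server in turn, RUNS times.
# Output, one line per test: server_ip,avg_rtt_ms,utc_timestamp
# ("error" replaces the latency if it could not be parsed).

# repeat_tests RUNS SERVER_IP...
repeat_tests() {
  local runs=$1; shift
  local i ip ms
  for ((i = 1; i <= runs; i++)); do
    for ip in "$@"; do
      ms=$(netperf -v 2 -4 -H "$ip" -l 60 -t TCP_RR -- -P 12866 |
        awk '/usec\/Tran/ { getline; printf "%.3f", $5 / 1000; exit }')
      printf '%s,%s,%s\n' "$ip" "${ms:-error}" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    done
  done
}

# Example: repeat_tests 10 <region-ipv4-address> <wlz-carrier-ip-address> > runs.csv
```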

Troubleshooting and cost considerations

The netperf server and, in particular, the client were not designed to be long-running processes, and under various circumstances they print an error message and call exit(). The iOS app traps the most common of these conditions and reports them in the Console output control. However, rare cases may still cause the app to terminate suddenly.

If the netperf server process on one of the EC2 instances exits due to some condition, then the client cannot connect. You can diagnose this by connecting to the instance using Session Manager, from the Amazon EC2 console. If you run ps awux | grep netserver, then you should see output like the following:

root       21702  0.0  0.0   3776  1188 ?        S    Jan10   0:01 ./netserver -D -N -f -z euw2-az2
ssm-user   64998  0.0  0.0 222180  2132 pts/0    S+   14:59   0:00 grep netserver

If the first line is not present, then the netperf server has exited. You can restart it as follows:

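# Fetch an IMDSv2 session token, then this instance's Availability Zone ID
IMDS=169.254.169.254
TOKEN=$(curl -sX PUT "http://${IMDS}/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 20")
AZID=$(curl -sH "X-aws-ec2-metadata-token: ${TOKEN}" "http://${IMDS}/latest/meta-data/placement/availability-zone-id")
# Restart netserver in the background as ec2-user; nohup helps it survive the end of the session
sudo -u ec2-user nohup /home/ec2-user/netperf/src/netserver -D -N -f -z "${AZID}" > /dev/null 2>&1 &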

Because the netperf TCP_RR test only sends single-byte messages, it does not consume large amounts of bandwidth, and thus does not significantly deplete data plans.
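To put a number on this, the following sketch estimates the bytes sent during one test from its transaction count. The 52-byte per-packet overhead (a 20-byte IPv4 header plus a 32-byte TCP header carrying the timestamp option) is an assumption, and separate ACKs and mobile-network overheads are ignored, so treat the result as an order-of-magnitude figure. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Rough data usage of a TCP_RR test: each transaction is one 1-byte request
# and one 1-byte response, each carrying OVERHEAD bytes of headers (assumed).

# rr_bytes TRANSACTIONS [OVERHEAD_BYTES] -> estimated bytes on the wire
rr_bytes() {
  local transactions=$1 overhead=${2:-52}
  echo $(( transactions * 2 * (1 + overhead) ))
}

rr_bytes 1260
```

For the 47.61ms example earlier, a 60-second test completes about 1,260 transactions, so the estimate is 133,560 bytes, roughly 134kB. Lower latency means more transactions per test, so usage scales up accordingly.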

Other tests and cleaning up

Wavelength Zones are not the only AWS infrastructure that can be used when low latency is important. Using a similar methodology, latency comparisons for AWS Local Zones and AWS Outposts can be made. Equally, we should note that latency is not the only variable that can affect end-user experience. Throughput, jitter, and packet loss ratio, among others, are also important considerations.

When you have concluded your testing, navigate again to the CloudFormation console, and delete the stack that you created in the section Infrastructure creation.
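Alternatively, the stack can be deleted from the command line. This sketch assumes a configured AWS CLI and the Region and stack name you used when creating the stack; it waits until CloudFormation reports that deletion is complete. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Delete the netperf stack and block until deletion finishes.

# delete_netperf_stack REGION STACK_NAME
delete_netperf_stack() {
  local region=$1 stack=$2
  aws cloudformation delete-stack --region "$region" --stack-name "$stack" &&
  aws cloudformation wait stack-delete-complete --region "$region" --stack-name "$stack"
}

# Example: delete_netperf_stack eu-west-2 netperf-test
```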

Conclusion

In this post, I showed how you can evaluate the reduction in latency that Wavelength Zones offer for your specific circumstances.

You can get started by checking out the list of Wavelength partners and locations. Follow the Opt-in & Get Started link that is most relevant to you. Furthermore, you can clone the aws-samples repository to build your own iOS netperf client. Happy benchmarking!

Thanks

Thanks to Sigit Priyanggoro and Young Jung for technical review of this post, to Stefano Vozza and Ozkan Can for code review, and to Sammpath Chakravaruthy for the results in the Dallas, Texas case study.

AWS for Industries