AWS Compute Blog

Powering .NET 5 with AWS Graviton2: Benchmarks

This post was authored by Kirk Davis, Developer Advocate for App Modernization 

In 2019, AWS announced new Amazon EC2 instance types powered by the AWS Graviton2 processor. The Graviton2 processor is based on the ARM64 architecture and uses 64-bit Arm Neoverse N1 cores. Since then, AWS has launched many new EC2 instance types built on Graviton2, including general-purpose (M6g), compute-optimized (C6g), memory-optimized (R6g), and general-purpose burstable (T4g) types. These Graviton2-based instances provide up to 40% better price performance than comparable current-generation x86-64 instances. They use the same naming convention as other instance types, but with a “g” appended to the family name: for example, t4g.large or c6g.2xlarge. Many customers are already running workloads on these Graviton2 instances, including .NET Core applications. Note that in this blog post I refer to 64-bit x86 (x86-64) processors simply as “x86”.

Organizations like AnandTech have done in-depth benchmarking of Graviton2 against x86-architecture EC2 instances and found that Graviton2 has a significant performance and cost advantage. Comparing similar instance families, the Graviton2 instances are about 20% less expensive per hour than Intel x86 instances, with up to 40% better performance. With .NET 5 officially released in November 2020, I thought it would be interesting to see what advantages Graviton2 has for .NET 5 web applications, as a follow-up to the .NET 5 on AWS blog post that AWS published earlier. Read on to learn how I ran the benchmarking tests, which applications I chose to benchmark, and what the results were.

Overview

I decided to run some straightforward .NET 5 benchmarks that tested ASP.NET Core under load on both x86-based and Graviton2 instances. ASP.NET Core runs application code on thread-pool threads, so it takes advantage of multiple cores to handle multiple requests concurrently. One thing to keep in mind is that x86-based EC2 instance types use simultaneous multi-threading, so a vCPU maps to a logical core. On Graviton2 instances, however, a vCPU maps to a physical core. For these benchmarks, I used x86 and ARM64 instance types with four vCPUs each: m5.xlarge instances, which have four logical (two physical) x86 cores, and m6g.xlarge instances, which have four physical ARM cores. I wanted to compare the latency and requests/second performance for different scenarios, and then compare the performance adjusted for the instances’ cost per hour. I used the per-hour pricing from the us-east-2 (Ohio) Region:

                  m5.xlarge    m6g.xlarge
Cost (per hour)   $0.192       $0.154
vCPU              4            4
RAM (GiB)         16           16
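
As a quick sanity check of the vCPU mapping described above, a small helper along these lines could be run on each instance. This is a minimal sketch, not part of the benchmark code: HostInfo and Describe are names made up for this illustration, while the calls themselves are standard .NET APIs. Both instance types would be expected to report four logical processors, even though only the m6g.xlarge backs each one with a physical core.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical helper (not from the benchmark project): reports what the
// .NET runtime sees on the current host.
public static class HostInfo
{
    public static string Describe()
    {
        // By default, the thread pool's minimum worker-thread count
        // equals the number of logical processors.
        ThreadPool.GetMinThreads(out int minWorkerThreads, out int _);

        return $"arch={RuntimeInformation.ProcessArchitecture}, " +
               $"logicalProcessors={Environment.ProcessorCount}, " +
               $"minWorkerThreads={minWorkerThreads}";
    }
}
```

Calling Console.WriteLine(HostInfo.Describe()) on each instance shows the process architecture (X64 or Arm64) next to the processor count, which makes it easy to confirm that each agent is running on the intended hardware.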

Benchmarks and testing framework

I used the open-source Crank software to run the benchmarks and gather results. Crank abstracts away many of the messy details in running benchmarks and delivers consistent results. From the GitHub page:

“Crank is the benchmarking infrastructure used by the .NET team to run benchmarks including (but not limited to) scenarios from the TechEmpower Web Framework Benchmarks.”

Crank uses a controller (crank-controller), which communicates with one or more agents (crank-agent). The agents download, compile, and run the code, then report the results back to the controller. In this case, I used three agents: one on each of the two instances to be tested, and one on a test-runner instance (an m5.xlarge) that ran bombardier, a common load-testing tool that is already integrated into Crank. You can also choose wrk2 or other tools if you prefer (Crank’s readme files provide examples for both). I ran all the instances in the same Availability Zone (AZ) to minimize any other sources of latency. The setup looked like this:

[Figure: benchmark environment setup]

Note: To use Crank’s agent with the .NET 5 release version, I made minor changes to its Startup.cs class. These changes forced Crank to pull down the correct .NET 5 SDK version, and fixed an issue where it wasn’t appending the correct build parameters for arm64 when compiling code on the m6g.xlarge instance. It’s possible the Microsoft.Crank.Agent project has been updated since I used it. I also updated all projects to .NET 5.

Benchmark tests

Since many of the .NET Core workloads customers are running in AWS are ASP.NET Core websites or APIs, I focused only on these types of applications. I selected the Mvc project from the ASP.NET Benchmarks GitHub repository. The controller in this project defines an “Entry” class, then creates instances of it and returns them as a List<Entry> (which ASP.NET Core serializes to JSON). For the source code for these methods, please refer to the preceding GitHub links. In the project, the Crank configuration YAML file defines three scenarios (note that I used these scenarios but swapped out wrk for bombardier):

  • MvcJsonNet2k: calls JsonController’s Json2k() method (returns eight Entries)
  • MvcJsonOutput60k: calls JsonController’s JsonNk() method for 60,000 bytes
  • MvcJsonOutput2M: calls JsonController’s JsonNk() method for 2^21 bytes (2,097,152 bytes, or 2 MiB)

Additionally, I created another ASP.NET Core Web API application based on the boilerplate ASP.NET Web API project and added EF Core. I did this because many ASP.NET Core applications use Entity Framework Core (EF Core), and do more computationally expensive work than only serializing JSON. To isolate the performance of the two instances, I used the in-memory provider for EF Core, and populated a DbSet with weather summaries at startup. I modified the WeatherForecastController to encrypt each WeatherForecast’s Summary property using .NET’s RSACryptoServiceProvider class, and then added another controller that queries forecasts from the DbSet, and serializes them to strings. For that method, I added an asynchronous delay (using Task.Delay) to simulate querying a relational database. To run the tests, I created a Crank configuration YAML file that defines three scenarios:

  • AsyncParallelJson100: returns 100 forecasts from EF Core serialized to string using System.Text.Json
  • AsyncParallelJson500: returns 500 forecasts from EF Core serialized to string using System.Text.Json
  • ParallelEncryptWeather100: encrypts summaries for 100 forecasts and returns the forecasts as IEnumerable<WeatherForecast>
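
The post does not show the EF Core setup itself, so the following is only a sketch of how a WeatherContext backed by the in-memory provider might be defined and seeded at startup. The entity's property types (in particular EncryptedSummary as byte[]), the database name, the summary values, and the WeatherSeeder class are assumptions made for illustration. The parameterless constructor matches how the controllers create the context.

```csharp
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore; // also requires Microsoft.EntityFrameworkCore.InMemory

// Assumed entity shape, inferred from the properties the controllers use.
public class WeatherForecast
{
    public int Id { get; set; }
    public DateTime Date { get; set; }
    public int TemperatureC { get; set; }
    public string Summary { get; set; }
    public byte[] EncryptedSummary { get; set; }
}

public class WeatherContext : DbContext
{
    public DbSet<WeatherForecast> WeatherForecasts { get; set; }

    // A named in-memory database is shared by all context instances in the
    // process that use the same name ("WeatherBenchmark" is a made-up name).
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseInMemoryDatabase("WeatherBenchmark");
}

// Hypothetical seeding helper, intended to be called once at startup.
public static class WeatherSeeder
{
    private static readonly string[] Summaries = { "Freezing", "Chilly", "Mild", "Warm", "Hot" };

    public static void Seed(int count)
    {
        using var context = new WeatherContext();
        if (context.WeatherForecasts.Any())
        {
            return; // already seeded
        }

        var random = new Random(42); // fixed seed keeps runs comparable
        context.WeatherForecasts.AddRange(
            Enumerable.Range(1, count).Select(i => new WeatherForecast
            {
                Id = i,
                Date = DateTime.UtcNow.Date.AddDays(i),
                TemperatureC = random.Next(-20, 55),
                Summary = Summaries[random.Next(Summaries.Length)]
            }));
        context.SaveChanges();
    }
}
```

Keeping the data in memory removes network and disk I/O from the measurement, so the differences between the two instances come from CPU-bound work such as querying, serialization, and encryption.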

This application uses the 5.0.0 version of the Microsoft.EntityFrameworkCore and Microsoft.EntityFrameworkCore.InMemory NuGet packages. The following is the source code for the two methods I used in the tests:

JsonSerializeController’s Get method:

[HttpGet]
public async Task<IEnumerable<string>> Get(int count = 100)
{
    List<WeatherForecast> forecasts;
    List<string> jsons = new List<string>();

    using (var context = new WeatherContext())
    {
        forecasts = context.WeatherForecasts.Take(count).ToList();
    }
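    // Simulate the latency of a round trip to a relational database.
    await Task.Delay(5);
    // Code as benchmarked. Note that List<T>.Add is not thread-safe, so concurrent calls can race.
    Parallel.ForEach(forecasts, x => jsons.Add(JsonSerializer.Serialize(x)));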

    return jsons;
}
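
The method above adds to a List<string> from multiple threads, and List<T> is not safe for concurrent writes. For anyone adapting this code outside of a benchmark, the following sketch shows two thread-safe alternatives. It is not the code that produced the results in this post, and the ParallelJson class and its method names are made up for this illustration.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper showing thread-safe ways to serialize items in parallel.
public static class ParallelJson
{
    // Option 1: keep Parallel.ForEach, but collect into a thread-safe bag.
    // The output order is not guaranteed to match the input order.
    public static List<string> SerializeUnordered<T>(IEnumerable<T> items)
    {
        var jsons = new ConcurrentBag<string>();
        Parallel.ForEach(items, x => jsons.Add(JsonSerializer.Serialize(x)));
        return jsons.ToList();
    }

    // Option 2: PLINQ with AsOrdered, which preserves the input order.
    public static List<string> SerializeOrdered<T>(IEnumerable<T> items)
        => items.AsParallel()
                .AsOrdered()
                .Select(x => JsonSerializer.Serialize(x))
                .ToList();
}
```

In the controller, the Parallel.ForEach line could then become jsons = ParallelJson.SerializeOrdered(forecasts). Either option changes the amount of synchronization work per request, so throughput numbers would not be directly comparable with the ones reported here.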

WeatherForecastController’s Get method:

[HttpGet]
public IEnumerable<WeatherForecast> Get(int count = 100)
{
    List<WeatherForecast> forecasts;

    using (var context = new WeatherContext())
    {
        forecasts = context.WeatherForecasts.Take(count).ToList();
    }
    UnicodeEncoding ByteConverter = new UnicodeEncoding();

    using (RSACryptoServiceProvider RSA = new RSACryptoServiceProvider())
    {
        // ExportParameters(false) exports only the public key; it is exported again for each forecast.
        Parallel.ForEach(forecasts, x => x.EncryptedSummary = RSAEncrypt(ByteConverter.GetBytes(x.Summary), RSA.ExportParameters(false), false));
    }
    return forecasts;
}

Note: The RSAEncrypt method was copied from the sample code in the RSACryptoServiceProvider’s docs.
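
The RSAEncrypt helper is not reproduced in the post. As a rough guide, a helper with the same parameter order as the call above might look like the following sketch. It follows the general shape of the documentation sample but omits its error handling, and the RsaSample class name is made up, so treat it as an approximation rather than the exact code that was benchmarked.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical container class for the helper.
public static class RsaSample
{
    // Encrypts data with the given RSA public key.
    // doOaepPadding: true for OAEP padding, false for PKCS#1 v1.5 padding.
    public static byte[] RSAEncrypt(byte[] dataToEncrypt, RSAParameters rsaKeyInfo, bool doOaepPadding)
    {
        // A new provider is created and the key imported on every call,
        // which adds to the per-item cost of the encryption.
        using (var rsa = new RSACryptoServiceProvider())
        {
            rsa.ImportParameters(rsaKeyInfo);
            return rsa.Encrypt(dataToEncrypt, doOaepPadding);
        }
    }
}
```

With a helper like this, each forecast summary is encrypted independently, and the length of each result equals the RSA key size in bytes. That makes the ParallelEncryptWeather100 scenario dominated by CPU-bound cryptographic work rather than JSON serialization.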

Setting up the instances

For running the benchmarks, I selected the Amazon Machine Image (AMI) for Ubuntu Server 20.04 LTS, and chose “64-bit (x86)” for the m5.xlarge and “64-bit (Arm)” for the m6g.xlarge. I gave them both 20 GB of Amazon Elastic Block Store (EBS) storage, and chose a security group with port 22 open to my home IP address, so that I could SSH into them. While it’s possible to install and use .NET 5 on Amazon Linux 2 (AL2), AL2 is not currently a supported Linux distribution for .NET 5 on ARM, and I wanted the same distribution for both x86 and ARM64. For details on launching Graviton2 instances from the AWS Management Console, please refer to the .NET 5 on AWS blog post from November 10, 2020.

Ubuntu 20.04 is a supported release for installing .NET 5 using apt-get, but that installation method does not yet support ARM architectures. So instead, and to use the same method on both instances, I manually installed the .NET 5 SDK using the following commands, specifying the architecture-appropriate download link for the binaries*. Instructions for manual installation are also available at the prior “installing .NET 5” link.

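# Download the SDK tarball for this instance's architecture
curl -SL -o dotnet.tar.gz <link to architecture-specific binary file*>
# Unpack the SDK into a shared location
sudo mkdir -p /usr/share/dotnet
sudo tar -zxf dotnet.tar.gz -C /usr/share/dotnet
# Make the dotnet CLI available on the PATH
sudo ln -s /usr/share/dotnet/dotnet /usr/bin/dotnet
# Run .NET in globalization-invariant mode (no ICU dependency) in future login shells
echo "export DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=true" >> ~/.bash_profile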

Then, I used SCP to upload the source code for my benchmarking solution to the instances, and SSH’d onto both, using two tabs in the new Windows Terminal.

*At the time this blog was written, the binaries used were:
dotnet-sdk-5.0.100-linux-arm64.tar.gz
dotnet-sdk-5.0.100-linux-x64.tar.gz

Benchmark results

Benchmark runs and units

I used Crank to perform two runs of each of the six benchmarks on each of the two instances and took the average of the two runs for each. There was minimal variation between runs. For each test, I charted the latency in microseconds (μs), with the bars for MvcJsonOutput2M and ParallelEncryptWeather100 scaled by plotting μs/100, and bars for AsyncParallelJson100 and AsyncParallelJson500 scaled with μs/10. For latency, shorter bars are better.

I also charted the performance in requests/second, and the overall value as performance/dollar, where the performance is the requests/second, and dollars is the cost/hour of the given instance type. In order to have the bars legible on the same chart, some values were scaled as shown below the chart (the same scaling was applied to all values for a given benchmark). For both raw performance and performance/price, longer bars are better.

Note that I didn’t do any specific optimization for ARM64 or x86.

Summary of results

The Graviton2 instance had lower latency across the board for the tests I ran, with the m6g.xlarge (Graviton2) instance having up to 24.7% lower latency (for MvcJsonOutput2M) than the m5.xlarge (x86-64). It’s notable that in general, the more work the test method was doing, the bigger the advantage of Graviton2.

The results were broadly similar for requests/second, with Graviton2 delivering up to 31.6% better performance (for MvcJsonOutput2M). For the most computationally expensive test, ParallelEncryptWeather100, the Graviton2 instance churned out 16.6% more requests per second. And all of this is without considering the price difference. Also, not reflected in the charts is that the x86 instance had twice as many bad responses (average of 16) as the Graviton2 instance (average of 8) for the ParallelEncryptWeather100 test. This was the only test that produced any bad responses.

When scaling the performance for the hourly price of each instance type, the differences are starker. The Graviton2 offers up to 64% more requests/second per hourly cost of the instance (for MvcJsonOutput2M). Even on the test with the least advantage (MvcJsonNet2k), the Graviton2 provided 30.8% better performance/cost, where performance is requests/second. These types of results can translate into significant savings for even modestly sized workloads.
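
The performance/dollar figures above follow from the raw throughput gain and the hourly prices. The following sketch reproduces that arithmetic. The PricePerformance class and its method names are made up for this illustration, and the only inputs are the prices and percentages already quoted in this post.

```csharp
using System;

// Hypothetical helper that reproduces the performance/dollar arithmetic.
public static class PricePerformance
{
    // Requests per second divided by the instance's hourly price,
    // which is the value plotted in the performance/price chart.
    public static double RequestsPerSecondPerDollar(double requestsPerSecond, double pricePerHour)
        => requestsPerSecond / pricePerHour;

    // Relative performance/dollar gain of a candidate over a baseline,
    // given the candidate's relative throughput gain (0.316 means 31.6%).
    public static double RelativeGain(double throughputGain, double baselinePricePerHour, double candidatePricePerHour)
        => (1.0 + throughputGain) * (baselinePricePerHour / candidatePricePerHour) - 1.0;
}
```

For MvcJsonOutput2M, PricePerformance.RelativeGain(0.316, 0.192, 0.154) evaluates to roughly 0.64, which is the 64% figure quoted above. With a throughput gain of zero, the same call returns roughly 0.247, so the lower hourly price alone is worth about 25% more requests per dollar before any performance difference is counted.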

Charts

[Chart: mean latency for the benchmark]

In the preceding chart, the mean latency is shown in microseconds (μs), with the values for some tests divided by either 10 or 100 in order to make all the bars visible in the chart. The Graviton2 instance had 24.7% lower latency for the MvcJsonOutput2M test, and had lower latency across all the tests.

[Chart: raw performance for the benchmark]

This second chart shows how the m6g.xlarge Graviton2 instance handled more requests for every test. The bars represent the raw requests/second for each test. The values for some tests are scaled by a factor of 10 to make all the bars visible in the chart. For the MvcJsonOutput2M test, which serializes two megabytes to JSON, it handled 31.6% more requests per second, and was faster for every test I ran.

[Chart: price/performance for the benchmark]

This third chart uses the same performance values as the preceding one, but the m5.xlarge values are divided by its hourly cost ($0.192 in the Ohio Region), and the m6g.xlarge bars are divided by $0.154 (also for the Ohio Region). Again, some bars are scaled by a factor of 10 to make all the bars visible in the chart. The Graviton2 instance handled 64% more requests per dollar for the MvcJsonOutput2M test, and provides much better performance per dollar across all the tests.

Conclusion

If you’re adopting .NET 5 for your applications, you have a variety of choices for deploying them in AWS. You can run them in containers in Amazon Elastic Container Service (ECS) or Amazon Elastic Kubernetes Service (EKS) with or without AWS Fargate, you can deploy them as serverless functions in AWS Lambda, or deploy them onto EC2 using either x86-based or Graviton2-based instances.

For running scalable web applications built on ASP.NET Core 5.0, the new Graviton2 instance families offer significant performance advantages, and even more compelling performance/price advantages of up to 64% over the equivalent Intel x86 instance families without making any code changes. Coupled with the ARM64 performance improvements in .NET 5, moving from .NET Core 3.1 on x86 to .NET 5 on Graviton2 promises significant cost savings. It also allows developers to code and locally test on their x86-based development machines (or even new ARM-based macOS laptops), and to use their existing deployment mechanisms. If your application is still based on .NET Framework, consider using the AWS Porting Assistant for .NET to begin porting to .NET Core.

Learn more about AWS Graviton2 based instances.