AWS Developer Tools Blog

Announcing preview release for the generational mode to the Shenandoah GC

The Amazon Corretto team is excited to announce the preview release of a generational mode for the Shenandoah GC. This is the result of our collaboration with Red Hat on a significant GC contribution: the addition of a generational mode to traditional single generation Shenandoah. One of the primary advantages of Java is that the Java Virtual Machine (JVM) automatically handles memory management. Many innovations have resulted from efforts to ensure that the JVM has minimal impact on application throughput and response time. Recent memory managers such as the Shenandoah and ZGC garbage collectors (GCs) represent the state of the art in automatic memory management.

What are the benefits?

By adding a generational mode, the Amazon Corretto team delivers the benefits of Shenandoah to a broader audience of Java developers who wish to build applications with high memory allocation rates (in excess of 4 GB/s) and/or high live memory utilization (in excess of 60%). With certain workloads, Shenandoah’s new generational mode can match traditional Shenandoah response times using one third the heap size and can be configured by the customer to deliver maximum GC pause latencies below 10 ms. With similar hardware configurations, compared to traditional Shenandoah, generational mode reduces hardware costs and enables a higher percentile compliance with aggressive response time SLAs.

In this preview release, Shenandoah generational mode has demonstrated improvement on a selection of benchmarks from the DaCapo benchmark suite. It:

  • Closes the gap between the memory efficiencies of G1 and the short pause times of single generation Shenandoah.
  • Allows Shenandoah to maintain p99 pause times below 10 ms while achieving better heap utilization.
  • Enables sustained higher allocation rates for short lived objects compared to single generation Shenandoah.
  • Decreases the risk of incurring stop-the-world application pauses during allocation spikes.
  • Incurs a less than 5% reduction in overall application throughput (i.e., additional application overhead) compared to single generation Shenandoah.
  • Maintains support for compressed object pointers.
  • Supports x64 and ARM64 architectures.

We are working on generalizing these benefits to a broader set of workloads, and eventually to 32-bit x86 and ARM architectures.

How does it work?

Shenandoah is a mostly concurrent garbage collector developed at Red Hat and originally released in OpenJDK 12. Shenandoah achieves p99 pause times under 10 ms by collecting unused memory while application threads are running, racing them to reclaim memory before they exhaust it. Shenandoah tries to avoid losing the race, but if it does, all application threads are paused until it finishes.

There are a few ways to help Shenandoah win the race. You can give it more threads (-XX:ConcGCThreads), though doing so will reduce application throughput by devoting more machine resources to GC. You can give it a head start by adjusting heuristics so it runs more aggressively, though again, that costs throughput. Or, you can give it more memory to make sure application threads do not fill the heap before it finishes. If none of these options appeal, you now have another: a young generation for Shenandoah.

Separating garbage collection across multiple (typically only two) generations reduces the amount of work done during each collection cycle. This technique has been used by all JVM collectors with, until recently, the exception of Shenandoah and ZGC. Traditional Shenandoah collection cycles cover the entire heap in order to maximize the amount of reclaimed unused memory. That is, the heap consists of a single generation. But, most newly allocated objects quickly become unreachable, so allocating them in a separate heap area, the young generation, and focusing collection efforts there, yields the most free memory for the least effort. The goal is not to reduce pause times (these are already very short), but to reduce concurrent cycle times, during which both collection and allocation of objects occur. Focusing on the young generation shortens the race for the garbage collector and helps avoid long application pauses.

Developers burned by long GC-related pauses in the Parallel and G1 collectors are probably wondering about the old generation. Young objects which survive a configurable number of young collections are copied into an old generation which is infrequently collected relative to the young generation. Based on performance heuristics, Shenandoah generational mode will initiate an old generation collection before old generation memory is exhausted. It collects the old generation concurrently while running both the application and the young collector. Interruptible concurrent old collection that allows young collections to take precedence ensures that an old collection will not cause Shenandoah generational mode to lose the race with the application.

Results

Shenandoah generational mode shows promising results on a selection of benchmarks from the DaCapo suite, which is designed to represent “real world” workloads, though not all of the benchmarks stress garbage collection enough to demonstrate real collector differences. The benchmarks were run with -Xmx and -Xms both set to 8 GB and default arguments for all collectors. Data was collected from our CI/CD pipelines over a two-week period, comprising approximately 420 executions run on x86 and aarch64 Linux large build instances, stored in AWS OpenSearch, and rendered using Kibana’s Vega-Lite integration.

The image below shows a table of “box plot” charts. Each box plot (turned on its side) represents a measurement distribution. The “whiskers” are the p5 and p95 values, the edges of the “box” are the p25 and p75 values, and the line in the middle is p50. Each row is a different benchmark from the DaCapo suite, and each column is a metric from that benchmark. Lower is better for all of them. The “Elapsed Time” column is the elapsed time for the benchmark. The “Max Pause” column is the maximum observed pause time as measured by the jHiccup tool. The “Max RSS” column is the highest observed value of the Resident Set Size (RSS) for the process during the benchmark run.
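
If you want to build the same kind of summary from your own pause measurements, the five points of each box plot can be computed as in the sketch below. This is only an illustration: it uses the simple nearest-rank percentile definition, which may differ from what our Kibana dashboards compute, and the class name and sample values are made up.

```java
import java.util.Arrays;

/** Five-point summary matching the box plots: p5, p25, p50, p75, p95. */
final class BoxPlotSummary {

    /** Nearest-rank percentile of an already sorted array; p is in 1..100. */
    static double percentile(double[] sorted, int p) {
        if (sorted.length == 0 || p < 1 || p > 100) {
            throw new IllegalArgumentException("need samples and 1 <= p <= 100");
        }
        // Integer form of ceil(p * n / 100), giving a 1-based rank.
        int rank = (p * sorted.length + 99) / 100;
        return sorted[rank - 1];
    }

    /** Returns {p5, p25, p50, p75, p95} for the given samples. */
    static double[] summarize(double[] samples) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int[] points = {5, 25, 50, 75, 95};
        double[] out = new double[points.length];
        for (int i = 0; i < points.length; i++) {
            out[i] = percentile(sorted, points[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up pause times in milliseconds, for illustration only.
        double[] pausesMs = {3.1, 2.4, 9.8, 4.0, 2.9, 3.3, 5.6, 2.2, 3.7, 4.4};
        System.out.println(Arrays.toString(summarize(pausesMs)));
    }
}
```

The whiskers are the first and last values of the returned array, the box edges are the second and fourth, and the middle line is the third.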

To make it concrete, take a look at the pause times for the batik benchmark in the top graph. You can read this as: over the past two weeks, the worst (p95) pause time we observed for G1 across all these executions was ~750ms (!), but its median pause time (p50) was around 575ms. To be fair, you can also plainly see that G1 usually uses less memory than the other collectors and generally does well on the benchmark score. Another example: the maximum RSS required for Shenandoah generational mode on the xalan benchmark is less than half what is required for single generation mode with no appreciable difference in pause time or benchmark score.

Here are results from another benchmark: HyperAlloc, which is part of our open source benchmark suite Heapothesys. This chart shows a distribution of pause times for an 8GB heap holding 1GB of live objects with allocation rates of 2GB/s and 3GB/s. You can see that generational mode has lower pause times than single generation mode.
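
To get a feel for the race described earlier, you can estimate how long the application would take to fill the free part of the heap at a given allocation rate. The sketch below applies that back-of-the-envelope arithmetic to the HyperAlloc configuration above. It is a deliberate simplification: it assumes nothing is reclaimed while the application allocates and ignores any space the collector reserves for itself, so treat the numbers as rough upper bounds rather than as a model of Shenandoah’s heuristics. The class and method names are hypothetical.

```java
/** Rough estimate of how long allocation can continue before the heap is full. */
final class AllocationHeadroom {

    /**
     * Seconds until the free part of the heap is consumed, assuming no memory
     * is reclaimed in the meantime (a simplifying assumption).
     */
    static double secondsToExhaust(double heapGb, double liveGb, double allocGbPerSec) {
        if (allocGbPerSec <= 0 || liveGb > heapGb) {
            throw new IllegalArgumentException("need allocGbPerSec > 0 and liveGb <= heapGb");
        }
        return (heapGb - liveGb) / allocGbPerSec;
    }

    public static void main(String[] args) {
        // HyperAlloc settings from the text: 8 GB heap, 1 GB live, 2 and 3 GB/s.
        for (double rate : new double[] {2.0, 3.0}) {
            System.out.printf("%.0f GB/s -> %.2f s of headroom%n",
                    rate, secondsToExhaust(8.0, 1.0, rate));
        }
    }
}
```

Under these assumptions the collector has about 3.5 seconds at 2 GB/s and about 2.3 seconds at 3 GB/s to finish a cycle before application threads run out of memory. Shorter young collections make that deadline easier to meet.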

How do I use it?

Links to download executable binaries for Linux x86 and aarch64 hosts are available on the Shenandoah generational mode read-me page.

To activate the generational feature, change Shenandoah’s mode with the following options on the java command line.

-XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions
-XX:ShenandoahGCMode=generational
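
If you want to confirm from inside the application that the options took effect, a small check along the following lines may help. It is a sketch, not part of the preview release: it assumes that the collector’s management beans have names containing “Shenandoah”, which you should verify against your build, and the class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Startup sanity check for the GC options shown above. */
final class CollectorCheck {

    /** Assumption: Shenandoah's collector beans carry "Shenandoah" in their names. */
    static boolean mentionsShenandoah(List<String> collectorNames) {
        for (String name : collectorNames) {
            if (name.contains("Shenandoah")) {
                return true;
            }
        }
        return false;
    }

    /** True if the generational mode option was passed on the command line. */
    static boolean requestsGenerationalMode(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:ShenandoahGCMode=generational");
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();

        System.out.println("Collectors: " + names);
        System.out.println("Shenandoah active: " + mentionsShenandoah(names));
        System.out.println("Generational mode requested: " + requestsGenerationalMode(jvmArgs));
    }
}
```

Run it with the same options as your application. The second check only sees arguments passed directly to the JVM, so it may report false if the options come from another source such as an environment variable.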

There are, of course, more generational mode command line options, but their description is beyond the scope of this article. You may view them by running with -XX:+PrintFlagsFinal. Look for (or grep for) “Shenandoah”. For the reasons described earlier, generational mode may require a larger young generation than your application currently uses. You can adjust young generation size using -XX:NewRatio or, more directly, with -XX:NewSize/-Xmn. Generational mode also understands -XX:InitialTenuringThreshold, which controls how many collection cycles an object must survive before being copied into the old generation. Planned enhancements include heuristics to dynamically adjust young generation size. For now, it is fixed at startup.
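
When choosing a value for -XX:NewRatio, it can help to work out the young generation size it implies. In HotSpot’s conventional definition, NewRatio is the ratio of old generation size to young generation size, so the young generation receives 1/(NewRatio + 1) of the heap. The sketch below applies that formula to the 8 GB heap used in our benchmarks. It assumes generational Shenandoah interprets the option the same way as the other collectors, which you should confirm for your build. The class and method names are hypothetical.

```java
/** Young generation size implied by -XX:NewRatio under HotSpot's usual definition. */
final class YoungGenSizing {

    /**
     * NewRatio = old size / young size, so young = heap / (NewRatio + 1).
     * Assumption: generational Shenandoah follows this conventional meaning.
     */
    static long youngGenBytes(long heapBytes, int newRatio) {
        if (heapBytes <= 0 || newRatio < 1) {
            throw new IllegalArgumentException("need heapBytes > 0 and newRatio >= 1");
        }
        return heapBytes / (newRatio + 1);
    }

    public static void main(String[] args) {
        long heapBytes = 8L << 30; // 8 GiB, as with -Xms and -Xmx set to 8 GB
        for (int ratio = 1; ratio <= 3; ratio++) {
            long youngMiB = youngGenBytes(heapBytes, ratio) >> 20;
            System.out.println("-XX:NewRatio=" + ratio + " -> young generation of about "
                    + youngMiB + " MiB");
        }
    }
}
```

If you already know the size you want, -Xmn states it directly and avoids the arithmetic.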

We are releasing Generational Shenandoah as a preview. However, we want to help our customers bring it to production, so we would love to work with you to do so. Please reach out to us via a GitHub ticket, and we will get back to you promptly.

Metrics

Shenandoah generational mode includes new and detailed metrics that provide insight into garbage collector execution. Instead of treating a collection as a single event, the metrics expose the different collector phases, and whether they execute concurrently. Additional information includes application allocation rates and the percentage of total wall clock time that the collector runs concurrently.

All metrics are published via Java Management Extensions (JMX) through the GarbageCollectorMXBean. You can use tools such as JConsole to retrieve them, or access them directly using the JMX APIs.

Example of JConsole displaying detailed information about a GC pause phase

In the above example, a reported pause lasts ~1.3 milliseconds (1,306,273 nanoseconds). Such consistently short pauses enable a large set of latency sensitive applications to be written in Java.

The GarbageCollectorMXBean and GcInfo javadoc provide more detail and a complete list of the available metrics and their meaning.
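
As a starting point for reading these metrics programmatically, the sketch below walks the collector beans in the running JVM and prints the standard counters plus the most recent GcInfo, when one is available. It uses only the standard java.lang.management and com.sun.management APIs. Which beans appear, and which additional details the preview build attaches to them, is not shown here and should be checked against the javadoc mentioned above. The class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import com.sun.management.GcInfo;

/** Prints basic GC metrics for every collector bean in this JVM. */
final class GcMetricsDump {

    /** Formats the standard counters exposed by every GarbageCollectorMXBean. */
    static String describe(String name, long count, long timeMs) {
        return name + ": collections=" + count + ", accumulatedTimeMs=" + timeMs;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(describe(
                    bean.getName(), bean.getCollectionCount(), bean.getCollectionTime()));

            // The com.sun.management extension adds details about the last event.
            if (bean instanceof com.sun.management.GarbageCollectorMXBean) {
                GcInfo last = ((com.sun.management.GarbageCollectorMXBean) bean).getLastGcInfo();
                if (last != null) { // null until this collector has run at least once
                    System.out.println("  last event: id=" + last.getId()
                            + ", duration=" + last.getDuration());
                }
            }
        }
    }
}
```

Running this inside your application, or adapting it to a remote JMX connection, gives the same data that JConsole displays.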

Where can I learn more and how can I get involved?

Shenandoah generational mode is a work in progress. While we are very pleased to see its benefits on important workloads, there are several areas that need improvement.

  1. Heuristics determine when to start young and old generation collections. In the current implementation, we have observed unwanted collection triggering lag that allows object allocation to deplete the free pool before the collector has replenished it.
  2. Application pacing can ensure that during a collection cycle, the available allocation pool is consumed at a pace no faster than the pace at which the collector makes progress. The pacing implementation needs to take into consideration the special needs of generational collection.
  3. Various performance enhancements are under consideration. Development priority will be based on early user feedback.

We look forward to hearing from you, our customers, to help us refine our future road map. We will continue to invest in Corretto and OpenJDK to improve Java virtual machine performance and drive Java innovation.

The Corretto team appreciates any feedback and questions about Corretto and Shenandoah generational mode. Our branch in the corretto-17 repository is ‘generational-shenandoah’. We also push to the default branch of the OpenJDK Shenandoah project repo. That branch is closer to the OpenJDK tip repo than the corretto-17 branch.

Please use GitHub issues for our repo to report problems and request features. Pull requests are welcome.

It has been a terrific and fruitful collaboration with the Red Hat engineers. Please see the Shenandoah project contributors at https://github.com/openjdk/shenandoah.