@kdnilsen kdnilsen commented Oct 2, 2025

This change introduces a new rebuild-freeset lock that coordinates freeset rebuilds with queries of the memory available for allocation in the mutator partition.

This addresses a problem that arises when available memory is probed while we are rebuilding the freeset: the probe can observe a bogus value because the freeset is still under construction.

Rather than using the existing global heap lock to synchronize these activities, a new more narrowly scoped lock is introduced. This allows the available memory to be probed even when other activities hold the global heap lock for reasons other than rebuilding the freeset, such as when they are allocating memory. It is known that the global heap lock is heavily contended for certain workloads, and using this new lock avoids adding to contention for the global heap lock.
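As a rough illustration of the idea (not the code in this PR), the sketch below guards a single `available` counter with its own narrowly scoped mutex. The class name `FreeSetSketch`, the member `_rebuild_lock`, and the use of `std::mutex` are all assumptions made for the example; HotSpot uses its own lock types and the real freeset tracks far more state.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the mutator partition of the freeset.
// Names and types are illustrative assumptions, not HotSpot code.
class FreeSetSketch {
public:
  explicit FreeSetSketch(std::size_t initial) : _available(initial) {}

  // Probes take only the narrow rebuild lock. They never touch a
  // (separately modeled) global heap lock, so they do not add to
  // contention with allocating threads.
  std::size_t available() const {
    std::lock_guard<std::mutex> guard(_rebuild_lock);
    return _available;
  }

  // The rebuild holds the lock for its whole duration, so a concurrent
  // probe waits instead of observing an intermediate value.
  void rebuild(std::size_t recomputed) {
    std::lock_guard<std::mutex> guard(_rebuild_lock);
    _available = 0;           // transient state while regions are re-tallied
    _available = recomputed;  // final value, visible once the lock is released
  }

private:
  mutable std::mutex _rebuild_lock;
  std::size_t _available;
};
```

With this shape, a thread calling `available()` during `rebuild()` blocks briefly and then sees the final value, while threads holding some other lock for allocation are unaffected.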


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8369048: GenShen: Defer ShenFreeSet::available() during rebuild (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27612/head:pull/27612
$ git checkout pull/27612

Update a local copy of the PR:
$ git checkout pull/27612
$ git pull https://git.openjdk.org/jdk.git pull/27612/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27612

View PR using the GUI difftool:
$ git pr show -t 27612

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27612.diff

Using Webrev

Link to Webrev Comment

@kdnilsen kdnilsen marked this pull request as draft October 2, 2025 17:58

kdnilsen commented Oct 2, 2025

Will identify this PR as draft until I complete performance and correctness tests.

bridgekeeper bot commented Oct 2, 2025

👋 Welcome back kdnilsen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Oct 2, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@kdnilsen kdnilsen changed the title 8369048: GenShen: Defer ShenFreeSet::available() during rebuildAdd support for freeset rebuild lock 8369048: GenShen: Defer ShenFreeSet::available() during rebuild Oct 2, 2025

openjdk bot commented Oct 2, 2025

@kdnilsen The following labels will be automatically applied to this pull request:

  • hotspot-gc
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

kdnilsen commented Oct 6, 2025

I have these results from running Extremem tests on commit 99d0175

[image: Extremem test results for commit 99d0175]

I am going to try an experiment with a different approach. I will remove the synchronization lock and instead will cause the implementation of freeset rebuild to not update available() until after it is done with its work. I think this may address the same problem with less run-time overhead.
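A minimal sketch of what this lock-free alternative could look like, under stated assumptions: the rebuild accumulates into a private working total and publishes it only when it finishes, so probes keep returning the value from before the rebuild began. The names `DeferredFreeSetSketch`, `begin_rebuild`, `add_region`, and `finish_rebuild` are hypothetical, and allocation between rebuilds is not modeled.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical illustration of deferring the published value of
// available() until a rebuild completes. Not HotSpot code.
class DeferredFreeSetSketch {
public:
  explicit DeferredFreeSetSketch(std::size_t initial)
    : _working(initial), _published(initial) {}

  // Probes read only the published value; no lock is taken.
  std::size_t available() const {
    return _published.load(std::memory_order_acquire);
  }

  // Called by the single rebuilding thread.
  void begin_rebuild() { _working = 0; }
  void add_region(std::size_t free_bytes) { _working += free_bytes; }

  // Publishing is the only point at which probes see a new value.
  void finish_rebuild() {
    _published.store(_working, std::memory_order_release);
  }

private:
  std::size_t _working;                 // touched only by the rebuilding thread
  std::atomic<std::size_t> _published;  // what available() reports
};
```

The trade-off is visible in the shape of the code: a probe never sees a partial tally, but for the duration of a rebuild it sees a value that is stale by construction.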

kdnilsen commented Oct 7, 2025

On the same workload, here are the results of the experiment (rather than locking to prevent fetch of available during rebuild, we continue to return the value of available at the start of rebuild until rebuild finishes):

[image: Extremem test results for the unsynchronized experiment]

General observations are that:

  1. CPU utilization increased for both GenShen and Shen.
  2. Number of completed GCs increased for GenShen but decreased for Shen.
  3. Shen degenerated GCs increased.
  4. GenShen p50 latency increased, but p95, p99, and p99.9 latencies decreased. Higher latencies all increased for GenShen.
  5. Shen latencies are worse at all percentiles.

Qualitatively, what would we expect? If we return an old value for available() during freeset rebuild, we usually cause the triggering heuristics to believe there is less memory available than there actually is. This may cause us to trigger GC more aggressively. This is borne out for GenShen, but not for Shen.

With GenShen, the critical conflict occurs when old marking has completed, and we rebuild the free set following old marking in order to recycle immediate old garbage and to set aside old-collector reserves which will be required for anticipated mixed evacuation GC cycles that will immediately follow. While this is happening, the Shenandoah regulator thread is trying to decide whether it should interrupt old GC in order to perform an "urgent" young GC cycle. And sometimes, the regulator thread's inquiry as to how much memory is available sees a bogus (not just stale, but out of thin air) value because the freeset is under construction at the time of its inquiry. Preventing this bogus value is the point of this PR.

This situation does not generally happen with traditional Shenandoah. Traditional Shenandoah only queries available() during times when GC is idle. (There are plans to change this, to allow the freeset to be rebuilt more asynchronously, so we are testing this coordination mechanism out for both GenShen and Shen.) A plausible explanation for the observed impact on Shen is that the absence of synchronization allows Shen to see more stale values of available(), even when we are not conflicting with concurrent freeset rebuilds. Specifically, if we are gnawing away on available memory, probing available() every ms, the triggering heuristic may see the same value of available() for three consecutive probes. Not recognizing that memory has been consumed, it will delay triggering of the next GC cycle, resulting in fewer concurrent GCs with the "unsynchronized" solution. Besides resulting in fewer GC cycles, the late triggers also allow us to get closer to total depletion of the allocatable memory pool, which explains an increase in Shenandoah degenerated cycles.
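To make the late-trigger argument concrete, here is a toy model, assuming a deliberately simplified heuristic that triggers GC the first time a probe sees available memory below a fixed threshold. The real Shenandoah heuristics are more involved; the function names, the threshold, and the sample values are all made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// What a prober sees if the reported value is refreshed only on every
// refresh_every-th probe (refresh_every must be > 0). Hypothetical model.
std::vector<std::size_t> stale_view(const std::vector<std::size_t>& actual,
                                    std::size_t refresh_every) {
  std::vector<std::size_t> seen;
  seen.reserve(actual.size());
  std::size_t last = 0;
  for (std::size_t i = 0; i < actual.size(); ++i) {
    if (i % refresh_every == 0) {
      last = actual[i];  // a fresh value becomes visible
    }
    seen.push_back(last);  // otherwise the previous value is repeated
  }
  return seen;
}

// Index of the first probe at which the simplified heuristic would
// trigger GC, or -1 if it never triggers.
std::ptrdiff_t first_trigger(const std::vector<std::size_t>& seen,
                             std::size_t threshold) {
  for (std::size_t i = 0; i < seen.size(); ++i) {
    if (seen[i] < threshold) {
      return static_cast<std::ptrdiff_t>(i);
    }
  }
  return -1;
}
```

For example, with actual values `{100, 90, 80, 70, 60, 50}` and a threshold of 85, fresh probes trigger at index 2, whereas a view refreshed every third probe triggers at index 3, by which time less memory remains.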

Presumably, GenShen is also vulnerable to this possibility. But the benefit of eliminating out-of-thin-air available values for GenShen seems to outweigh the risk of occasional stale values that cause late triggers.

kdnilsen commented Oct 7, 2025

For further context, here are CI pipeline performance summaries for the initial synchronized solution:

   Control: openjdk-master-aarch64
Experiment: synchronize-available-with-rebuild-gh-aarch64

Genshen
-------------------------------------------------------------------------------------------------------
+45.80% specjbb2015/trigger_failure p=0.00542
  Control:    365.562   (+/-158.45  )        109
  Test:       533.000   (+/-200.37  )         10

+28.53% scimark.lu.large/concurrent_update_refs_young p=0.00020
  Control:      5.608ms (+/-  1.91ms)         34
  Test:         7.208ms (+/-107.48us)          2

+24.44% specjbb2015/concurrent_update_refs_degen_young p=0.00563
  Control:    804.287ms (+/-330.68ms)         41
  Test:         1.001s  (+/-101.83ms)          8

and for the "unsynchronized" solution:

   Control: openjdk-master-aarch64
Experiment: synchronize-available-with-rebuild-gh-aarch64

Genshen
-------------------------------------------------------------------------------------------------------
+51.82% hyperalloc_a2048_o4096/finish_mark_degen_young p=0.00771
  Control:     82.769ms (+/- 66.46ms)         66
  Test:       125.658ms (+/- 78.91ms)         43

The p values for all of these measures are a bit high, based on limited samples of relevant data. The unsynchronized data result is combined with previous measurements taken from the synchronized experiments.

kdnilsen commented Oct 7, 2025

One other somewhat subjective observation is that the synchronized solution experienced many more "timeout" failures on the CI pipeline than the unsynchronized solution. These timeout failures correlate with stress workloads that exercise the JVM in abnormal/extreme ways. Under these stresses, the unsynchronized mechanism seems to be a bit more robust.

kdnilsen commented Oct 7, 2025

I'm inclined to prefer the synchronized solution so will revert my most recent three commits.

@kdnilsen kdnilsen marked this pull request as ready for review October 7, 2025 15:19
@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 7, 2025

mlbridge bot commented Oct 7, 2025

Webrevs
