8369048: GenShen: Defer ShenFreeSet::available() during rebuild #27612
Conversation
Will identify this PR as draft until I complete performance and correctness tests.
👋 Welcome back kdnilsen! A progress list of the required criteria for merging this PR into `master` will be added to the body of your pull request.

@kdnilsen This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 30 new commits pushed to the `master` branch. As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the `master` branch, type /integrate in a new comment.
I have these results from running Extremem tests on commit 99d0175
I am going to try an experiment with a different approach. I will remove the synchronization lock and instead have the implementation of freeset rebuild not update available() until after it is done with its work. I think this may address the same problem with less run-time overhead.
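The "defer the update" idea described above can be sketched with a small illustrative class (the names `SimpleFreeSet`, `rebuild`, and `available` are stand-ins, not HotSpot's actual types): the rebuild accumulates into private state and publishes the new total only once, at the end, so a concurrent reader never observes a half-rebuilt value.

```cpp
#include <atomic>
#include <cstddef>

// Illustrative sketch only: the rebuild works on a private accumulator and
// publishes available() in a single store when it is finished, so readers
// see either the pre-rebuild or post-rebuild value, never an intermediate one.
class SimpleFreeSet {
  std::atomic<size_t> _published_available{0};
public:
  size_t available() const {
    // Safe to call at any time, even mid-rebuild.
    return _published_available.load(std::memory_order_acquire);
  }
  void rebuild(const size_t* region_free, size_t num_regions) {
    size_t total = 0;                 // accumulate privately during the rebuild
    for (size_t i = 0; i < num_regions; i++) {
      total += region_free[i];
    }
    // Publish once, only after the rebuild's work is done.
    _published_available.store(total, std::memory_order_release);
  }
};
```

This trades the lock for a single atomic publish, which is why it should carry less run-time overhead than the synchronized variant.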
For further context, here are CI pipeline performance summaries for the initial synchronized solution: and for the "unsynchronized" solution: The p-values for all of these measures are a bit high, based on limited samples of relevant data. The unsynchronized data result is combined with previous measurements taken from the synchronized experiments.
One other somewhat subjective observation is that the synchronized solution experienced many more "timeout" failures on the CI pipeline than the unsynchronized solution. These timeout failures correlate with stress workloads that exercise the JVM in abnormal/extreme ways. Under these stresses, the unsynchronized mechanism seems to be a bit more robust.
I'm inclined to prefer the synchronized solution, so I will revert my most recent three commits.
earthling-amzn
left a comment
If I understand correctly, the general issue is that `available` is not accurate while the freeset is being rebuilt. There are three solutions tested:
1. Existing code: return a sentinel value (`SIZE_MAX`) during the freeset rebuild.
2. Return the last known value of `available` during the rebuild (this appears to cause more aggressive heuristics).
3. Block threads that call `available` during the rebuild.
As it stands, only the regulator thread (which is evaluating heuristics for genshen) will call available during a free set rebuild (though this may change in the future). With the first solution, it seems we would have the heuristics believe there is much more memory available than there actually is. This would risk the heuristic not triggering when it should?
It makes sense that option 2 would trigger more GCs than option 1, but it seems the risk of triggering too late would be lower here. Option 3 might also delay triggering, but at least the heuristic would base the trigger decision on an accurate accounting of available memory.
If we go with the third option, I think we should move the lock management into the freeset and not have to change existing callers.
```cpp
void ShenandoahFreeSet::prepare_to_rebuild(...) {
  _lock.lock();
  // do preparation
  // ...
}

void ShenandoahFreeSet::finish_rebuild(...) {
  // finish rebuild
  // ...
  _lock.unlock();
}
```

Could we also now remove the sentinel value (`FreeSetUnderConstruction`)?
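A minimal sketch of this suggestion, using `std::mutex` as a stand-in for the Shenandoah lock type (all names here are illustrative, not HotSpot's): the freeset owns the rebuild lock and manages it inside `prepare_to_rebuild()`/`finish_rebuild()`, so existing callers need no changes, and `available()` simply blocks until any in-progress rebuild completes.

```cpp
#include <cstddef>
#include <mutex>

// Illustrative sketch: lock management lives inside the freeset itself.
class FreeSetSketch {
  std::mutex _rebuild_lock;
  size_t _available = 0;
public:
  void prepare_to_rebuild() {
    _rebuild_lock.lock();            // taken when the rebuild starts...
  }
  void finish_rebuild(size_t new_available) {
    _available = new_available;
    _rebuild_lock.unlock();          // ...released only when it completes
  }
  size_t available() {
    // Waits out any rebuild in progress, then returns an accurate value.
    std::lock_guard<std::mutex> g(_rebuild_lock);
    return _available;
  }
};
```

With the lock encapsulated this way, a blocking `available()` also removes the need for callers to interpret a sentinel value.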
ysramakrishna
left a comment
Left a few comments. (In particular, one about what looks like a deadlock possibility in debug builds.)
```cpp
  ShenandoahHeap* const _heap;
  ShenandoahRegionPartitions _partitions;

  // This locks the rebuild process (in combination with the global heap lock)
```
Explain the role of this lock and the global heap lock vis-à-vis the rebuild process.
Also, maybe call it `_rebuild_lock` rather than just `_lock`.
I am changing the name. I will add discussion of the rank ordering of locks here as well.
```cpp
  ShenandoahFreeSet(ShenandoahHeap* heap, size_t max_regions);

  ShenandoahRebuildLock* lock() {
```
`rebuild_lock()` instead?
Good suggestion. Making this change.
```cpp
  size_t young_cset_regions, old_cset_regions, first_old, last_old, num_old;
  ShenandoahFreeSet* free_set = heap->free_set();
  ShenandoahRebuildLocker rebuild_locker(free_set->lock());
```
Should you not create a scope around lines 1158 to 1167, since you don't want to keep holding the rebuild lock once the rebuild is done (i.e., immediately following finish_rebuild())?
Maybe it doesn't matter, since no one who needs to query available() is running during a full gc?
I'll tighten up the context for the rebuild lock. I was thinking that set_mark_incomplete() and clear_cancelled_gc() would be "fast enough" that it wouldn't matter to hold the rebuild_lock this much longer, but I agree it is better to release the lock as soon as possible.
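The scoping suggestion can be sketched as follows; `FreeSetStub` and `full_gc_rebuild_sketch` are illustrative stand-ins (not HotSpot names), and `std::lock_guard` stands in for `ShenandoahRebuildLocker`. A nested block releases the rebuild lock as soon as the rebuild itself finishes, so the trailing work (the real code's set_mark_incomplete()/clear_cancelled_gc()) runs without it.

```cpp
#include <mutex>

// Illustrative stand-ins for the freeset and its rebuild lock.
struct FreeSetStub {
  std::mutex rebuild_lock;
  bool rebuilt = false;
  void prepare_to_rebuild() {}
  void finish_rebuild() { rebuilt = true; }
};

void full_gc_rebuild_sketch(FreeSetStub& fs, bool& trailing_done) {
  {
    // Scoped locker: held only for the rebuild itself.
    std::lock_guard<std::mutex> locker(fs.rebuild_lock);
    fs.prepare_to_rebuild();
    fs.finish_rebuild();
  } // rebuild lock released here, immediately after finish_rebuild()

  // Trailing work no longer holds the rebuild lock, so a concurrent
  // available() query would not be blocked by it.
  trailing_done = true;
}
```

Even if the trailing calls are "fast enough", the tighter scope keeps the lock's critical section exactly as long as the rebuild.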
```cpp
// Return available_in assuming caller does not hold the heap lock. In production builds, available is
// returned without acquiring the lock. In debug builds, the global heap lock is acquired in order to
// enforce a consistency assert.
```
Can the comment be simplified to:

```cpp
// Return bytes `available` in the given `partition`
// while holding the `rebuild_lock`.
```

Don't say anything about the heap lock in the API comment. Rather, in the part that is `#ifdef ASSERT` where you take the heap lock (line ~244), say:

```cpp
// Acquire the heap lock to get a consistent
// snapshot to check the assert.
```
As I write this, I realize that in the most general case where two threads may call these APIs independently in a fastdebug build, you could theoretically get into a deadlock because they attempted to acquire the locks in different orders (this possibility exists -- statically -- only in the fastdebug builds).
The general MutexLocker machinery has ranked mutexes to avoid such situations through static ranking and checks while acquiring locks (in debug builds, as a way of potentially catching such situations and flagging them).
With such ranking, though, this code would assert because the locks are acquired in a different order between here and elsewhere.
In product builds you are fine because the rebuild lock acts as a "leaf lock" (in hotspot parlance). But there seems to be a definite possibility of deadlock in debug builds if/when the rebuild is attempted by one thread while another checks available and attempts to acquire the heap lock to check the assertion. You could solve it by acquiring the heap lock before calling the work method where the assertion check is done.
However, I'd be much more comfortable if we used some form of lock rank framework, unless it was utterly impossible to do so for some reason. (Here it was easy to spot the lock order inversion because it was in the code. Of course, if a debug build deadlocked you would also figure out the same, but having lock ordering gives you a quick and easy way to verify if there's potential for trouble.)
Not sure of the history of ShenandoahLock or why the parallel infra to MutexLocker was introduced (perhaps for allowing some performance/tunability), but might be worthwhile to see if we want to build lock rank checks in for robustness/maintainability.
I'm coming back to this PR after working on others. Thanks for your comments.
This is a good catch. I know better than to do that! Sorry.
My intention was to rank-order the locks. Whenever multiple locks are held, it should be in this order:
1. First acquire the global heap lock.
2. In a nested context, acquire the rebuild_lock.

Any thread that only acquires the global heap lock or only acquires the rebuild_lock will not deadlock. Multiple threads that acquire both locks will not deadlock because they acquire them in the same order.
The code you identified was definitely a problem because we were acquiring the two locks in the wrong order. I'm going to remove that assert and the lock associated with it.
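The rank-ordering discipline described above can be sketched with a toy ranked lock; this is illustrative only (it is not HotSpot's MutexLocker machinery, and the names are invented). Each thread tracks the rank of the most recently acquired lock and asserts that ranks only increase, so "global heap lock (rank 1) before rebuild_lock (rank 2)" is enforced and an inversion would trip the assert in a debug build.

```cpp
#include <cassert>

// Rank of the most recently acquired lock on this thread (0 = none held).
thread_local int g_last_rank = 0;

struct RankedLock {
  const int rank;  // coarser locks get lower ranks
  explicit RankedLock(int r) : rank(r) {}
  void lock() {
    // Acquiring a lock whose rank is <= the last acquired rank would be
    // an ordering inversion and could deadlock; catch it eagerly.
    assert(g_last_rank < rank && "lock rank inversion: acquire coarser lock first");
    g_last_rank = rank;
  }
  void unlock() {
    // Simplified: a real implementation would restore the previous rank
    // rather than clearing it, to handle nested holds precisely.
    g_last_rank = 0;
  }
};
```

With ranks assigned as heap lock = 1 and rebuild_lock = 2, taking them in the documented order passes, while the inverted order (rebuild_lock first, then heap lock) would assert, which is exactly the static check the review asks for.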
I've updated this comment to clarify the refined intent.
@kdnilsen this pull request can not be integrated into `master` due to one or more merge conflicts. To resolve merge conflicts, run the following commands in your personal fork:

```
git checkout synchronize-available-with-rebuild
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
```
```diff
-  inline size_t available() const { return _partitions.available_in_not_locked(ShenandoahFreeSetPartitionId::Mutator); }
+  inline size_t available() {
+    shenandoah_assert_not_heaplocked();
+    ShenandoahRebuildLocker locker(rebuild_lock());
```
Maybe motivate in a brief comment why we need the rebuild lock in this API, but not around the other APIs such as capacity() and used()?
ysramakrishna
left a comment
This looks good to me. Curious if any performance delta was noted in fresh measurements following this final shape of fix.


This code introduces a new rebuild-freeset lock to coordinate freeset rebuild activities with queries about the memory available for allocation in the mutator partition.
This addresses a problem that results if available memory is probed while we are rebuilding the freeset.
Rather than using the existing global heap lock to synchronize these activities, a new, more narrowly scoped lock is introduced. This allows the available memory to be probed even when other activities hold the global heap lock for reasons other than rebuilding the freeset, such as allocating memory. It is known that the global heap lock is heavily contended for certain workloads, and using this new lock avoids adding to that contention.
Reviewing

Using git, checkout this PR locally:

```
$ git fetch https://git.openjdk.org/jdk.git pull/27612/head:pull/27612
$ git checkout pull/27612
```

Update a local copy of the PR:

```
$ git checkout pull/27612
$ git pull https://git.openjdk.org/jdk.git pull/27612/head
```

Using Skara CLI tools, checkout this PR locally:

```
$ git pr checkout 27612
```

View PR using the GUI difftool:

```
$ git pr show -t 27612
```

Using diff file, download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27612.diff

Using Webrev: Link to Webrev Comment