Skip to content

Conversation

savoptik
Copy link
Contributor

@savoptik savoptik commented Aug 15, 2025

The defect has been detected and confirmed in the function ArchDesc::declareClasses() located in the file src/hotspot/share/adlc/output_h.cpp with static code analysis. This defect can potentially lead to a null pointer dereference.

The pointer instr->_matrule is dereferenced in line 1952 without checking for nullptr, although earlier in line 1858 the same pointer is checked for nullptr, which indicates that it can be null.

According to this comment, this PR contains fixes for similar cases in other places.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26798/head:pull/26798
$ git checkout pull/26798

Update a local copy of the PR:
$ git checkout pull/26798
$ git pull https://git.openjdk.org/jdk.git pull/26798/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26798

View PR using the GUI difftool:
$ git pr show -t 26798

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26798.diff

Using Webrev

Link to Webrev Comment

…pp ArchDesc::declareClasses()

Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Artem Semenov <[email protected]>.
…cpp LibraryCallKit::inline_vector_gather_scatter()

Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Artem Semenov <[email protected]>.
…jfrThreadSampler.cpp JfrThreadSampler::set_period()

Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Artem Semenov <[email protected]>.
…p MallocSiteTable::malloc_site()

Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Artem Semenov <[email protected]>.
…nterval::split()

Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Artem Semenov <[email protected]>.
…CICompileState::JVMCICompileState()

Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Artem Semenov <[email protected]>.
…eezeThaw.cpp FreezeBase::finalize_freeze()

Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Artem Semenov <[email protected]>.
@bridgekeeper
Copy link

bridgekeeper bot commented Aug 15, 2025

👋 Welcome back asemenov! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Aug 15, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot changed the title 8365604 Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() 8365604: Null pointer dereference in src/hotspot/share/adlc/output_h.cpp ArchDesc::declareClasses() Aug 15, 2025
@openjdk
Copy link

openjdk bot commented Aug 15, 2025

@savoptik The following labels will be automatically applied to this pull request:

  • graal
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added graal [email protected] hotspot [email protected] rfr Pull request is ready for review labels Aug 15, 2025
@mlbridge
Copy link

mlbridge bot commented Aug 15, 2025

Webrevs

@dholmes-ora
Copy link
Member

/label add hotspot-jfr

@openjdk
Copy link

openjdk bot commented Aug 20, 2025

@dholmes-ora
The hotspot-jfr label was successfully added.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some alignment nits where you have added additional condition clauses.

Some of these are difficult to evaluate in isolation and will need review from the specific component areas.

Comment on lines -171 to +172
index++, head = (MallocSiteHashtableEntry*)head->next()) {}
assert(head != nullptr, "Invalid position index");
index++, head = ((MallocSiteHashtableEntry*)head->next() == nullptr) ? head :
(MallocSiteHashtableEntry*)head->next()) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right to me. We check head != nullptr in the loop condition so we cannot reach the assignment if it is null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A situation is possible where head becomes nullptr when head->next() returns nullptr on the last iteration. Then, after the loop finishes, assert(head != nullptr) will trigger (only in debug mode), and return head->data() will cause a program error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is it possible?

Perhaps you could explain how pos_idx is being used in this loop to guard against that happening and why that does not make this safe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

head->next() returns a pointer to _next without any checks.

In turn, the _next pointer is marked as volatile, which means it can be modified at any moment, for example, in another thread.

From this, I conclude that a check in this location is desirable. Moreover, pos_idx is also not being checked. It is quite possible that head->next() could turn out to be nullptr.

But I don’t mind. If you are sure that there can’t be a nullptr in this place, I will withdraw this patch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover, pos_idx is also not being checked

I don't know what you mean by this comment. pos_idx is being checked in the loop test before the call to head->next() in that same test.

The important question you need to address is why and what that check guarantees. I say you need to address it because you are the one claiming that there is a possible nullptr dereference here without any evidence that it has occurred in practice. If that is based on a correct analysis of the code then you need to explain how we can arrive at a situtation where we hit a null pointer that takes into account the logic of the loop test. So far you have not done so.

n.b. I am not claiming there is no possibility of a nullptr dereference here (although I can form my own opinion). I'm asking you to tell me why I should take your claim that it is possible seriously. Your answers so far are not convincing me that you have understood how this code works.

Copy link
Contributor Author

@savoptik savoptik Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pos_idx receives its value when calling a certain function pos_idx_from_marker(marker), and there is no check before the loop to ensure that it is within the bounds of the _table size.

I mentioned above that I am not insisting on this particular patch. This issue was detected by a static analyzer. After that, based on my limited knowledge, I considered it confirmed… If you have any refutation, please share your thoughts. In that case, I will revert this patch and mark the trigger as “NO FIX REQUIRED”.

As far as I have checked, there are no checks anywhere in the calls to this function to compare the marker with the table or any other entities in any form.

I certainly do not claim to understand this code as well as you or any other member of the hotspot team.

Copy link
Contributor

@adinn adinn Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this leads right to the root of the problem I have with this report. As you say, pos_idx does indeed come out of a marker object. It took me about a minute to identify that this marker object is created in the function that sits right above the one your code assistant flagged as problematic -- even though I am not at all familiar with this code. It looks clear to me that, given the right call sequence for calls that create a marker and then consume it here, the check on pos_idx will ensure that we don't drop off the end of the list with a null pointer. So, it looks very liek this code has been designed so that the presence of a marker with a suitable pos_idx is intended to ensure this loop terminates before that happens. I am sure someone in this project knows whether that is the case but it is not you or your coding assistant.

I'm not suggesting that that calling sequence is actually right and that the check for pos_idx will definitely avoid dropping off the end. Indeed, I would welcome a bug report that proved it to be wrong. However, what is clear that both you and your coding assistant have failed to appreciate how some relatively obvious parts of this design actually operate. That renders your (or your tool's) analysis a shallow and unhelpful distraction; using it as an excuse to raise a purported 'issue' in the absence of any evidence of an actual issue is very much a waste of time for this project's reviewers.

Your error is compounded by the fact that you (or more likely your coding assistant) are suggesting changes which, because they are not founded in a correct understanding of the design, could potentially lead to worse outcomes than the speculative nullptr dereference they are intended to remedy -- as I explained when discussing your change to the control flow logic in the ALDC code. So, not only is this report unhelpful it is potentially harmful.

Ultimately the takeaway here is that the OpenJDK bug system is not here to report, review and add patches to remedy issues that you or your code assistant tool invents on the basis of misinformed assumptions. It is here to report, review and add patches to remedy issues that can be shown to actually affect the correct operation of the JVM and JDK,either by a reproducible test or by well-reasoned argument. So, please do not continue to spam the project with bug reports like this simply because a potentially bogus patch will improve your experience with what is clearly a decidedly fallible tool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this leads right to the root of the problem I have with this report. As you say, pos_idx does indeed come out of a marker object. It took me about a minute to identify that this marker object is created in the function that sits right above the one your code assistant flagged as problematic -- even though I am not at all familiar with this code. It looks clear to me that, given the right call sequence for calls that create a marker and then consume it here, the check on pos_idx will ensure that we don't drop off the end of the list with a null pointer. So, it looks very liek this code has been designed so that the presence of a marker with a suitable pos_idx is intended to ensure this loop terminates before that happens. I am sure someone in this project knows whether that is the case but it is not you or your coding assistant.

I'm not suggesting that that calling sequence is actually right and that the check for pos_idx will definitely avoid dropping off the end. Indeed, I would welcome a bug report that proved it to be wrong. However, what is clear that both you and your coding assistant have failed to appreciate how some relatively obvious parts of this design actually operate. That renders your (or your tool's) analysis a shallow and unhelpful distraction; using it as an excuse to raise a purported 'issue' in the absence of any evidence of an actual issue is very much a waste of time for this project's reviewers.

Your error is compounded by the fact that you (or more likely your coding assistant) are suggesting changes which, because they are not founded in a correct understanding of the design, could potentially lead to worse outcomes than the speculative nullptr dereference they are intended to remedy -- as I explained when discussing your change to the control flow logic in the ALDC code. So, not only is this report unhelpful it is potentially harmful.

Ultimately the takeaway here is that the OpenJDK bug system is not here to report, review and add patches to remedy issues that you or your code assistant tool invents on the basis of misinformed assumptions. It is here to report, review and add patches to remedy issues that can be shown to actually affect the correct operation of the JVM and JDK,either by a reproducible test or by well-reasoned argument. So, please do not continue to spam the project with bug reports like this simply because a potentially bogus patch will improve your experience with what is clearly a decidedly fallible tool.

I’m sorry to have taken up your time.

Comment on lines -1316 to 1319
if (arr_type == nullptr || (arr_type != nullptr && !elem_consistent_with_arr(elem_bt, arr_type, false))) {
if (arr_type != nullptr && !elem_consistent_with_arr(elem_bt, arr_type, false)) {
log_if_needed(" ** not supported: arity=%d op=%s vlen=%d etype=%s atype=%s ismask=no",
is_scatter, is_scatter ? "scatter" : "gather",
num_elem, type2name(elem_bt), type2name(arr_type->elem()->array_element_basic_type()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bug here but I'm not sure it is what you think it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addr_type->isa_aryptr(); might return nullptr, while in elem_consistent_with_arr(elem_bt, arr_type, false), arr_type is only checked with an assert.
Moreover, the presence of a check in the original version indicates that arr_type can be null, and there is no protection against this.

@dholmes-ora
Copy link
Member

/label add hotspot-compiler
/label add hotspot-runtime

@openjdk
Copy link

openjdk bot commented Aug 20, 2025

@dholmes-ora
The hotspot-compiler label was successfully added.

@openjdk
Copy link

openjdk bot commented Aug 20, 2025

@dholmes-ora
The hotspot-runtime label was successfully added.

@dholmes-ora
Copy link
Member

I've added some additional mailing lists to ensure better coverage here.

Also I think you need to update the JBS (and PR) title to reflect the broader scope of the changes.

@adinn
Copy link
Contributor

adinn commented Aug 20, 2025

I'm not clear that the original issue is necessarily a bug that needs fixing with a skip to the next else if case.

The justification for adding instr->_matrule != nullptr && instr->_matrule->_rChild != nullptr to the if branch test is that earlier code allows for the possibility that instr->_matrule might be null. However, that check is performed in an unconditional context for any value of instr whereas this specific else branch limits the circumstance to the case where instr->is_ideal_copy() is found to be true.

So, the prior test offers no guarantee that in this restricted case a null pointer should or should not be possible. The original design may assume that a successful test for instr->is_ideal_copy() ought to guarantee that both instr->_matrule and instr->_matrule->_rChild are non-null. That cannot be determined by the evidence offered. It can only be determined by looking at how instr is constructed.

So, rather than just skip to the next case we might need to handle this with an assert and fix whatever code is producing an ideal copy with null fields.

Given the level of analysis offered for this case I am suspicious as to whether the other cases tacked onto this issue ought to be accepted at face value without some justification as to why a null check and skip to the next case is correct.

I'm also wondering how and why all these cases and associated amendments were arrived at? Is this perhaps based on output from a code analysis tool (perhaps even a genAI tool). If so then I think 1) we ought to be told and 2) we ought to treat its recommendations with a very healthy dose of skepticism.

@adinn
Copy link
Contributor

adinn commented Aug 21, 2025

n.b. Before accepting any of the changes in this PR I'd really like to know whether they have arisen from reports of an actual null pointer dereference or they are simply derived from some theoretical analysis. In the latter case then I think we would need a better explanation of why an error can happen than we have seen so far. Given that requirement I also think each of the changes should be submitted in its own PR with its own justification. We should not modify control flow logic on the nod.

@savoptik
Copy link
Contributor Author

I've added some additional mailing lists to ensure better coverage here.

Also I think you need to update the JBS (and PR) title to reflect the broader scope of the changes.

Please provide an example of an updated title…

@dean-long
Copy link
Member

Most of these mitigations to guard against a possible null pointer dereference are inside if expressions, which means if there was a null pointer, then we will now end up in the else clause, changing the behavior of the code to something that was perhaps unintended, and we still don't know what caused the null pointer. So this is just silently masking potential problems, and in my experience is usually not the correct fix. Most of the time the correct fix is to tell the static analyzer that it is a false positive and move on. Sometimes it is appropriate to add an assert or guarantee, and yes sometimes it is appropriate to do something different if there is a null, for example if it is a result of an allocation that can fail.

@savoptik savoptik closed this Aug 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants