Skip to content

Conversation

@stevenschlansker
Copy link

@stevenschlansker stevenschlansker commented Nov 19, 2025

The class docs already state that Interval is immutable, but this is not enforced.
Followup to #4901

I tried to fix up existing sites where this supposedly immutable class is mutated, to instead replace with a new value.

The class docs already state that Interval is immutable, but this is not enforced.
Followup to antlr#4901

Signed-off-by: Steven Schlansker <[email protected]>
}
// if in middle a..x..b, split interval
if ( el>a && el<b ) { // found in this interval
int oldb = I.b;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this variable oldb is redundant since we already saved I.b above at line 656. Let me know if you'd like me to remove this. (I didn't, in the spirit of keeping the change minimal)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only test coverage would tell us that. Let's not touch this.

}
// if in middle a..x..b, split interval
if ( el>a && el<b ) { // found in this interval
int oldb = I.b;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only test coverage would tell us that. Let's not touch this.

@ericvergnaud
Copy link
Contributor

ericvergnaud commented Nov 21, 2025

@parrt blessed.
I'm very supportive of this change. The class comment is as follows:
/** An immutable inclusive interval a..b */
Note the immutable in there, which expresses the original intent.
The fact that we violated it internally might have been justified by performance, memory, whatever... I suspect mutations of intervals in the cache is the root cause of issue #4901, more than the cache itself. There doesn't seem to be any code checking whether the interval being mutated is in the interval cache, and should therefore absolutely not be mutated. This change ensures that the behavior complies with the intent.
It probably also helps HotSpot do a better job since it can cache a and b.

@parrt
Copy link
Member

parrt commented Nov 24, 2025

I agree there is an inconsistency there, but in the back of my mind, I remember there being a reason. Perhaps @sharwell who has a young agile mind still can remember. In a nutshell, that is an insanely complicated algorithm and finally tuned implementation. We should be very careful about changing it unless there's a really good reason. :D

@ericvergnaud
Copy link
Contributor

ericvergnaud commented Nov 24, 2025

The good reason is that changing fields of intervals deemed immutable has proven to break in a multithreaded scenario.
This PR affects in no way the algorithm. Let's see what @sharwell thinks.
In the meantime maybe @kaby76 can run performance tests to check whether it affects performance?

@KvanTTT
Copy link
Member

KvanTTT commented Nov 25, 2025

In the meantime maybe @kaby76 can run performance tests to check whether it affects performance?

In my opinion correctness and consistency matter more than performance.

@wendigo
Copy link

wendigo commented Nov 27, 2025

What's the decision here? Is there going to be a release once this change lands?

@ericvergnaud
Copy link
Contributor

What's the decision here? Is there going to be a release once this change lands?

Hey we're all very busy with our daily jobs, you might need to give this a bit more time :-)

@ericvergnaud
Copy link
Contributor

@kaby76 Hey Ken, as a mean to consolidate our confidence that this change is both useful and harmless, would you be able to test it against a significant set of grammars?

@kaby76
Copy link
Contributor

kaby76 commented Nov 28, 2025

Ok, will do. Let you know tomorrow or sooner if there's something unexpected. K

@parrt
Copy link
Member

parrt commented Nov 28, 2025

Thanks folks. If Ken can show that it works across all of his tests then let’s go for it

@sharwell
Copy link
Member

I'd want to see allocation benchmarks on big scenarios before moving forward here. Unfortunately, I'm not set up to run these anymore. Do we have any concrete examples of bugs in the wild caused by this?

@martint
Copy link

martint commented Nov 29, 2025

@sharwell, yes, this was the issue that led to it: jdbi/jdbi#2898

@ericvergnaud
Copy link
Contributor

I'd want to see allocation benchmarks on big scenarios before moving forward here. Unfortunately, I'm not set up to run these anymore. Do we have any concrete examples of bugs in the wild caused by this?

@sharwell If that helps, your C# implementation uses readonly fields.

@kaby76
Copy link
Contributor

kaby76 commented Dec 1, 2025

Sorry for taking a bit of time, but I wanted to be sure. The PR causes no regressions in any of the 370 grammars in grammars-v4. For most grammars, there was no statistical change in speed with the PR. However, there was for a few grammars, all slightly faster. For plsql, there was a speed up of 2% (sample size=20, two-sided t-test, t-statistic 8.97, df=38, p=6.3e-11). I made sure to make sure no bias in order of testing the PR versus control.

@parrt
Copy link
Member

parrt commented Dec 1, 2025

@sharwell that sounds like pretty good evidence. What do you think? I'm always nervous about these kind of changes, but it seems to fix a bug and not affect performance.

@parrt
Copy link
Member

parrt commented Dec 1, 2025

Thanks to @kaby76 for runnin all those tests!

@sharwell
Copy link
Member

sharwell commented Dec 1, 2025

I reviewed again and noticed that the only mutations of these fields occurred in IntervalSet.remove. This method does not appear to be used on hot paths.

@sharwell If that helps, your C# implementation uses readonly fields.

It doesn't hurt, but I have to keep in mind that the ANTLR 4 C# optimized target was constructed a bit differently from my old work on the ANTLR 3 C# target. I missed a few details during the initial conversion (e.g. Interval and some tokens could have been structs) and my dedication to avoiding breaking changes caused some of the items to stay indefinitely. I was more focused on algorithm improvements for ANTLR 4 and somewhat less on micro-optimizations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants