@TheShermanTanker
Contributor

@TheShermanTanker TheShermanTanker commented Nov 30, 2024

This is a general cleanup and improvement of LTO support, as well as a quick fix that removes a workaround in the Makefiles which disabled LTO for g1ParScanThreadState.cpp because the old poisoning mechanism caused trouble. The -Wno-attribute-warning change here can be removed once Kim's new poisoning solution is integrated.

  • -fno-omit-frame-pointer is added to gcc so that the linker, which generates the final code under LTO, does not emit code without the frame pointer
  • -flto=$(JOBS) is used instead of -flto=auto, to better match the number of jobs the user requested
  • -Gy is passed to the Microsoft compiler. This does not fully fix LTO with Microsoft's toolchain, but it at least prevents warnings about -LTCG:INCREMENTAL

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8345265: Minor improvements for LTO across all compilers (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22464/head:pull/22464
$ git checkout pull/22464

Update a local copy of the PR:
$ git checkout pull/22464
$ git pull https://git.openjdk.org/jdk.git pull/22464/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 22464

View PR using the GUI difftool:
$ git pr show -t 22464

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22464.diff

Using Webrev

Link to Webrev Comment

@TheShermanTanker TheShermanTanker marked this pull request as draft November 30, 2024 00:36
@bridgekeeper

bridgekeeper bot commented Nov 30, 2024

👋 Welcome back jwaters! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Nov 30, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot changed the title 8345265 8345265: Fix gcc LTO without disabling LTO for g1ParScanThreadState.cpp Nov 30, 2024
@openjdk

openjdk bot commented Nov 30, 2024

@TheShermanTanker The following labels will be automatically applied to this pull request:

  • build
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@TheShermanTanker TheShermanTanker marked this pull request as ready for review November 30, 2024 00:43
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 30, 2024
@mlbridge

mlbridge bot commented Nov 30, 2024

Webrevs

@TheShermanTanker TheShermanTanker marked this pull request as draft November 30, 2024 03:09
@openjdk openjdk bot removed the rfr Pull request is ready for review label Nov 30, 2024

@kimbarrett kimbarrett left a comment

I only noticed this had been changed back to Draft after I was mostly done looking
at it. But I don't think this should be done this way, esp. since it didn't seem to work
(as in suppressing warnings from LTO) for me.

  // The memory allocated in libjvmci was not allocated with os::malloc
  // so must not be freed with os::free.
- ALLOW_C_FUNCTION(::free((void*) _init_error_msg));
+ ALLOW_C_FUNCTION(::free, ::free((void*) _init_error_msg);)

Please do this bug fix change under a separate JBS issue & PR. I've created a JBS issue for it:
https://bugs.openjdk.org/browse/JDK-8345267
Fix memory leak in JVMCIEnv dtor

  // forbidden warnings in such builds.
  #define FORBID_C_FUNCTION(signature, alternative) \
- extern "C" __attribute__((__warning__(alternative))) signature;
+ [[gnu::warning(alternative)]] signature noexcept;

Why are you making this change at all, let alone under this JBS issue?

Among other problems, noexcept is mostly irrelevant in HotSpot, since we build with
exceptions disabled. (There are a few places where noexcept affects semantics, like for
operator new, but otherwise there is no point.)

Contributor Author

I was thinking about the extern "C" and I think it might not be needed. A function that was already declared extern "C" in a C library header keeps that linkage when it is declared again, even without extern "C". There's also the issue that this forbidding macro could declare functions that don't actually exist on the current platform, which I think(?) removing extern "C" helps prevent. There's also the strange case that not all platforms declare their C library functions extern "C" (Windows is a notable example), so this helps avoid declaring things with conflicting linkages, which causes an error. The noexcept was just to match the declarations in the standard library headers, since they are supposed to be noexcept according to the Standard.

*/

// Stopgap fix until FORBID_C_FUNCTION can work properly with LTO
#define DISABLE_POISONING_STOPGAP

This isn't needed if not using LTO.

*/

// Stopgap fix until FORBID_C_FUNCTION can work properly with LTO
#define DISABLE_POISONING_STOPGAP

So far as I can tell, this doesn't work. I still get tons of -Wattribute-warnings when building
with LTO, because of similar problems from other files.

*/

// Stopgap fix until FORBID_C_FUNCTION can work properly with LTO
#define DISABLE_POISONING_STOPGAP

This prevents precompiled headers from being used for this file. -Winvalid-pch will warn if enabled.

@TheShermanTanker
Contributor Author

> I only noticed this had been changed back to Draft after I was mostly done looking at it. But I don't think this should be done this way, esp. since it didn't seem to work (as in suppressing warnings from LTO) for me.

Yeah, I had noticed that it didn't work too, which is why I returned it to draft. It also causes Visual C++ to blow up when compiling HotSpot for Windows, which isn't ideal. I guess re-enabling LTO for g1ParScanThreadState.cpp will have to wait until FORBID_C_FUNCTION is properly fixed to be LTO-proof.

@TheShermanTanker
Contributor Author

This needs a name and description change; I'll do that later. @MBaesken, does this fix LTO on your end? Kim also reports that LTO hangs indefinitely along with several warning messages. Do you have similar issues when you try to enable LTO?

@MBaesken
Member

MBaesken commented Dec 9, 2024

> This needs a name and description change; I'll do that later. @MBaesken, does this fix LTO on your end? Kim also reports that LTO hangs indefinitely along with several warning messages. Do you have similar issues when you try to enable LTO?

When I try to build with this change (with and without LTO enabled), I run into:

/openjdk/tools/devkits/x86_64-linux-gnu-to-x86_64-linux-gnu-fedora27-gcc11.3.0/x86_64-linux-gnu/sysroot/usr/include/unistd.h:520:14: error: conflicting declaration of 'char* get_current_dir_name()' with 'C' linkage
  520 | extern char *get_current_dir_name (void) __THROW;
      |              ^~~~~~~~~~~~~~~~~~~~

(Maybe it is related to the devkit, maybe to PCH; I don't know.)

@TheShermanTanker
Contributor Author

>> This needs a name and description change; I'll do that later. @MBaesken, does this fix LTO on your end? Kim also reports that LTO hangs indefinitely along with several warning messages. Do you have similar issues when you try to enable LTO?

> When I try to build with this change (with and without LTO enabled), I run into:
>
> /openjdk/tools/devkits/x86_64-linux-gnu-to-x86_64-linux-gnu-fedora27-gcc11.3.0/x86_64-linux-gnu/sysroot/usr/include/unistd.h:520:14: error: conflicting declaration of 'char* get_current_dir_name()' with 'C' linkage
>   520 | extern char *get_current_dir_name (void) __THROW;
>       |              ^~~~~~~~~~~~~~~~~~~~
>
> (Maybe it is related to the devkit, maybe to PCH; I don't know.)

It's related to the subtle change in FORBID_C_FUNCTION. I think unistd.h is being included after compilerWarnings.hpp somewhere, so the forbidding declaration is seen first and gets C++ linkage. Well, at least I now know the current approach has this issue.

@TheShermanTanker TheShermanTanker marked this pull request as ready for review December 17, 2024 14:47
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 17, 2024
@TheShermanTanker
Contributor Author

/touch

@openjdk

openjdk bot commented May 29, 2025

@TheShermanTanker The pull request is being re-evaluated and the inactivity timeout has been reset.

@bridgekeeper

bridgekeeper bot commented Jun 26, 2025

@TheShermanTanker This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@TheShermanTanker
Contributor Author

/touch

@openjdk

openjdk bot commented Jun 27, 2025

@TheShermanTanker The pull request is being re-evaluated and the inactivity timeout has been reset.

@bridgekeeper

bridgekeeper bot commented Jul 25, 2025

@TheShermanTanker This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@TheShermanTanker
Contributor Author

/touch

@openjdk

openjdk bot commented Aug 4, 2025

@TheShermanTanker The pull request is being re-evaluated and the inactivity timeout has been reset.

@bridgekeeper

bridgekeeper bot commented Aug 22, 2025

@TheShermanTanker This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@TheShermanTanker
Contributor Author

/touch

@openjdk

openjdk bot commented Aug 23, 2025

@TheShermanTanker The pull request is being re-evaluated and the inactivity timeout has been reset.

@bridgekeeper

bridgekeeper bot commented Sep 20, 2025

@TheShermanTanker This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@TheShermanTanker
Contributor Author

/touch

@TheShermanTanker
Contributor Author

Please give me more time; the flatten issue is still causing problems on my end.

@openjdk

openjdk bot commented Sep 20, 2025

@TheShermanTanker The pull request is being re-evaluated and the inactivity timeout has been reset.

@TheShermanTanker
Contributor Author

After trying and failing, over and over, for countless hours and days... I admit I'm nowhere close to solving this problem. I just can't figure out which are the slow paths, and which are the external functions not local to the g1ParScanThreadState.cpp source file. Every attempt at using a tool to discern this has failed catastrophically, and the call hierarchy is enormous, so doing it manually is out of the question. @kimbarrett, if I may ask: what are the slow paths that should not be inlined, and what are the functions that are not local to the source file in question? If I just knew what to look at, maybe I could start tackling this problem properly.

@TheShermanTanker
Contributor Author

Paging with the mailing list bridge to restart discussion, which I need in order to be able to continue working on this

@mlbridge

mlbridge bot commented Oct 11, 2025

Mailing list message from Kim Barrett on hotspot-dev:

On 10/8/25 8:06 AM, Julian Waters wrote:

> Paging with the mailing list bridge to restart discussion, which I need in order to be able to continue working on this

As I've said before, we (Oracle) think LTO support is a substantial project,
we haven't made it a priority, and nobody here has offered up time to work on
it. Someone else might have time for this; we don't seem to. That might change
in the future, but for now...

Some of the issues that make it a substantial project include:

The LTO vs flattening problem.

There are functions that assume (or even rely upon) implicit noinline via
being in a different translation unit from callers. I think that's probably a
much harder problem to resolve, since it's generally not obvious. Functions
that do things like getting the current stack pointer or frame pointer may be
relevant? Not sure what else.

Build time issues, and their impact on development time and testing resources.

Is LTO an alternative mode, different from how things are "normally" done?
That increases testing resource requirements, else it will bit rot.

There might be others that I'm not remembering or I'm completely unaware of.

@TheShermanTanker
Contributor Author

> The LTO vs flattening problem.

Currently that seems to be the biggest problem. I do have solutions for other, smaller issues with LTO, but from what I've seen, the flattening problem has to be solved before any more work can be done on LTO. While I fortunately do have time to spare to work on the issue (I'm that "someone else", more or less), I less fortunately don't have the knowledge required to fix it: even though only G1 uses flattening, the call hierarchy is too massive to meaningfully discern which calls should be flattened and which shouldn't. I'm not sure anyone knows which code paths in g1ParScanThreadState.cpp shouldn't be inlined, actually. If anyone does, I could fast-track the flattening issue on my end and have it solved pretty quickly.

@mrserb
Member

mrserb commented Oct 17, 2025

@magicus In JBS, I see a long conversation about LTO optimization for libraries aiming to cover all use cases. Maybe it's better to start with something smaller? For example, provide a way to enable it per library and per platform, so it can be incrementally adopted. Initial results for some libraries in the java.desktop look promising.

@TheShermanTanker
Contributor Author

> @magicus In JBS, I see a long conversation about LTO optimization for libraries aiming to cover all use cases. Maybe it's better to start with something smaller? For example, provide a way to enable it per library and per platform, so it can be incrementally adopted. Initial results for some libraries in the java.desktop look promising.

Hi, at the moment this is HotSpot only; we're unfortunately facing a very severe issue in G1 that can't seem to be solved. I'm currently focusing on making it work for HotSpot before introducing this for the native libraries.

@mrserb
Member

mrserb commented Oct 17, 2025

> Hi, at the moment this is HotSpot only; we're unfortunately facing a very severe issue in G1 that can't seem to be solved. I'm currently focusing on making it work for HotSpot before introducing this for the native libraries.

But as far as I understand, it will be much easier to implement for libs. Do you know of any blockers?

@MBaesken
Member

>> Hi, at the moment this is HotSpot only; we're unfortunately facing a very severe issue in G1 that can't seem to be solved. I'm currently focusing on making it work for HotSpot before introducing this for the native libraries.

> But as far as I understand, it will be much easier to implement for libs. Do you know of any blockers?

Yes, the LTO build worked for the other JDK native libs when I enabled it there some months ago as a test.
(LTO support for them is just a hack for now, but they compiled nicely for me because those libs are much smaller/simpler than libjvm.)

With GCC and LTO enabled, some libs get smaller; e.g., for libfontmanager I saw a while ago 1.7M (without LTO) vs. 1.1M (with LTO).
Speed improvements are hard to evaluate, since the benchmarks used in OpenJDK often stress HotSpot and not so much individual native JDK libs.

@MBaesken
Member

MBaesken commented Oct 20, 2025

> @magicus In JBS, I see a long conversation about LTO optimization for libraries aiming to cover all use cases. Maybe it's better to start with something smaller? For example, provide a way to enable it per library and per platform, so it can be incrementally adopted. Initial results for some libraries in the java.desktop look promising.

Currently we have the OPTIMIZATION levels NONE, LOW, HIGH, HIGHEST, HIGHEST_JVM, and SIZE for the native libs we build in OpenJDK. We could easily also add LTO, or LTOHIGH + LTOSIZE if we want to distinguish even more.
But currently it is hard to evaluate what it is 'good' for.
Some people would expect improved performance from it, but we do not really have benchmarks for the smaller JDK libs to prove that (at least none I am aware of); that was also a problem when we discussed switching more libs to SIZE optimization.
Some people would expect reduced size from using LTO; that is easier to 'prove' by looking at the lib sizes. But from what I saw, it is not always true (with GCC and LTO enabled, however, you often get smaller libs).

So should we still offer LTO for more libs as an option to enable for the lib, even with the mentioned issues?

@mrserb
Member

mrserb commented Oct 20, 2025

> So should we still offer LTO for more libs as an option to enable for the lib, even with the mentioned issues?

Provide an option for library owners to opt-in, which can be enabled per-library, per-platform and per-compiler after appropriate testing for performance, functionality, and footprint. So it will be possible to mix size/perf and lto optimizations. Then later we can decide what to use by default.

@kimbarrett

FWIW, I'm prototyping a possible change in g1ParScanThreadState.cpp that might
substantially reduce the amount of generated code there. It might not work
out; I haven't done any performance testing yet, and it's really easy to
introduce performance regressions when making changes to that code. But if it
does work, that might help with the problems here.

@MBaesken
Member

MBaesken commented Oct 22, 2025

>> So should we still offer LTO for more libs as an option to enable for the lib, even with the mentioned issues?

> Provide an option for library owners to opt-in, which can be enabled per-library, per-platform and per-compiler after appropriate testing for performance, functionality, and footprint. So it will be possible to mix size/perf and lto optimizations. Then later we can decide what to use by default.

Yeah, maybe something like

LINK_TIME_OPTIMIZATION := YES

that can be set by lib owners and passed as an optional parameter to SetupJdkLibrary / SetupNativeCompilation or similar.
We just need the LTO settings per toolchain in 'make/autoconf' (the C/C++/LD flags can probably be borrowed from HotSpot, where the functionality has been available for some time as a configurable 'JVMFeature'), and if LINK_TIME_OPTIMIZATION is set for a lib, we have to add them properly.
(On the other hand, we need some lib owners testing and hopefully using it; otherwise the new functionality just rots in the m4 files.)

I created
https://bugs.openjdk.org/browse/JDK-8370438
8370438: Offer link time optimization support on library level

@MBaesken
Member

>> So should we still offer LTO for more libs as an option to enable for the lib, even with the mentioned issues?

> Provide an option for library owners to opt-in, which can be enabled per-library, per-platform and per-compiler after appropriate testing for performance, functionality, and footprint. So it will be possible to mix size/perf and lto optimizations. Then later we can decide what to use by default.

I created a draft PR #27976 to support enabling LTO at library level.
(For testing, I enabled it for two libs.)

@MBaesken
Member

MBaesken commented Oct 24, 2025

For Linux x86_64 / GCC 13, here are the sizes (in bytes) of the libs and debuginfo files without and with LTO enabled at library level:

File                        Without LTO    With LTO
libfontmanager.debuginfo       54155720    38059440
libfontmanager.so               1874776     1194264
libfreetype.debuginfo           4143416     3761080
libfreetype.so                   742472      778024

For the debug info, one can see a good size reduction, but the lib size does not always decrease: for libfreetype.so it increases slightly.

5 participants