@jatin-bhateja jatin-bhateja commented Dec 11, 2025

Emulate the multiplier using the LEA addressing scheme, where effective address = BASE + INDEX * SCALE + OFFSET.
Refer to section "3.5.1.2 Using LEA" of Intel's optimization manual for details regarding slow vs. fast LEA instructions.
Given that the latency of IMUL with register operands is 3 cycles, a combination of two fast LEAs, each with 1-cycle latency, to emulate the multiplier is performant.

Consider X as the multiplicand. By varying the scale of the first LEA instruction, we can generate four inputs, i.e.

    X + X * 1 = 2X
    X + X * 2 = 3X
    X + X * 4 = 5X
    X + X * 8 = 9X
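
The four cases above can be sketched in Java as a plain arithmetic model of the effective-address computation. The class name `LeaTerminals` and the helper `lea` are hypothetical names introduced for illustration only; they are not part of the patch.

```java
public class LeaTerminals {
    // Hypothetical model of LEA: effective address = BASE + INDEX * SCALE + OFFSET.
    // Java int arithmetic wraps on overflow, like a 32-bit register would.
    public static int lea(int base, int index, int scale, int offset) {
        return base + index * scale + offset;
    }

    public static void main(String[] args) {
        int x = 7; // arbitrary multiplicand chosen for the demo
        for (int scale : new int[] {1, 2, 4, 8}) {
            int result = lea(x, x, scale, 0);
            // With BASE = INDEX = X, the result is (1 + scale) * X.
            System.out.println("X + X * " + scale + " = " + (result / x) + "X");
        }
    }
}
```

Running `main` should print the same four identities listed above, since 1 + SCALE takes the values 2, 3, 5 and 9.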

The following table lists the multipliers obtained by feeding the output of the first LEA into BASE and/or INDEX and varying the
scale of the second fast LEA instruction. We will only handle the cases which cannot be handled by just shift + add.

      BASE   INDEX   SCALE  MULTIPLIER
        X      X       1       2       (Terminal)
        X      X       2       3       (Terminal)
        X      X       4       5       (Terminal)
        X      X       8       9       (Terminal)
       3X     3X       1       6
        X     3X       2       7
       5X     5X       1      10
        X     5X       2      11
        X     3X       4      13
       5X     5X       2      15
        X     2X       8      17
       9X     9X       1      18
        X     9X       2      19
        X     5X       4      21
       5X     5X       4      25
       9X     9X       2      27
        X     9X       4      37
        X     5X       8      41
       9X     9X       4      45
        X     9X       8      73
       9X     9X       8      81
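
As a quick sanity check, the rows of the table can be replayed in Java and compared against an ordinary multiplication. This is only a sketch: `LeaTableCheck`, `firstLea` and `mulByTwoLeas` are hypothetical names, and the model treats LEA as wrapping 32-bit arithmetic rather than emitting any real instructions.

```java
public class LeaTableCheck {
    // Rows copied from the table: {BASE factor, INDEX factor, SCALE, MULTIPLIER}.
    static final int[][] ROWS = {
        {1, 1, 1, 2},  {1, 1, 2, 3},  {1, 1, 4, 5},  {1, 1, 8, 9},
        {3, 3, 1, 6},  {1, 3, 2, 7},  {5, 5, 1, 10}, {1, 5, 2, 11},
        {1, 3, 4, 13}, {5, 5, 2, 15}, {1, 2, 8, 17}, {9, 9, 1, 18},
        {1, 9, 2, 19}, {1, 5, 4, 21}, {5, 5, 4, 25}, {9, 9, 2, 27},
        {1, 9, 4, 37}, {1, 5, 8, 41}, {9, 9, 4, 45}, {1, 9, 8, 73},
        {9, 9, 8, 81}
    };

    static int lea(int base, int index, int scale) {
        return base + index * scale;
    }

    // A factor f in {2, 3, 5, 9} is produced by one LEA as X + X * (f - 1);
    // a factor of 1 is just X itself and needs no instruction.
    static int firstLea(int x, int factor) {
        return factor == 1 ? x : lea(x, x, factor - 1);
    }

    // Evaluates one table row. When BASE and INDEX carry the same non-unity
    // factor, real code would compute that value once and reuse it.
    public static int mulByTwoLeas(int x, int[] row) {
        return lea(firstLea(x, row[0]), firstLea(x, row[1]), row[2]);
    }

    // Returns true if every row matches x * MULTIPLIER for all sample inputs.
    public static boolean tableHolds() {
        int[] samples = {0, 1, -1, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int[] row : ROWS) {
            for (int x : samples) {
                if (mulByTwoLeas(x, row) != x * row[3]) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("table holds: " + tableHolds());
    }
}
```

The samples include negative and extreme values because both sides wrap modulo 2^32, so the identity is expected to hold even when the product overflows.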

All the non-unity inputs tied to BASE / INDEX are derived from the terminal cases, which represent the first fast LEA. Thus, all the multipliers can be computed using just two LEA instructions.
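
To illustrate why two LEAs suffice, the sketch below enumerates every multiplier reachable when the second LEA sees only X and one terminal value tX. The class name `TwoLeaMultipliers` is hypothetical, and the final filter (dropping the single-LEA terminals and pure powers of two) is an assumption of this sketch, not necessarily the exact rule used in the patch.

```java
import java.util.TreeSet;

public class TwoLeaMultipliers {
    static final int[] SCALES = {1, 2, 4, 8};

    // Collects multipliers m such that m * X is computable with a first LEA
    // producing tX (t in {2, 3, 5, 9}) followed by a second LEA over X and tX.
    public static TreeSet<Integer> reachable() {
        TreeSet<Integer> out = new TreeSet<>();
        for (int s1 : SCALES) {
            int t = 1 + s1; // terminal factor from the first LEA
            for (int s2 : SCALES) {
                out.add(1 + t * s2); // BASE = X,  INDEX = tX
                out.add(t + s2);     // BASE = tX, INDEX = X
                out.add(t + t * s2); // BASE = tX, INDEX = tX
            }
        }
        // Assumption: skip terminals (one LEA) and powers of two (one shift).
        out.removeIf(m -> m == 2 || m == 3 || m == 5 || m == 9
                || Integer.bitCount(m) == 1);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(reachable());
    }
}
```

Under these assumptions the resulting set coincides with the non-terminal multipliers in the table; the BASE = tX, INDEX = X arrangement, which the table does not list, contributes no additional values.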

Preliminary benchmarking shows around 15% improvement on the latest x86 servers.

TODO:

  • Functional validation with jtreg.
  • JMH micro benchmark.

Best Regards,
Jatin


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8373480: Optimize multiplication by constant multiplier using LEA instructions (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28759/head:pull/28759
$ git checkout pull/28759

Update a local copy of the PR:
$ git checkout pull/28759
$ git pull https://git.openjdk.org/jdk.git pull/28759/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28759

View PR using the GUI difftool:
$ git pr show -t 28759

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28759.diff

bridgekeeper bot commented Dec 11, 2025

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Dec 11, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk bot commented Dec 11, 2025

@jatin-bhateja The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@jatin-bhateja jatin-bhateja changed the title 8373480: Optimize constant input multiplication using LEA instructions 8373480: Optimize multiplication by constant multiplier using LEA instructions Dec 11, 2025
@jatin-bhateja
Member Author

/label add hotspot-compiler-dev

openjdk bot commented Dec 11, 2025

@jatin-bhateja
The hotspot-compiler label was successfully added.
