RHX Upstream patches #188

luismgsilva · 2025-11-14T10:22:43Z

No description provided.

This commit introduces a new -mtune=rmx100 tuning option together with relevant scheduler definitions. Instruction latencies and costs are based on the "RMX-100 Technical Reference Manual" document (revision 0.4, 13 September 2023) and are subject to change. The changes have been verified by running the Dhrystone and Coremark benchmarks and observing expected (small) improvements compared to the -mtune=generic results. Signed-off-by: Artemiy Volkov <[email protected]>

This commit adds the new arcv-mpy-option compilation parameter with the valid (string) values of 1c, 2c, and 10c. This corresponds to different versions of the MPY/DIV unit of the RMX100 core, each of which has different latencies for imul/idiv instructions. Internally, this option is propagated to the pipeline description information in rmx100.md with the use of new helper functions defined in riscv.cc. Signed-off-by: Artemiy Volkov <[email protected]>

This patch adds latencies related to FPU instructions to arcv-rmx100.md. The specific values used correspond to the 'fast' config, except fdiv where the latency was reduced to 10 cycles. In the future, FP latencies for RMX-100 should be made dependent on an external (-mfpu-like) option. Signed-off-by: Artemiy Volkov <[email protected]>

Signed-off-by: Claudiu Zissulescu <[email protected]>

For the RMX-500 and RHX cores, the sequence "load-immediate + store" (that is used to store a constant value) can be executed in 1 cycle, provided the two instructions are kept next to one another. This patch handles this case in riscv_macro_fusion_pair_p(). Signed-off-by: Artemiy Volkov <[email protected]>

ARC-V related optimisations must be guarded like: if (riscv_microarchitecture == <arch>) { ... } Introduce an inline function that encapsulates this: static inline bool riscv_is_micro_arch (<arch>) Use it to define __riscv_rhx whenever compiling for the RHX microarchitecture. Signed-off-by: Shahab Vahedi <[email protected]>

With this commit, we allow a load-immediate to be macro-op fused with a successive conditional branch that is dependent on it, e.g.: li t0, #imm bge t1, t0, .label Additionally, add a new testcase to check that this fusion type is handled correctly. Signed-off-by: Artemiy Volkov <[email protected]>

To take better advantage of double load/store fusion, make use of the sched_fusion pass that assigns unique "fusion priorities" to load/store instructions and schedules operations on adjacent addresses together. This maximizes the probability that loads/stores are fused between each other instead of with other instructions. Signed-off-by: Artemiy Volkov <[email protected]>

With this patch, arcv_macro_fusion_pair_p () recognizes instruction pairs like: LOAD rd1, [rs1,offset] add/sub rd2, rs1, rs2/imm (where all regs are distinct) and: STORE rs2, [rs1,offset] add/sub rd, rs1, rs2/imm as fused macro-op pairs. In the case of a load, rd1 being equal to rd2, rs1, or rs2 would lead to data hazards, hence this is disallowed; for stores, rs1 and rs2 of the two instructions must match. Signed-off-by: Artemiy Volkov <[email protected]>

Fuse together instruction pairs such as: LOAD rd1, [rs1,offset] lui rd2, imm (where rd1 and rd2 are distinct) and: STORE rs2, [rs1,offset] lui rd, imm Signed-off-by: Artemiy Volkov <[email protected]>

The RHX core executes integer multiply-add sequences of the form: mul r1,r2,r3 add r1,r1,r4 in 1 cycle due to macro-op fusion. This patch adds a define_insn_and_split to recognize the above sequence and preserve it as a single insn up until the post-reload split pass. Since, due to a microarchitectural restriction, the output operand of both instructions must be the same register, the insn_and_split pattern has two alternatives corresponding to the following cases: (0) r1 is different from r4, in which case the insn can be split to the sequence above; (1) r1 and r4 are the same, in which case a temporary register has to be used and there is no fusion. Alternative (1) is discouraged so that reload maximizes the number of instances where MAC fusion can be applied. Since RHX is a rv32im core, the pattern requires that the target is 32-bit and supports multiplication. In addition, the {u,}maddhisi3 expand is implemented for RHX to convert the ( 16-bit x 16-bit + 32_bit ) WIDEN_MULT_PLUS_EXPR GIMPLE operator to the aforementioned madd_split instruction directly. Lastly, a very basic testcase is introduced to make sure that the new patterns are sufficient to produce MAC-fusion-aware code. No new dejagnu failures with RUNTESTFLAGS="CFLAGS_FOR_TARGET=-mtune=rhx dg.exp". Signed-off-by: Artemiy Volkov <[email protected]>

To make sure that the multiply-add pairs (split post-reload from the madd_split instruction) are not broken up by the sched2 pass, designate them as fusable in arcv_macro_fusion_pair_p (). Signed-off-by: Artemiy Volkov <[email protected]>

The bitfield zero_extract operation is normally expanded into an srai followed by an andi. (With the ZBS extension enabled, the special case of 1-bit zero-extract is implemented with the bexti insn.) However, since the RHX core can execute a shift-left and a shift-right of the same register in 1 cycle, we would prefer to emit those two instructions instead, and schedule them together so that macro fusion can take place. The required steps to achieve this are: (1) Create an insn_and_split that handles the zero_extract RTX; (2) Tell the combiner to use that split by lowering the cost of the zero_extract RTX when the target is the RHX core; (3) Designate the resulting slli + srli pair as fusable by the scheduler. Attached is a small testcase demonstrating the split, and that the bexti insn still takes priority over the shift pair. Signed-off-by: Artemiy Volkov <[email protected]>

Some fusion types (namely, LD/ST-OP/OPIMM and LD/ST-LUI) are available regardless of the order of instructions. To support this, extract the new arcv_memop_arith_pair_p () and arcv_memop_lui_pair_p () functions and call them twice. Signed-off-by: Artemiy Volkov <[email protected]>

This commit implements the scheduling model for the RHX-100 core. Among notable things are: (1) The arcv_macro_fusion_pair_p () hook has been modified to not create SCHED_GROUP's larger than 2 instructions; also, it gives priority to double load/store fusion, suppressing the other types until sched2. (2) riscv_issue_rate () is set to 4 and the system is modeled as 4 separate pipelines, giving access to as many instructions in ready_list as possible. (3) The rhx.md description puts some initial constraints in place (e.g. memory ops can only go into pipe B), saving some work in the reordering hook. (4) The riscv_sched_variable_issue () and riscv_sched_reorder2 () hooks work together to make sure (in order of descending priority) that: (a) the critical path and the instruction priorities are respected; (b) both pipes are filled (taking advantage of parallel dispatch within the microarchitectural constraints); (c) there is as much fusion going on as possible (and the existing fusion pairs are not broken up). There is probably some room for improvement, and some tweaks will probably have to be made in response to HLA changes as the HW development process goes on. Signed-off-by: Artemiy Volkov <[email protected]>

This patch implements riscv_sched_adjust_priority () for the RHX-100 microarchitecture by slightly bumping the priority of load/store pairs. As a consequence of this change, it becomes easier for riscv_sched_reorder2 () to schedule instructions in the memory pipe. Signed-off-by: Artemiy Volkov <[email protected]>

In addition to the LW+LW and SW+SW pairs that are already being recognized as macro-op-fusable, add support for 8-bit and naturally aligned 16-bit loads operating on adjacent memory locations. To that end, introduce the new microarch-specific pair_fusion_mode_allowed_p () predicate, and call it from fusion_load_store () during sched_fusion, and from arcv_macro_fusion_pair_p () during regular scheduling passes. Signed-off-by: Artemiy Volkov <[email protected]>

Currently on ARC-V, the maddhisi3 pattern always expands to the madd_split_fused instruction regardless of the target word size, which leads to the full-width mul and add instructions being emitted for 32-bit data even on riscv64: mul a6,a4,s6 add a6,a6,s7 sext.w s7,a6 To fix this, add another define_insn (madd_split_fused_extended) pattern wrapping the result of a MAC operation into a sign-extension from 32 to 64 bits, and use it in the (u)maddhisi3 expander in case of a 64-bit target. The assembly code after this change is more efficient, viz.: mulw a6,a4,s6 addw a6,a6,s7 Signed-off-by: Artemiy Volkov <[email protected]>

This define_insn_and_split prevents *zero_extract_fused from being selected. Updated the test. It succeeded despite the fused case not being selected because the right instructions were produced still. Signed-off-by: Michiel Derhaeg <[email protected]>

Signed-off-by: Luis Silva <[email protected]>

Align with how other RISC-V arch are checked (e.g., TARGET_SIFIVE_7, TARGET_ROCKET, TARGET_SIFIVE_P400_SERIES). Signed-off-by: Luis Silva <[email protected]>

In these cases, it makes more sense to check if ARCV fusion is enabled rather than checking if a specific mtune is enabled. Signed-off-by: Luis Silva <[email protected]>

Signed-off-by: Luis Silva <[email protected]>

Group alu_pipe_scheduled_p, pipeB_scheduled_p, last_scheduled_insn, and cached_can_issue_more into arcv_sched_state struct. Signed-off-by: Luis Silva <[email protected]>

Signed-off-by: Luis Silva <[email protected]>

gcc/config/riscv/arcv-rhx100.md

gcc/config/riscv/riscv-c.cc

gcc/config/riscv/arcv-rhx100.md

MichielDerhaeg · 2025-11-18T09:55:11Z

gcc/config/riscv/arcv.cc

+/* ARCV-specific macro-op fusion for RISC-V.
+   Copyright (C) 2025 Free Software Foundation, Inc.


Would describe this more general. "ARC-V-specific functions something something"

I've updated the headers of arcv.{cc,h} to match the style used in other architecture-specific files. What do you think?

gcc/config/riscv/arcv.cc

MichielDerhaeg · 2025-11-18T10:07:17Z

gcc/config/riscv/arcv.cc

+      if (REG_P (XEXP (curr_plus, 1))
+	 && REGNO (XEXP (curr_plus, 1)) == mult_dest_regno)
+       return true;
+    }


The caller adds debug logs to each of these, I think we should also do that.

I've added the respective debug logs.

gcc/config/riscv/riscv.cc

Signed-off-by: Luis Silva <[email protected]>

MichielDerhaeg · 2025-11-18T14:32:39Z

gcc/config/riscv/arcv.cc

+    else if (get_attr_type (ready[*n_readyp - 1]) != TYPE_LOAD
+	|| get_attr_type (ready[*n_readyp - 1]) != TYPE_STORE)


Don't think this condition is correct, it is always true.

Im wondering if this is what is supposed to be:

From 4f60a816bd1f2dd28e471b473460321e48e2a99c Mon Sep 17 00:00:00 2001 From: Luis Silva <[email protected]> Date: Tue, 18 Nov 2025 15:12:07 +0000 Subject: [PATCH] tmp: arcv: Fix incorrect condition and logic in reorder scheduler. Signed-off-by: Luis Silva <[email protected]> --- gcc/config/riscv/arcv.cc | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/gcc/config/riscv/arcv.cc b/gcc/config/riscv/arcv.cc index 8354d838c72..f0a39a23cb7 100644 --- a/gcc/config/riscv/arcv.cc +++ b/gcc/config/riscv/arcv.cc @@ -670,22 +670,23 @@ arcv_sched_reorder2 (rtx_insn **ready, int *n_readyp) /* If all else fails, schedule a single instruction. */ if (ready && *n_readyp > 0 && NONDEBUG_INSN_P (ready[*n_readyp - 1]) - && recog_memoized (ready[*n_readyp - 1]) >= 0 - && get_attr_type (ready[*n_readyp - 1]) != TYPE_LOAD - && get_attr_type (ready[*n_readyp - 1]) != TYPE_STORE) + && recog_memoized (ready[*n_readyp - 1]) >= 0) { + rtx_insn *insn = ready[*n_readyp - 1]; + enum attr_type insn_type = get_attr_type (insn); + + /* Memory operations go to pipeB if available. */ if (!sched_state.pipeB_scheduled_p - && (get_attr_type (ready[*n_readyp - 1]) == TYPE_LOAD - || get_attr_type (ready[*n_readyp - 1]) == TYPE_STORE)) + && (insn_type == TYPE_LOAD || insn_type == TYPE_STORE)) { - sched_state.alu_pipe_scheduled_p = sched_state.pipeB_scheduled_p = 1; + sched_state.pipeB_scheduled_p = 1; sched_state.cached_can_issue_more = 1; return 1; } - else if (get_attr_type (ready[*n_readyp - 1]) != TYPE_LOAD - || get_attr_type (ready[*n_readyp - 1]) != TYPE_STORE) + /* Non-memory operations go to ALU pipe. */ + else if (insn_type != TYPE_LOAD && insn_type != TYPE_STORE) { - sched_state.alu_pipe_scheduled_p = sched_state.pipeB_scheduled_p = 1; + sched_state.alu_pipe_scheduled_p = 1; sched_state.cached_can_issue_more = 1; return 1; } -- 2.39.5

I think this makes sense. Can we evaluate the impact of this change on performance downstream & upstream?

I got the following between unpatched and patched for downstream and upstream:

Downstream
Unpatched vs patched = +-0.15% faster

Upstream
Unpatched vs Patched = +-0.27% faster

Signed-off-by: Luis Silva <[email protected]>

MichielDerhaeg

Looks good. I think we can start splitting this up in a patch series for review.

artemiy-volkov and others added 26 commits November 4, 2025 13:37

arcv: Add initial scheduling scheme.

5b1ca41

Signed-off-by: Claudiu Zissulescu <[email protected]>

arcv: fuse load/store with lui

55bba89

Fuse together instruction pairs such as: LOAD rd1, [rs1,offset] lui rd2, imm (where rd1 and rd2 are distinct) and: STORE rs2, [rs1,offset] lui rd, imm Signed-off-by: Artemiy Volkov <[email protected]>

fixup! arcv: Add initial scheduling scheme.

0912ccf

tmp: fusion: add arcv.{cc,h} files.

1698738

Signed-off-by: Luis Silva <[email protected]>

tmp: arcv: Repalce riscv_is_micro_arch with TARGET_ARCV_RHX100 macro.

53510e5

Align with how other RISC-V arch are checked (e.g., TARGET_SIFIVE_7, TARGET_ROCKET, TARGET_SIFIVE_P400_SERIES). Signed-off-by: Luis Silva <[email protected]>

tmp: arcv: Use riscv_fusion_enabled_p (RISCV_FUSE_ARCV)

1af0e74

In these cases, it makes more sense to check if ARCV fusion is enabled rather than checking if a specific mtune is enabled. Signed-off-by: Luis Silva <[email protected]>

tmp: arcv: Clean up arcv_memop_arith_pair_p ()

d8ade80

Signed-off-by: Luis Silva <[email protected]>

tmp: arcv: Clean up arcv_memop_lui_pair_p ()

375ea0c

Signed-off-by: Luis Silva <[email protected]>

tmp: arcv: Clean up arcv_macro_fusion_pair_p ()

a0d8832

Signed-off-by: Luis Silva <[email protected]>

luismgsilva requested review from Alexehv77 and MichielDerhaeg November 14, 2025 10:22

luismgsilva self-assigned this Nov 14, 2025

tmp: arcv: Refactor scheduler state variables into struct.

dd9f95b

Group alu_pipe_scheduled_p, pipeB_scheduled_p, last_scheduled_insn, and cached_can_issue_more into arcv_sched_state struct. Signed-off-by: Luis Silva <[email protected]>

tmp: arcv: Fix 80 character limit lines

66ff6b8

Signed-off-by: Luis Silva <[email protected]>

luismgsilva marked this pull request as ready for review November 14, 2025 15:01

luismgsilva added 8 commits November 14, 2025 15:36

tmp: arcv: Refactor helper function names

fdf6d84

Signed-off-by: Luis Silva <[email protected]>

tmp: arcv: Remove newlines.

1edd45f

Signed-off-by: Luis Silva <[email protected]>

tmp: arcv: Add hook comments

e99ae0d

Signed-off-by: Luis Silva <[email protected]>

fixup! tmp: arcv: Refactor scheduler state variables into struct.

4ea42df

Signed-off-by: Luis Silva <[email protected]>

tmp: arcv: Refactor helper function names 2

3bad3ca

Signed-off-by: Luis Silva <[email protected]>

tmp: arcv: tmp: arcv: Clean up arcv_sched_fusion_priority ()

367327e

Signed-off-by: Luis Silva <[email protected]>

Fix comment

aee9fd0

Signed-off-by: Luis Silva <[email protected]>

tmp: Remove newline.

4b1cee4

Signed-off-by: Luis Silva <[email protected]>

MichielDerhaeg reviewed Nov 18, 2025

View reviewed changes

luismgsilva added 10 commits November 18, 2025 10:37

tmp: arcv: Revert riscv_macro_fusion_p to static.

bca8fce

Signed-off-by: Luis Silva <[email protected]>

tmp: arcv: arcv-rhx100.md: Remove commented code.

46a8a4e

Signed-off-by: Luis Silva <[email protected]>

tmp: arcv-rhx100.md: Update copyright year..

222c326

Signed-off-by: Luis Silva <[email protected]>

tmp: arcv.{cc,h}: Update header.

55cf42f

Signed-off-by: Luis Silva <[email protected]>

tmp: riscv-c.cc: Remove __riscv_rhx

f77ad5e

Signed-off-by: Luis Silva <[email protected]>

fixup! tmp: fusion: add arcv.{cc,h} files.

e242bd8

Signed-off-by: Luis Silva <[email protected]>

fixup! tmp: fusion: add arcv.{cc,h} files.

b4b8c83

Signed-off-by: Luis Silva <[email protected]>

fixup! tmp: fusion: add arcv.{cc,h} files.

f6a18d3

Signed-off-by: Luis Silva <[email protected]>

tmp: arcv.{cc,h}/riscv.cc: Remove unused variable.

eb4055d

Signed-off-by: Luis Silva <[email protected]>

fixup! arcv: add scheduling information for the Synopsys RMX-100 CPU

eb71e74

Signed-off-by: Luis Silva <[email protected]>

MichielDerhaeg reviewed Nov 18, 2025

View reviewed changes

tmp: arcv: Fix incorrect condition and logic in reorder scheduler.

4f60a81

Signed-off-by: Luis Silva <[email protected]>

MichielDerhaeg approved these changes Nov 20, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

RHX Upstream patches #188

RHX Upstream patches #188

Uh oh!

luismgsilva commented Nov 14, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MichielDerhaeg Nov 18, 2025

Uh oh!

luismgsilva Nov 18, 2025

Uh oh!

Uh oh!

MichielDerhaeg Nov 18, 2025

Uh oh!

luismgsilva Nov 18, 2025

Uh oh!

Uh oh!

MichielDerhaeg Nov 18, 2025

Uh oh!

luismgsilva Nov 18, 2025 •

edited

Loading

Uh oh!

MichielDerhaeg Nov 19, 2025

Uh oh!

luismgsilva Nov 19, 2025

Uh oh!

MichielDerhaeg left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

		/* ARCV-specific macro-op fusion for RISC-V.
		Copyright (C) 2025 Free Software Foundation, Inc.

		else if (get_attr_type (ready[*n_readyp - 1]) != TYPE_LOAD
		\|\| get_attr_type (ready[*n_readyp - 1]) != TYPE_STORE)

RHX Upstream patches #188

Are you sure you want to change the base?

RHX Upstream patches #188

Uh oh!

Conversation

luismgsilva commented Nov 14, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MichielDerhaeg Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

luismgsilva Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

MichielDerhaeg Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

luismgsilva Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

MichielDerhaeg Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

luismgsilva Nov 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MichielDerhaeg Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

luismgsilva Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

MichielDerhaeg left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

luismgsilva Nov 18, 2025 •

edited

Loading