Upstream ARC-V RHX-100 #176

MichielDerhaeg · 2025-10-15T07:56:26Z

Thanks for taking the time to contribute to GCC! Please be advised that if you are
viewing this on github.com, that the mirror there is unofficial and unmonitored.
The GCC community does not use github.com for their contributions. Instead, we use
a mailing list ([email protected]) for code submissions, code reviews, and
bug reports. Please send patches there instead.

Signed-off-by: Claudiu Zissulescu <[email protected]>

For the RMX-500 and RHX cores, the sequence "load-immediate + store" (used to store a constant value) can be executed in 1 cycle, provided the two instructions are kept next to one another. This patch handles this case in riscv_macro_fusion_pair_p().

Signed-off-by: Artemiy Volkov <[email protected]>

ARC-V related optimisations must be guarded like:

    if (riscv_microarchitecture == <arch>)
      {
        ...
      }

Introduce an inline function that encapsulates this:

    static inline bool riscv_is_micro_arch (<arch>)

Use it to define __riscv_rhx whenever compiling for the RHX microarchitecture.

Signed-off-by: Shahab Vahedi <[email protected]>
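
A minimal, self-contained sketch of the helper idea follows. The enum, the tuning variable and every `toy_` name are hypothetical stand-ins rather than the real backend declarations; only the macro name `__riscv_rhx` comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: the real enum and tuning variable live in
   GCC's RISC-V backend and are not reproduced here.  */
enum toy_microarch { TOY_GENERIC, TOY_RMX, TOY_RHX };

static enum toy_microarch toy_microarchitecture = TOY_GENERIC;

/* Encapsulates the "microarchitecture == <arch>" guard.  */
static inline bool
toy_is_micro_arch (enum toy_microarch arch)
{
  return toy_microarchitecture == arch;
}

/* Returns the macro to define for the current tuning, or NULL if the
   tuning does not call for one.  */
static const char *
toy_cpu_macro (void)
{
  return toy_is_micro_arch (TOY_RHX) ? "__riscv_rhx" : NULL;
}
```

Keeping the comparison behind one function means call sites do not need to know how the selected microarchitecture is stored.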

With this commit, we allow a load-immediate to be macro-op fused with a successive conditional branch that is dependent on it, e.g.:

    li t0, #imm
    bge t1, t0, .label

Additionally, add a new testcase to check that this fusion type is handled correctly.

Signed-off-by: Artemiy Volkov <[email protected]>
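
The dependency condition can be sketched on a toy instruction model. GCC performs this check on RTL; the struct, enum and function below are hypothetical and only illustrate the rule "the branch must read the register that the load-immediate writes".

```c
#include <stdbool.h>

/* Hypothetical, simplified instruction model (not GCC's RTL).  */
enum toy_kind { TOY_LI, TOY_BRANCH, TOY_OTHER };

struct toy_insn
{
  enum toy_kind kind;
  int rd;       /* register written, or -1 if none */
  int rs1, rs2; /* registers read, or -1 if unused */
};

/* True when PREV is a load-immediate whose result CURR, a conditional
   branch, compares against.  */
static bool
toy_li_branch_pair_p (const struct toy_insn *prev,
                      const struct toy_insn *curr)
{
  if (prev->kind != TOY_LI || curr->kind != TOY_BRANCH)
    return false;

  /* The branch must consume the freshly loaded constant.  */
  return curr->rs1 == prev->rd || curr->rs2 == prev->rd;
}
```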

To take better advantage of double load/store fusion, make use of the sched_fusion pass that assigns unique "fusion priorities" to load/store instructions and schedules operations on adjacent addresses together. This maximizes the probability that loads/stores are fused between each other instead of with other instructions.

Signed-off-by: Artemiy Volkov <[email protected]>

With this patch, arcv_macro_fusion_pair_p () recognizes instruction pairs like:

    LOAD rd1, [rs1,offset]
    add/sub rd2, rs1, rs2/imm

(where all regs are distinct) and:

    STORE rs2, [rs1,offset]
    add/sub rd, rs1, rs2/imm

as fused macro-op pairs. In the case of a load, rd1 being equal to rd2, rs1, or rs2 would lead to data hazards, hence this is disallowed; for stores, rs1 and rs2 of the two instructions must match.

Signed-off-by: Artemiy Volkov <[email protected]>
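
The register rules can be sketched as follows. This is a toy model, not the patch's code: the struct and all `toy_` names are hypothetical, only the register-register form of add/sub is covered, and the sketch checks just the hazard rule stated for loads and the matching rule stated for stores. Any further conditions the real hook applies are not modelled.

```c
#include <stdbool.h>

/* Hypothetical, simplified instruction model (not GCC's RTL).  */
enum toy_kind { TOY_LOAD, TOY_STORE, TOY_ARITH };

struct toy_insn
{
  enum toy_kind kind;
  int rd;  /* register written (load, arith), or -1 */
  int rs1; /* base register (load, store) or first source (arith) */
  int rs2; /* value stored (store) or second source (arith), or -1 */
};

/* True when MEM (a load or store) followed by OP (an add/sub) satisfies
   the register constraints described in the commit message.  */
static bool
toy_memop_arith_pair_p (const struct toy_insn *mem,
                        const struct toy_insn *op)
{
  if (op->kind != TOY_ARITH)
    return false;

  if (mem->kind == TOY_LOAD)
    /* The loaded register must be neither written nor read by OP.  */
    return mem->rd != op->rd && mem->rd != op->rs1 && mem->rd != op->rs2;

  if (mem->kind == TOY_STORE)
    /* Both source registers must match between the two instructions.  */
    return mem->rs1 == op->rs1 && mem->rs2 == op->rs2;

  return false;
}
```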

Fuse together instruction pairs such as:

    LOAD rd1, [rs1,offset]
    lui rd2, imm

(where rd1 and rd2 are distinct) and:

    STORE rs2, [rs1,offset]
    lui rd, imm

Signed-off-by: Artemiy Volkov <[email protected]>

The RHX core executes integer multiply-add sequences of the form:

    mul r1,r2,r3
    add r1,r1,r4

in 1 cycle due to macro-op fusion. This patch adds a define_insn_and_split to recognize the above sequence and preserve it as a single insn up until the post-reload split pass. Since, due to a microarchitectural restriction, the output operand of both instructions must be the same register, the insn_and_split pattern has two alternatives corresponding to the following cases:

(0) r1 is different from r4, in which case the insn can be split to the sequence above;
(1) r1 and r4 are the same, in which case a temporary register has to be used and there is no fusion.

Alternative (1) is discouraged so that reload maximizes the number of instances where MAC fusion can be applied.

Since RHX is a rv32im core, the pattern requires that the target is 32-bit and supports multiplication.

In addition, the {u,}maddhisi3 expand is implemented for RHX to convert the (16-bit x 16-bit + 32-bit) WIDEN_MULT_PLUS_EXPR GIMPLE operator to the aforementioned madd_split instruction directly.

Lastly, a very basic testcase is introduced to make sure that the new patterns are sufficient to produce MAC-fusion-aware code.

No new dejagnu failures with RUNTESTFLAGS="CFLAGS_FOR_TARGET=-mtune=rhx dg.exp".

Signed-off-by: Artemiy Volkov <[email protected]>
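
For orientation, the (16-bit x 16-bit + 32-bit) shape corresponds to C source like the functions below. This is an illustrative sketch, not the testcase added by the patch; the function names are made up, and whether a given compilation actually forms WIDEN_MULT_PLUS_EXPR depends on the optimisation level and target options.

```c
#include <stdint.h>

/* Signed 16-bit x 16-bit product accumulated into a 32-bit value.  */
int32_t
mac_s16 (int16_t a, int16_t b, int32_t acc)
{
  return (int32_t) a * (int32_t) b + acc;
}

/* Unsigned counterpart of the same shape.  */
uint32_t
mac_u16 (uint16_t a, uint16_t b, uint32_t acc)
{
  return (uint32_t) a * (uint32_t) b + acc;
}
```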

To make sure that the multiply-add pairs (split post-reload from the madd_split instruction) are not broken up by the sched2 pass, designate them as fusable in arcv_macro_fusion_pair_p ().

Signed-off-by: Artemiy Volkov <[email protected]>

The bitfield zero_extract operation is normally expanded into an srai followed by an andi. (With the ZBS extension enabled, the special case of 1-bit zero-extract is implemented with the bexti insn.) However, since the RHX core can execute a shift-left and a shift-right of the same register in 1 cycle, we would prefer to emit those two instructions instead, and schedule them together so that macro fusion can take place. The required steps to achieve this are:

(1) Create an insn_and_split that handles the zero_extract RTX;
(2) Tell the combiner to use that split by lowering the cost of the zero_extract RTX when the target is the RHX core;
(3) Designate the resulting slli + srli pair as fusable by the scheduler.

Attached is a small testcase demonstrating the split, and that the bexti insn still takes priority over the shift pair.

Signed-off-by: Artemiy Volkov <[email protected]>
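
The equivalence between the two expansions can be checked in plain C. This is a sketch on 32-bit values with hypothetical function names; it models the arithmetic of a shift-and-mask extraction versus an slli + srli pair, not the RTL patterns themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Shift right, then mask off everything above the field.  */
static uint32_t
extract_shift_mask (uint32_t x, unsigned pos, unsigned len)
{
  assert (len >= 1 && len < 32 && pos + len <= 32);
  return (x >> pos) & ((UINT32_C (1) << len) - 1);
}

/* Shift left to discard the bits above the field, then shift right
   logically to discard the bits below it (the slli + srli shape).  */
static uint32_t
extract_shift_pair (uint32_t x, unsigned pos, unsigned len)
{
  assert (len >= 1 && len < 32 && pos + len <= 32);
  uint32_t t = x << (32 - pos - len);
  return t >> (32 - len);
}
```

For example, extracting 12 bits starting at bit 8 of `0xDEADBEEF` yields `0xDBE` with either function.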

Some fusion types (namely, LD/ST-OP/OPIMM and LD/ST-LUI) are available regardless of the order of instructions. To support this, extract the new arcv_memop_arith_pair_p () and arcv_memop_lui_pair_p () functions and call them twice.

Signed-off-by: Artemiy Volkov <[email protected]>
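
Calling a directional predicate twice, once per operand order, might look like the sketch below. The instruction model, the function-pointer wrapper and all `toy_` names are hypothetical; the real functions operate on RTL insns and check more than the instruction kind.

```c
#include <stdbool.h>

/* Hypothetical, simplified instruction model (not GCC's RTL).  */
struct toy_insn
{
  bool is_mem; /* load or store */
  bool is_lui;
};

typedef bool (*toy_pair_fn) (const struct toy_insn *,
                             const struct toy_insn *);

/* Directional check: first argument is the memory op, second the lui.  */
static bool
toy_memop_lui_pair_p (const struct toy_insn *mem,
                      const struct toy_insn *lui)
{
  return mem->is_mem && lui->is_lui;
}

/* Order-independent check: try PRED with the operands in both orders.  */
static bool
toy_pair_either_order_p (toy_pair_fn pred, const struct toy_insn *prev,
                         const struct toy_insn *curr)
{
  return pred (prev, curr) || pred (curr, prev);
}
```

Factoring the directional test into its own function keeps each rule written once, with the caller deciding whether order matters.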

This commit implements the scheduling model for the RHX-100 core. Among notable things are:

(1) The arcv_macro_fusion_pair_p () hook has been modified to not create SCHED_GROUP's larger than 2 instructions; also, it gives priority to double load/store fusion, suppressing the other types until sched2.

(2) riscv_issue_rate () is set to 4 and the system is modeled as 4 separate pipelines, giving access to as many instructions in ready_list as possible.

(3) The rhx.md description puts some initial constraints in place (e.g. memory ops can only go into pipe B), saving some work in the reordering hook.

(4) The riscv_sched_variable_issue () and riscv_sched_reorder2 () hooks work together to make sure (in order of descending priority) that:
    (a) the critical path and the instruction priorities are respected;
    (b) both pipes are filled (taking advantage of parallel dispatch within the microarchitectural constraints);
    (c) there is as much fusion going on as possible (and the existing fusion pairs are not broken up).

There is probably some room for improvement, and some tweaks will probably have to be made in response to HLA changes as the HW development process goes on.

Signed-off-by: Artemiy Volkov <[email protected]>

This patch implements riscv_sched_adjust_priority () for the RHX-100 microarchitecture by slightly bumping the priority of load/store pairs. As a consequence of this change, it becomes easier for riscv_sched_reorder2 () to schedule instructions in the memory pipe.

Signed-off-by: Artemiy Volkov <[email protected]>

In addition to the LW+LW and SW+SW pairs that are already being recognized as macro-op-fusable, add support for 8-bit and naturally aligned 16-bit loads operating on adjacent memory locations. To that end, introduce the new microarch-specific pair_fusion_mode_allowed_p () predicate, and call it from fusion_load_store () during sched_fusion, and from arcv_macro_fusion_pair_p () during regular scheduling passes.

Signed-off-by: Artemiy Volkov <[email protected]>
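
A rough sketch of the kind of rule such a predicate could encode is shown below. It is an illustration under assumptions, not the patch's implementation: the function name is hypothetical, it works on concrete byte addresses (the compiler only sees a base register plus offsets), and because the commit message does not state an alignment requirement for word pairs, none is checked here.

```c
#include <stdbool.h>
#include <stdint.h>

/* SIZE is the access width in bytes; ADDR1 and ADDR2 are the byte
   addresses of the two accesses.  */
static bool
toy_pair_fusion_allowed_p (unsigned size, uint32_t addr1, uint32_t addr2)
{
  uint32_t lo = addr1 < addr2 ? addr1 : addr2;
  uint32_t hi = addr1 < addr2 ? addr2 : addr1;

  /* The two accesses must touch adjacent memory locations.  */
  if (hi - lo != size)
    return false;

  switch (size)
    {
    case 1:
      return true;
    case 2:
      /* 16-bit accesses must be naturally aligned.  */
      return lo % 2 == 0;
    case 4:
      /* Assumption: no alignment rule is stated for word pairs.  */
      return true;
    default:
      return false;
    }
}
```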

Currently on ARC-V, the maddhisi3 pattern always expands to the madd_split_fused instruction regardless of the target word size, which leads to the full-width mul and add instructions being emitted for 32-bit data even on riscv64:

    mul a6,a4,s6
    add a6,a6,s7
    sext.w s7,a6

To fix this, add another define_insn (madd_split_fused_extended) pattern wrapping the result of a MAC operation into a sign-extension from 32 to 64 bits, and use it in the (u)maddhisi3 expander in case of a 64-bit target. The assembly code after this change is more efficient, viz.:

    mulw a6,a4,s6
    addw a6,a6,s7

Signed-off-by: Artemiy Volkov <[email protected]>
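
The reason the separate sext.w can be dropped is that the low 32 bits of a product and sum depend only on the low 32 bits of the inputs. The sketch below models both sequences on 64-bit values; the function names are hypothetical and the model ignores everything about the instructions except their arithmetic.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of V to 64 bits, as sext.w does.  */
static int64_t
sext_w (uint64_t v)
{
  uint64_t low = v & UINT64_C (0xffffffff);
  return (int64_t) (low ^ UINT64_C (0x80000000)) - INT64_C (0x80000000);
}

/* Models mul; add; sext.w on full 64-bit registers.  */
static int64_t
mac_then_extend (uint64_t a, uint64_t b, uint64_t c)
{
  return sext_w (a * b + c);
}

/* Models mulw; addw: each step keeps the sign-extended low word.  */
static int64_t
mac_word_ops (uint64_t a, uint64_t b, uint64_t c)
{
  uint64_t prod = (uint64_t) sext_w (a * b);
  return sext_w (prod + c);
}
```

Both functions return the same value for any inputs, so the two-instruction form computes the same result as the three-instruction form.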

This define_insn_and_split prevents *zero_extract_fused from being selected. The test has been updated: it previously passed even though the fused pattern was not selected, because the expected instructions were still produced.

Signed-off-by: Michiel Derhaeg <[email protected]>

MichielDerhaeg self-assigned this Oct 15, 2025

MichielDerhaeg force-pushed the michiel/upstream_rmx100 branch from 8b7268c to c8067d6 Compare November 6, 2025 13:56

claziss and others added 17 commits November 6, 2025 14:56

arcv: Add initial scheduling scheme.

3137a82

Signed-off-by: Claudiu Zissulescu <[email protected]>

arcv: fuse load/store with lui

38016d4

Fuse together instruction pairs such as:

    LOAD rd1, [rs1,offset]
    lui rd2, imm

(where rd1 and rd2 are distinct) and:

    STORE rs2, [rs1,offset]
    lui rd, imm

Signed-off-by: Artemiy Volkov <[email protected]>

fixup! arcv: Add initial scheduling scheme.

6762c99

MichielDerhaeg force-pushed the michiel/upstream_rhx100 branch from 14db41e to 6762c99 Compare November 6, 2025 14:03
