@MichielDerhaeg
Contributor

No description provided.

Contributor Author

@MichielDerhaeg MichielDerhaeg left a comment

I tried to split the commits into something sensible. I didn't check whether each one builds individually, though.

 case SIGN_EXTRACT:
-if (TARGET_XTHEADBB && outer_code == SET
+if ((TARGET_ARCV_RHX100 || TARGET_XTHEADBB)
+&& outer_code == SET
Contributor Author

FYI, this was added for the bit-extract fusion.

Comment on lines +4643 to +4672
(define_insn "*zero_extract_fused"
[(set (match_operand:SI 0 "register_operand" "=r")
(zero_extract:SI (match_operand:SI 1 "register_operand" "r")
(match_operand 2 "const_int_operand")
(match_operand 3 "const_int_operand")))]
"TARGET_ARCV_RHX100 && !TARGET_64BIT
&& (INTVAL (operands[2]) > 1 || !TARGET_ZBS)"
{
int amount = INTVAL (operands[2]);
int end = INTVAL (operands[3]) + amount;
operands[2] = GEN_INT (BITS_PER_WORD - end);
operands[3] = GEN_INT (BITS_PER_WORD - amount);
return "slli\t%0,%1,%2\n\tsrli\t%0,%0,%3";
}
[(set_attr "type" "alu_fused")]
)
Contributor Author

As far as I can tell, this fusion was never implemented as a define_insn_and_split. Might not be trivial to force these exact instructions after a split.
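
To make the shift arithmetic in the pattern above easier to check, here is a minimal C sketch of what the `slli`/`srli` pair computes. It assumes a 32-bit word (`BITS_PER_WORD` == 32) and little-endian bit numbering. The helper names `zero_extract_shifts` and `zero_extract_mask` are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: models the slli/srli pair
   emitted by *zero_extract_fused, assuming a 32-bit word.  */
static uint32_t
zero_extract_shifts (uint32_t x, unsigned width, unsigned pos)
{
  /* The field must be non-empty and lie inside the word.  */
  assert (width >= 1 && pos + width <= 32);

  unsigned left = 32 - (pos + width);  /* rewritten operands[2] */
  unsigned right = 32 - width;         /* rewritten operands[3] */

  uint32_t t = x << left;  /* slli: discard the bits above the field */
  return t >> right;       /* srli: discard the bits below, zero-fill */
}

/* Reference formulation: shift the field down, then mask it.  */
static uint32_t
zero_extract_mask (uint32_t x, unsigned width, unsigned pos)
{
  assert (width >= 1 && pos + width <= 32);
  uint32_t mask = width == 32 ? UINT32_MAX : (UINT32_C (1) << width) - 1;
  return (x >> pos) & mask;
}
```

Both functions should agree for every valid `width` and `pos`. The shift pair needs no mask constant, which is presumably why it is attractive as a fusible sequence.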

GCC Administrator and others added 29 commits December 6, 2025 00:16
libstdc++-v3/ChangeLog:

	* include/bits/atomic_wait.h (__detail::__atomic_eq): Use
	std::addressof instead of &.
	* include/std/atomic (atomic::wait, atomic::notify_one)
	(atomic::notify_all): Likewise.

Reviewed-by: Patrick Palka <[email protected]>
gcc/algol68/ChangeLog

	PR algol68/123007
	* a68-lang.cc (a68_type_for_size): Handle intTI_type_node.
Implement the forwarding performed by std::bind via deducing this when
available, instead of needing 4 operator() overloads.  Using deducing
this here is more complicated than in other standard call wrappers
because std::bind is not really "perfect forwarding": it doesn't
consider value category, and along with const-ness it also forwards
volatile-ness (until C++20).

The old implementation suffers from the same problem that other
pre-C++23 SFINAE-friendly call wrappers have which is solved by using
deducing this (see p5.5 of the deducing this paper P0847R7).

	PR libstdc++/80564

libstdc++-v3/ChangeLog:

	* include/std/functional (__cv_like): New.
	(_Bind::_Res_type): Don't define when not needed.
	(_Bind::__dependent): Likewise.
	(_Bind::_Res_type_cv): Likewise.
	(_Bind::operator()) [_GLIBCXX_EXPLICIT_THIS_PARAMETER]:
	Define as two instead of four overloads using deducing
	this.
	* testsuite/20_util/bind/cv_quals_2.cc: Ignore SFINAE
	diagnostics inside headers.
	* testsuite/20_util/bind/ref_neg.cc: Likewise.
	* testsuite/20_util/bind/80564.cc: New test.

Reviewed-by: Tomasz Kamiński <[email protected]>
Reviewed-by: Jonathan Wakely <[email protected]>
Starting with r16-4438-ga93f80feeef744, the edge sorting order was
switched to lowest execution frequency first.  But the "bbro"
optimization pass chooses the first edge as a fallthrough.  Thus the
most unlikely branches were optimized to fallthroughs.

Fix by restoring the sorting order prior to r16-4438-ga93f80feeef744.
Now the branches most likely to be executed are picked as fallthroughs.
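
As an illustration only, the effect of the sort direction can be sketched in plain C. The `toy_edge` type and both functions are hypothetical stand-ins, not the code in bb-reorder.cc; the sketch just shows that a "first edge wins" consumer needs a descending sort to pick the hottest successor.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CFG edge; GCC's real edge type differs.  */
struct toy_edge
{
  int dest_bb;  /* index of the destination basic block */
  long count;   /* estimated execution count of the edge */
};

/* qsort comparator: highest execution count first (descending).  */
static int
toy_edge_order_desc (const void *a, const void *b)
{
  const struct toy_edge *ea = a;
  const struct toy_edge *eb = b;
  return (ea->count < eb->count) - (ea->count > eb->count);
}

/* Model of a consumer that takes the first edge as the fallthrough.  */
static int
toy_pick_fallthru (struct toy_edge *edges, size_t n)
{
  assert (n > 0);
  qsort (edges, n, sizeof edges[0], toy_edge_order_desc);
  return edges[0].dest_bb;
}
```

With an ascending comparator the same consumer would return the coldest successor, which is the misbehaviour described above.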

There are no regressions for C and C++ on x86_64-pc-linux-gnu.

The new tests fail for the respective targets without this patch, and
pass with it.

	PR rtl-optimization/122675

gcc/ChangeLog:

	* bb-reorder.cc (edge_order): Fix BB edge ordering to be
	descending.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/pr122675-1.c: New test.
	* gcc.target/i386/pr122675-1.c: New test.
	* gcc.target/riscv/pr122675-1.c: New test.

Signed-off-by: Dimitar Dimitrov <[email protected]>
…fix option

From: Mark Zhuang <[email protected]>

The previous commit added --default-prefix to handle non-default git
prefix configurations, but this option is not available in older git
versions. This patch adds a compatibility check.

contrib/ChangeLog:

	* prepare-commit-msg: check --default-prefix
2025-12-06  Paul Thomas  <[email protected]>

gcc/fortran
	PR fortran/122578
	* primary.cc (gfc_match_varspec): Try to resolve a typebound
	generic procedure selector expression to provide the associate
	name with a type. Also, resolve component calls. In both cases,
	make a copy of the selector expression to guard against changes
	made by gfc_resolve_expr.

gcc/testsuite
	PR fortran/122578
	* gfortran.dg/pdt_72.f03: New test.
2025-12-06  Paul Thomas  <[email protected]>

gcc/fortran
	PR fortran/122669
	* resolve.cc (resolve_allocate_deallocate): Mold expressions
	with an array reference and a constant size must be resolved
	for each allocate object.

gcc/testsuite
	PR fortran/122669
	* gfortran.dg/pdt_73.f03: New test.
2025-12-06  Paul Thomas  <[email protected]>

gcc/fortran
	PR fortran/122670
	* decl.cc (gfc_get_pdt_instance): Ensure that, in an interface
	body, PDT instances are imported implicitly if the template has
	been explicitly imported.
	* module.cc (read_module): If a PDT template appears in a use
	only statement, implicitly add the instances as well.

gcc/testsuite
	PR fortran/122670
	* gfortran.dg/pdt_74.f03: New test.
2025-12-06  Paul Thomas  <[email protected]>

gcc/fortran
	PR fortran/122693
	* array.cc (gfc_match_array_constructor): Stash and restore
	gfc_current_ns after the call to 'gfc_match_type_spec'.

gcc/testsuite
	PR fortran/122693
	* gfortran.dg/pdt_75.f03: New test.
This has been discussed in the 1/9 Reflection thread, but doesn't depend on
reflection in any way.
cp_parser_std_attribute calls lookup_attribute_spec as:
    const attribute_spec *as
      = lookup_attribute_spec (TREE_PURPOSE (attribute));
so with TREE_LIST where TREE_VALUE is attribute name and TREE_PURPOSE
attribute ns.  Similarly c_parser_std_attribute.  And for
attribute_takes_identifier_p those do:
    else if (attr_ns == gnu_identifier
             && attribute_takes_identifier_p (attr_id))
and
        bool takes_identifier
          = (ns != NULL_TREE
             && strcmp (IDENTIFIER_POINTER (ns), "gnu") == 0
             && attribute_takes_identifier_p (name));
when handling std attributes (for GNU attributes they just call those
with the IDENTIFIER_NODE name).
is_late_template_attribute and tsubst_attribute, though, pass just
get_attribute_name to these functions, so they handle attributes in all
namespaces as GNU attributes only.  This means that lookup_attribute_spec
can return NULL or find a different attribute if the attribute is not
from gnu:: (or, say, a standard attribute mapped to gnu::), and
attribute_takes_identifier_p can return true even for attributes for
which it shouldn't.

I thought about changing attribute_takes_identifier_p to optionally take
a TREE_LIST, but that would mean handling it in the target hooks too, and
they only care about GNU attributes right now.  So, given the above
parser.cc/c-parser.cc snippets, the following patch just follows
what they do.

2025-12-06  Jakub Jelinek  <[email protected]>

	* decl2.cc (is_late_template_attribute): Call lookup_attribute_spec
	on TREE_PURPOSE (attr) rather than name.  Only call
	attribute_takes_identifier_p if get_attribute_namespace (attr) is
	gnu_identifier.
	* pt.cc (tsubst_attribute): Only call attribute_takes_identifier_p
	if get_attribute_namespace (t) is gnu_identifier.
This is another thing discussed in the 1/9 Reflection thread,
also not dependent on reflection.

decl_attributes calls simple_cst_equal on TREE_VALUEs of the
current and preexisting attributes, but that is just a small
part of how attribute values should be compared.

The following patch fixes that.

2025-12-06  Jakub Jelinek  <[email protected]>

	* attribs.cc (decl_attributes): Use attribute_value_equal to
	compare attribute values instead of simple_cst_equal.
compile-std1.C was breaking on arm-eabi because these interfaces aren't
declared.  So for exporting let's check the same macros that control
declaring them.

libstdc++-v3/ChangeLog:

	* src/c++23/std.cc.in: Add more #if.
2025-12-06  Paul Thomas  <[email protected]>

gcc/testsuite
	PR fortran/103414
	* gfortran.dg/pdt_76.f03: New test.
Just a minor update to Dimitar's patch for the RISC-V testcase.

The cfi directives are not emitted for the -elf configurations causing the new
test to fail.  The cfi directives (and associated labels) don't seem relevant
to the test at hand, so this just drops them.

Pushing to the trunk.

	PR rtl-optimization/122675

gcc/testsuite
	* gcc.target/riscv/pr122675-1.c: Adjust expected output.
If the reducer is a function and the accumulator type isn't constrained,
at runtime the reduction will likely raise a Constraint_Error since the
reducer is repeatedly assigned to the accumulator variable (likely changing
its length). However, if the reducer is a procedure, no such assignment
occurs, and thus the runtime error only depends on the reducer logic.
This patch prevents the spurious warning in that case.

gcc/ada/
	* sem_attr.adb (Resolve_Attribute): Check if the reducer is a
	procedure before giving the warning.
When computing an address plus a large offset on riscv64 with a
PC-relative sequence, we may hit the range limit for auipc and get a
relocation overflow, where on riscv32 the computation wraps around.

Since -mcmodel=medany requires the entire program to fit in a 2GiB
address range, a +/-1GiB+ offset added to an in-range symbol in a
barely-fitting program is more likely than not to be out-of-range.
Since such large constants are unlikely to come up by chance, separate
them from the symbol so as to avoid the relocation overflow.
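
A minimal sketch of the range test described above, in C. The name `toy_pcrel_offset_ok` and the exact treatment of the boundary values are assumptions made for illustration; the real check lives in riscv_symbolic_constant_p and may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_GIB (INT64_C (1) << 30)

/* Hypothetical predicate: true if OFFSET is small enough to stay folded
   into a PC-relative symbol reference.  Larger offsets would instead be
   added to the symbol's address with separate instructions.  */
static bool
toy_pcrel_offset_ok (int64_t offset)
{
  return offset > -TOY_GIB && offset < TOY_GIB;
}
```

An offset that fails the test is not rejected outright; it is simply kept apart from the symbol so that the relocation itself only covers the in-range symbol.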


for  gcc/ChangeLog

	PR target/91420
	* config/riscv/riscv.cc (riscv_symbolic_constant_p): Require
	offsets smaller than +/- 1GiB for PCREL symbols.

for  gcc/testsuite/ChangeLog

	PR target/91420
	* gcc.target/riscv/pr91420.c: New.
Since we may delete stores that are found to be redundant in
postreload cse, we need cselib to invalidate argument stores at calls,
and to that end we need CALL_INSN_FUNCTION_USAGE to mention all MEM
stack space that may be legitimately modified by a const/pure callee,
i.e., all arguments passed to it on the stack.

When ACCUMULATE_OUTGOING_ARGS, each on-stack argument gets its own
usage information, but when it's not, each argument is pushed
incrementally, without precomputed stack slots.

Since we only mentioned such precomputed stack slots in
CALL_INSN_FUNCTION_USAGE, non-ACCUMULATE_OUTGOING_ARGS configurations
miss the stack usage data, and cselib fails to invalidate the stores.

Stores in such slots are anonymous, and they often invalidate other
anonymous slots, even part of the same object, but as the testcase
demonstrates, we may occasionally be unlucky that consecutive calls
have the stores to multi-word objects reordered by scheduling in such
a way that the last store for the first call survives the call in the
cselib tables, and then it is found to be redundant with the first
store for the subsequent call, as in the testcase.

So, if we haven't preallocated outgoing arguments for a call (which
would give us preassigned stack slots), and we have used any stack
space, add function call usage covering the entire stack range where
arguments were stored.


for  gcc/ChangeLog

	PR rtl-optimization/122947
	* calls.cc (expand_call): Add stack function usage in
	non-ACCUMULATE_OUTGOING_ARGS configurations.

for  gcc/testsuite/ChangeLog

	PR rtl-optimization/122947
	* gcc.dg/pr122947.c: New.
Rework dump_cselib_table to not crash when cselib_preserved_hash_table
is not allocated, and to remove the extraneous indirection from
dump_cselib_val that made it inconvenient to call from a debugger.


for  gcc/ChangeLog

	* cselib.cc (dump_cselib_val): Split out of and rename to...
	(dump_cselib_val_ptr): ... this.
	(dump_cselib_table): Adjust.  Skip cselib_preserved_hash_table
	when not allocated.
Volatile memory can be used as source operand for any operations.  Add
-ffuse-ops-with-volatile-access to fuse operations with volatile memory
reference and update simplify_binary_operation_1 to keep PLUS for 2
volatile memory references.  On x86, this optimizes

extern volatile int bar;

int
foo (int z)
{
  z *= 123;
  return bar + z;
}

into

foo:
	imull	$123, %edi, %eax
	addl	bar(%rip), %eax
	ret

and compiles

extern volatile unsigned char u8;

void
test (void)
{
  u8 = u8 + u8;
  u8 = u8 - u8;
}

into

test:
	movzbl	u8(%rip), %eax
	addb	%al, u8(%rip)
	movzbl	u8(%rip), %eax
	subb	u8(%rip), %al
	movb	%al, u8(%rip)
	ret

Tested with Linux kernel 6.17.9 on Intel Core i7-1195G7.

gcc/

	PR target/122343
	* common.opt: Add -ffuse-ops-with-volatile-access.
	* common.opt.urls: Regenerated.
	* recog.cc (general_operand): Allow volatile memory reference if
	-ffuse-ops-with-volatile-access is enabled.
	* simplify-rtx.cc (simplify_binary_operation_1): Keep PLUS for 2
	volatile memory references.
	* doc/invoke.texi: Document -ffuse-ops-with-volatile-access.

gcc/testsuite/

	PR target/122343
	* gcc.target/i386/20040112-1.c: Add -fomit-frame-pointer and use
	check-function-bodies to check for loop.
	* gcc.target/i386/avx-ne-convert-1.c: Compile with
	-fno-fuse-ops-with-volatile-access.
	* gcc.target/i386/avx10_2-bf16-1.c: Likewise.
	* gcc.target/i386/avx10_2-convert-1.c: Likewise.
	* gcc.target/i386/avx10_2-satcvt-1.c: Likewise.
	* gcc.target/i386/avx512bf16-vcvtneps2bf16-1.c: Likewise.
	* gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1a.c: Likewise.
	* gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1b.c: Likewise.
	* gcc.target/i386/avx512bitalg-vpshufbitqmb.c: Likewise.
	* gcc.target/i386/avx512bw-vpcmpb-1.c: Likewise.
	* gcc.target/i386/avx512bw-vpcmpub-1.c: Likewise.
	* gcc.target/i386/avx512bw-vpcmpuw-1.c: Likewise.
	* gcc.target/i386/avx512bw-vpcmpw-1.c: Likewise.
	* gcc.target/i386/avx512dq-vcvtps2qq-1.c: Likewise.
	* gcc.target/i386/avx512dq-vcvtps2uqq-1.c: Likewise.
	* gcc.target/i386/avx512dq-vcvtqq2pd-1.c: Likewise.
	* gcc.target/i386/avx512dq-vcvtqq2ps-1.c: Likewise.
	* gcc.target/i386/avx512dq-vcvttps2qq-1.c: Likewise.
	* gcc.target/i386/avx512dq-vcvttps2uqq-1.c: Likewise.
	* gcc.target/i386/avx512dq-vcvtuqq2pd-1.c: Likewise.
	* gcc.target/i386/avx512dq-vcvtuqq2ps-1.c: Likewise.
	* gcc.target/i386/avx512dq-vextractf32x8-1.c: Likewise.
	* gcc.target/i386/avx512dq-vextractf64x2-1.c: Likewise.
	* gcc.target/i386/avx512dq-vextracti64x2-1.c: Likewise.
	* gcc.target/i386/avx512dq-vfpclasspd-1.c: Likewise.
	* gcc.target/i386/avx512dq-vfpclassps-1.c: Likewise.
	* gcc.target/i386/avx512dq-vfpclasssd-1.c: Likewise.
	* gcc.target/i386/avx512dq-vfpclassss-1.c: Likewise.
	* gcc.target/i386/avx512dq-vpmullq-1.c: Likewise.
	* gcc.target/i386/avx512dq-vpmullq-3.c: Likewise.
	* gcc.target/i386/avx512f-pr100267-1.c: Likewise.
	* gcc.target/i386/avx512f-vcmppd-1.c: Likewise.
	* gcc.target/i386/avx512f-vcmpps-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvtps2pd-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvtsd2si-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvtsd2si64-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvtsd2usi-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvtsd2usi64-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvtsi2ss-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvtss2si-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvtss2si64-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvtss2usi-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvtss2usi64-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvttsd2si-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvttsd2si64-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvttsd2usi-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvttsd2usi64-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvttss2si-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvttss2si64-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvttss2usi-1.c: Likewise.
	* gcc.target/i386/avx512f-vcvttss2usi64-1.c: Likewise.
	* gcc.target/i386/avx512f-vextractf32x4-1.c: Likewise.
	* gcc.target/i386/avx512f-vextractf64x4-1.c: Likewise.
	* gcc.target/i386/avx512f-vextracti64x4-1.c: Likewise.
	* gcc.target/i386/avx512f-vmovapd-1.c: Likewise.
	* gcc.target/i386/avx512f-vmovaps-1.c: Likewise.
	* gcc.target/i386/avx512f-vmovdqa64-1.c: Likewise.
	* gcc.target/i386/avx512f-vpandnq-1.c: Likewise.
	* gcc.target/i386/avx512f-vpbroadcastd-1.c: Likewise.
	* gcc.target/i386/avx512f-vpbroadcastq-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpd-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpeqq-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpequq-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpged-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpgeq-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpgeud-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpgeuq-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpled-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpleq-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpleud-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpleuq-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpltd-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpltq-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpltud-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpltuq-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpneqd-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpneqq-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpnequd-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpnequq-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpq-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpud-1.c: Likewise.
	* gcc.target/i386/avx512f-vpcmpuq-1.c: Likewise.
	* gcc.target/i386/avx512f-vrndscalepd-1.c: Likewise.
	* gcc.target/i386/avx512f-vrndscaleps-1.c: Likewise.
	* gcc.target/i386/avx512fp16-complex-fma.c: Likewise.
	* gcc.target/i386/avx512fp16-vaddph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvtph2dq-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvtph2pd-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvtph2psx-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvtph2qq-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvtph2udq-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvtph2uqq-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvtph2uw-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvtph2w-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvtps2ph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvttph2dq-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvttph2qq-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvttph2udq-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvttph2uqq-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvttph2uw-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvttph2w-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vcvtuqq2ph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vfcmaddcph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vfcmulcph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vfmaddcph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vfmulcph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vfpclassph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vfpclasssh-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vmulph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vrcpph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vrsqrtph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16-vsqrtph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vaddph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vcvtpd2ph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vcvtph2dq-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vcvtph2psx-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vcvtph2qq-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vcvtph2udq-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vcvtph2uqq-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vcvtph2uw-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vcvtph2w-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vcvtps2ph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vcvtqq2ph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vcvttph2dq-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vcvttph2udq-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vcvttph2uw-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vcvttph2w-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vfcmulcph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vfmulcph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vfpclassph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vmulph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vrcpph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vrsqrtph-1a.c: Likewise.
	* gcc.target/i386/avx512fp16vl-vsqrtph-1a.c: Likewise.
	* gcc.target/i386/avx512vl-pr100267-1.c: Likewise.
	* gcc.target/i386/avx512vl-vcmppd-1.c: Likewise.
	* gcc.target/i386/avx512vl-vcmpps-1.c: Likewise.
	* gcc.target/i386/avx512vl-vcvtpd2ps-1.c: Likewise.
	* gcc.target/i386/avx512vl-vcvtpd2udq-1.c: Likewise.
	* gcc.target/i386/avx512vl-vcvttpd2udq-1.c: Likewise.
	* gcc.target/i386/avx512vl-vcvttps2udq-1.c: Likewise.
	* gcc.target/i386/avx512vl-vextractf32x4-1.c: Likewise.
	* gcc.target/i386/avx512vl-vmovapd-1.c: Likewise.
	* gcc.target/i386/avx512vl-vmovaps-1.c: Likewise.
	* gcc.target/i386/avx512vl-vmovdqa64-1.c: Likewise.
	* gcc.target/i386/avx512vl-vpcmpd-1.c: Likewise.
	* gcc.target/i386/avx512vl-vpcmpeqq-1.c: Likewise.
	* gcc.target/i386/avx512vl-vpcmpequq-1.c: Likewise.
	* gcc.target/i386/avx512vl-vpcmpq-1.c: Likewise.
	* gcc.target/i386/avx512vl-vpcmpud-1.c: Likewise.
	* gcc.target/i386/avx512vl-vpcmpuq-1.c: Likewise.
	* gcc.target/i386/pr122343-1a.c: New test.
	* gcc.target/i386/pr122343-1b.c: Likewise.
	* gcc.target/i386/pr122343-2a.c: Likewise.
	* gcc.target/i386/pr122343-2b.c: Likewise.
	* gcc.target/i386/pr122343-3.c: Likewise.
	* gcc.target/i386/pr122343-4a.c: Likewise.
	* gcc.target/i386/pr122343-4b.c: Likewise.
	* gcc.target/i386/pr122343-5a.c: Likewise.
	* gcc.target/i386/pr122343-5b.c: Likewise.
	* gcc.target/i386/pr122343-6a.c: Likewise.
	* gcc.target/i386/pr122343-6b.c: Likewise.
	* gcc.target/i386/pr122343-7.c: Likewise.

Signed-off-by: H.J. Lu <[email protected]>
Back in r78875 mrs added cpp_get_path/dir accessors for _cpp_file in order
to interface with the darwin framework system.  But now I notice that the
latter duplicates the better-named _cpp_get_file_dir, and I'm inclined to
rename the former to match.

Perhaps we should drop the initial underscore since these are no
longer internal interfaces; OTOH, _cpp_hashnode_value and
_cpp_backup_tokens still have the initial underscore in cpplib.h.

libcpp/ChangeLog:

	* include/cpplib.h (cpp_get_path, cpp_get_dir): Remove.
	(_cpp_get_file_path, _cpp_get_file_name, _cpp_get_file_stat)
	(_cpp_get_file_dir): Move prototypes from...
	* internal.h: ...here.
	* files.cc (_cpp_get_file_path): Rename from...
	(cpp_get_path): ...this.
	(cpp_get_dir): Remove.

gcc/ChangeLog:

	* config/darwin-c.cc (find_subframework_header): Use
	_cpp_get_file_*.
gcc/analyzer/ChangeLog:
	* kf.cc (register_known_functions): Remove duplicate calls to
	register_atomic_builtins and register_varargs_builtins.

Signed-off-by: David Malcolm <[email protected]>
This was reported as a regression in GCC 14: the compiler resolves
Accum_Type to Positive for a reduction expression whose "expected
subtype" is Positive, which means that 0 cannot be used as initial
value in the expression:

  Sum : Positive := V'Reduce ("+", 0);

without always raising Constraint_Error at run time.  That's not the
intent according to T. Taft in
  https://forum.ada-lang.io/t/regression-in-gnat-14/890
so this changes the resolution to use the base type (Integer) instead.

gcc/ada/
	PR ada/115349
	* sem_attr.adb (Resolve_Attribute) <Attribute_Reduce>: Use the base
	type as Accum_Type if the reducer is an operator from Standard and
	the type is numeric.  Use the type of the first operand for other
	operators.  Streamline the error message given for limited types.

gcc/testsuite/
	* gnat.dg/reduce3.adb: New test.
Don't allow 2 volatile memory references in *<avx512>_cmp<mode>3_dup_op
so that gcc.target/i386/avx2-vpcmpeqq-1.c will generate 2 loads when
-march=cascadelake is used.

	PR target/122343
	* config/i386/sse.md (*<avx512>_cmp<mode>3_dup_op): Don't allow
	2 volatile memory references.

Signed-off-by: H.J. Lu <[email protected]>
When -march=cascadelake is added, we generate

	vmovdqa	x(%rip), %ymm0
	vpcmpq	$1, x(%rip), %ymm0, %k0
	vpmovm2q	%k0, %ymm0
	vmovdqa	%ymm0, x(%rip)

instead of

	vmovdqa	x(%rip), %ymm1
	vmovdqa	x(%rip), %ymm0
	vpcmpgtq	%ymm1, %ymm0, %ymm0
	vmovdqa	%ymm0, x(%rip)

Compile avx2-vpcmpgtq-1.c with -fno-fuse-ops-with-volatile-access to
generate vpcmpgtq instead of vpcmpq.

	PR target/122343
	* gcc.target/i386/avx2-vpcmpgtq-1.c: Compile with
	-fno-fuse-ops-with-volatile-access.

Signed-off-by: H.J. Lu <[email protected]>
…d [PR122868]

As Richi suggested this moves the check into the loop so we check every load.

I had initially not done this because I figured the loads would be treated as a
group anyway and the group would be valid or not as a whole.  But invariant
loads can form a group in which not all the loads are within range of a known
bound.

gcc/ChangeLog:

	PR tree-optimization/122868
	* tree-vect-stmts.cc (vectorizable_load): Move check for invariant loads
	down into the loop.
The Adv. SIMD boolean reduction patterns were accidentally
overriding one of the input arguments.  This fixes it and
removes unneeded intermediate moves around the subreg type
castings.

gcc/ChangeLog:

	PR target/123026
	* config/aarch64/aarch64-simd.md (reduc_sbool_ior_scal_<mode>,
	reduc_sbool_and_scal_<mode>): Fix tmp operands[1] override.

gcc/testsuite/ChangeLog:

	PR target/123026
	* gcc.target/aarch64/pr123026.c: New test.
rguenth and others added 9 commits December 16, 2025 08:30
When we have a speculated edge but have folded the call to
__builtin_unreachable (), trying to update the cgraph ICEs
in resolve_speculation because there's no symtab node for
__builtin_unreachable ().  Reject this resolution attempt, as is
done when the callee's decl is NULL or not semantically
equivalent.

I only have an LTRANS unit as a testcase.

	PR ipa/122456
	* cgraph.cc (cgraph_edge::resolve_speculation): Handle
	a NULL symtab_node::get (callee_decl).
…ling

gcc/ChangeLog:

	* haifa-sched.cc (choose_ready): Don't require dfa_lookahead <= 0
	to schedule SCHED_GROUP_P insns first.
This patch enables dispatch scheduling for the NVIDIA Olympus core.
The dispatch constraints are based on the Olympus CPU Core Software
Optimization Guide
(https://docs.nvidia.com/olympus-cpu-core-software-optimization-guide-dp12531-001v0-7.pdf).

The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for trunk?

Signed-off-by: Jennifer Schmitz <[email protected]>

gcc/
	* config/aarch64/aarch64.md: Include olympus.md.
	* config/aarch64/olympus.md: New file.
	* config/aarch64/tuning_models/olympus.h: Add dispatch
	constraints and enable dispatch scheduling.
Add a new target instruction. Hardware-assisted sanitizers on
architectures providing instructions to tag/untag memory can then
make use of this new instruction pattern. For example, the
memtag-stack sanitizer uses these instructions to tag and untag a
memory granule.

	gcc/
	* target-insns.def (tag_memory): New target instruction.
	* doc/md.texi (tag_memory): Add documentation.

Signed-off-by: Claudiu Zissulescu <[email protected]>
Add a new target instruction used by hardware-assisted sanitizers on
architectures providing memory-tagging instructions. This instruction
is used to compute and assign tags at a fixed offset from a tagged
address base. For example, on AArch64, this pattern instantiates the
`addg` instruction.

	gcc/
	* target-insns.def (compose_tag): New target instruction.
	* doc/md.texi (compose_tag): Add documentation.

Signed-off-by: Claudiu Zissulescu <[email protected]>
Add new command line option -fsanitize=memtag-stack with the following
new params:
--param memtag-instrument-alloca [0,1] (default 1) to use MTE insns
for enabling dynamic checking of stack allocas.

Along with the new SANITIZE_MEMTAG_STACK, define a SANITIZE_MEMTAG
which will be set if any kind of memtag sanitizer is in effect (e.g.,
later we may add -fsanitize=memtag-globals).  Add errors to convey
that memtag sanitizer does not work with hwaddress and address
sanitizers.  Also error out if memtag ISA extension is not enabled.

MEMTAG sanitizer will use the HWASAN machinery, but with a few
differences:
  - The tags are always generated at runtime by the hardware, so
    -fsanitize=memtag-stack enforces a --param hwasan-random-frame-tag=1

Add documentation in gcc/doc/invoke.texi.

gcc/
	* builtins.def: Adjust the macro to include the new
	SANITIZE_MEMTAG_STACK.
	* flag-types.h (enum sanitize_code): Add new enumerator for
	SANITIZE_MEMTAG and SANITIZE_MEMTAG_STACK.
	* opts.cc (finish_options): memtag-stack sanitizer conflicts with
	hwaddress and address sanitizers.
	(sanitizer_opts): Add new memtag-stack sanitizer.
	(parse_sanitizer_options): memtag-stack sanitizer cannot recover.
	* params.opt: Add new params for memtag-stack sanitizer.
	* doc/invoke.texi: Update documentation.

Signed-off-by: Claudiu Zissulescu <[email protected]>
Co-authored-by: Claudiu Zissulescu <[email protected]>
Memory tagging is used for detecting memory safety bugs.  On AArch64, the
memory tagging extension (MTE) helps in reducing the overheads of memory
tagging:
 - CPU: MTE instructions for efficiently tagging and untagging memory.
 - Memory: New memory type, Normal Tagged Memory, added to the Arm
   Architecture.

The MEMory TAGging (MEMTAG) sanitizer uses the same infrastructure as
HWASAN.  MEMTAG and HWASAN are both hardware-assisted solutions, and
rely on the same sanitizer machinery in parts.  So, define new
constructs that allow MEMTAG and HWASAN to share the infrastructure:

  - hwassist_sanitize_p () is true when either SANITIZE_MEMTAG or
    SANITIZE_HWASAN is true.
  - hwassist_sanitize_stack_p () is true when hwassist_sanitize_p () is
    true and stack variables are to be sanitized.

MEMTAG and HWASAN do have differences, however, and hence, the need to
conditionalize using memtag_sanitize_p () in the relevant places. E.g.,

  - Instead of generating the libcall __hwasan_tag_memory, MEMTAG needs
    to invoke the target-specific hook TARGET_MEMTAG_TAG_MEMORY to tag
    memory.  A similar approach is used in handle_builtin_alloca,
    where target hooks are used instead of the gimple
    transformations.

  - Add a new internal function HWASAN_ALLOCA_POISON to handle
    dynamically allocated stack memory when the MEMTAG sanitizer is
    enabled.  At expansion, this in turn allows invoking target hooks to
    increment the tag, and then using the generated tag to tag the
    dynamically allocated memory.

    The usual pattern:
        irg     x0, x0, x0
        subg    x0, x0, #16, #0
    creates a tag in x0 and so on.  For alloca, we need to apply the
    generated tag to the new sp.  In the absence of an extract-tag insn,
    the implementation in expand_HWASAN_ALLOCA_POISON resorts to invoking
    irg again.
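
For readers unfamiliar with tagged addresses, the following C sketch models the pointer arithmetic involved. It assumes the AArch64 MTE layout, in which the 4-bit logical tag sits in bits 59:56 of a 64-bit address. All `toy_*` names are hypothetical, and the model deliberately ignores tag exclusion and the random tag selection done by `irg`.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TAG_SHIFT 56
#define TOY_TAG_MASK  (UINT64_C (0xf) << TOY_TAG_SHIFT)

/* Read the 4-bit logical tag out of a tagged address.  */
static unsigned
toy_extract_tag (uint64_t p)
{
  return (unsigned) ((p & TOY_TAG_MASK) >> TOY_TAG_SHIFT);
}

/* Return P with its logical tag replaced by TAG.  */
static uint64_t
toy_set_tag (uint64_t p, unsigned tag)
{
  assert (tag <= 0xf);
  return (p & ~TOY_TAG_MASK) | ((uint64_t) tag << TOY_TAG_SHIFT);
}

/* Rough model of an addg/subg-style operation: adjust the address by
   ADDR_OFF and the tag by TAG_OFF, wrapping the tag modulo 16.  */
static uint64_t
toy_add_tag (uint64_t p, int64_t addr_off, unsigned tag_off)
{
  unsigned tag = (toy_extract_tag (p) + tag_off) & 0xf;
  uint64_t addr = (p & ~TOY_TAG_MASK) + (uint64_t) addr_off;
  return toy_set_tag (addr, tag);
}
```

In this model, applying an existing tag to a new stack pointer would be `toy_set_tag (new_sp, toy_extract_tag (old))`. That is the step the text above says has no single-instruction equivalent, hence the second `irg`.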

gcc/
	* asan.cc (handle_builtin_stack_restore): Accommodate MEMTAG
	sanitizer.
	(handle_builtin_alloca): Expand differently if MEMTAG sanitizer.
	(get_mem_refs_of_builtin_call): Include MEMTAG along with
	HWASAN.
	(memtag_sanitize_stack_p): New definition.
	(memtag_sanitize_allocas_p): Likewise.
	(memtag_memintrin): Likewise.
	(hwassist_sanitize_p): Likewise.
	(hwassist_sanitize_stack_p): Likewise.
	(report_error_func): Include MEMTAG along with HWASAN.
	(build_check_stmt): Likewise.
	(instrument_derefs): MEMTAG too does not deal with globals yet.
	(instrument_builtin_call): Include MEMTAG along with HWASAN.
	(maybe_instrument_call): Likewise.
	(asan_expand_mark_ifn): Likewise.
	(asan_expand_check_ifn): Likewise.
	(asan_expand_poison_ifn): Expand differently if MEMTAG sanitizer.
	(asan_instrument): Include MEMTAG along with HWASAN.
	(hwasan_emit_prologue): Expand differently if MEMTAG sanitizer.
	(hwasan_emit_untag_frame): Likewise.
	* asan.h (memtag_sanitize_stack_p): New declaration.
	(memtag_sanitize_allocas_p): Likewise.
	(hwassist_sanitize_p): Likewise.
	(hwassist_sanitize_stack_p): Likewise.
	(asan_sanitize_use_after_scope): Include MEMTAG along with
	HWASAN.
	* cfgexpand.cc (align_local_variable): Likewise.
	(expand_one_stack_var_at): Likewise.
	(expand_stack_vars): Likewise.
	(expand_one_stack_var_1): Likewise.
	(init_vars_expansion): Likewise.
	(expand_used_vars): Likewise.
	(pass_expand::execute): Likewise.
	* gimplify.cc (asan_poison_variable): Likewise.
	* internal-fn.cc (expand_HWASAN_ALLOCA_POISON): New definition.
	(expand_HWASAN_ALLOCA_UNPOISON): Expand differently if MEMTAG
	sanitizer.
	(expand_HWASAN_MARK): Likewise.
	* internal-fn.def (HWASAN_ALLOCA_POISON): Define new.
	* params.opt: Document new param.
	* sanopt.cc (pass_sanopt::execute): Include MEMTAG along with
	HWASAN.
	* gcc.cc (sanitize_spec_function): Add check for memtag-stack.
	* doc/tm.texi: Regenerate.
	* target.def (extract_tag): Update documentation.
	(add_tag): Likewise.
	(insert_random_tag): Likewise.

Co-authored-by: Indu Bhagat <[email protected]>
Signed-off-by: Claudiu Zissulescu <[email protected]>
MEMTAG sanitizer, which is based on the HWASAN sanitizer, will invoke
the target-specific hooks to create a random tag, add tag to memory
address, and finally tag and untag memory.

Implement the target hooks to emit MTE instructions if MEMTAG sanitizer
is in effect.  Continue to use the default target hook if HWASAN is
being used.  Following target hooks are implemented:
   - TARGET_MEMTAG_INSERT_RANDOM_TAG
   - TARGET_MEMTAG_ADD_TAG
   - TARGET_MEMTAG_EXTRACT_TAG

Apart from the target-specific hooks, set the following to values
defined by the Memory Tagging Extension (MTE) in aarch64:
   - TARGET_MEMTAG_TAG_BITSIZE
   - TARGET_MEMTAG_GRANULE_SIZE

The following instructions were (re-)defined:
   - addg/subg (used by TARGET_MEMTAG_ADD_TAG and
     TARGET_MEMTAG_COMPOSE_OFFSET_TAG hooks)
   - stg/st2g Used to tag/untag a memory granule.
   - tag_memory A target specific instruction, it will emit MTE
     instructions to tag/untag memory of a given size.
   - compose_tag A target specific instruction that computes a tagged
     address as an offset from a base (tagged) address.
   - gmi Used for randomizing the inserted tag.
   - irg Likewise.
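
As a rough illustration of how a region of a given size could be covered
by stg and st2g, the sketch below counts one st2g per pair of 16-byte
granules and one trailing stg for an odd granule.  The struct and
function names are hypothetical, and the real tag_memory expansion may
choose differently (for example, a loop for large sizes):

```c
#include <stddef.h>

/* MTE granule size in bytes.  */
#define SKETCH_GRANULE 16

/* Number of st2g (two granules) and stg (one granule) stores.  */
struct sketch_tag_plan
{
  size_t st2g;
  size_t stg;
};

/* Split SIZE bytes, rounded up to whole granules, into st2g/stg.  */
static struct sketch_tag_plan
sketch_plan_tag_memory (size_t size)
{
  size_t granules = (size + SKETCH_GRANULE - 1) / SKETCH_GRANULE;
  struct sketch_tag_plan plan = { granules / 2, granules % 2 };
  return plan;
}
```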

gcc/

	* config/aarch64/aarch64.md (addg): Update pattern to use
	addg/subg instructions.
	(stg): Update pattern.
	(st2g): New pattern.
	(tag_memory): Likewise.
	(compose_tag): Likewise.
	(irg): Update pattern to accept xzr register.
	(gmi): Likewise.
	(UNSPECV_TAG_SPACE): Define.
	* config/aarch64/aarch64.cc (AARCH64_MEMTAG_GRANULE_SIZE):
	Define.
	(AARCH64_MEMTAG_TAG_BITSIZE): Likewise.
	(aarch64_override_options_internal): Error out if MTE instructions
	are not available.
	(aarch64_post_cfi_startproc): Emit .cfi_mte_tagged_frame.
	(aarch64_can_tag_addresses): Add MEMTAG specific handling.
	(aarch64_memtag_tag_bitsize): New function.
	(aarch64_memtag_granule_size): Likewise.
	(aarch64_memtag_insert_random_tag): Likewise.
	(aarch64_memtag_add_tag): Likewise.
	(aarch64_memtag_extract_tag): Likewise.
	(aarch64_granule16_memory_address_p): Likewise.
	(aarch64_emit_stxg_insn): Likewise.
	(aarch64_memtag_tag_memory_via_loop): New definition.
	(aarch64_expand_tag_memory): Likewise.
	(aarch64_check_memtag_ops): Likewise.
	(TARGET_MEMTAG_TAG_BITSIZE): Likewise.
	(TARGET_MEMTAG_GRANULE_SIZE): Likewise.
	(TARGET_MEMTAG_INSERT_RANDOM_TAG): Likewise.
	(TARGET_MEMTAG_ADD_TAG): Likewise.
	(TARGET_MEMTAG_EXTRACT_TAG): Likewise.
	* config/aarch64/aarch64-builtins.cc
	(aarch64_expand_builtin_memtag): Update set tag builtin logic.
	* config/aarch64/aarch64-linux.h: Pass memtag-stack sanitizer
	specific options to the linker.
	* config/aarch64/aarch64-protos.h
	(aarch64_granule16_memory_address_p): New prototype.
	(aarch64_check_memtag_ops): Likewise.
	(aarch64_expand_tag_memory): Likewise.
	* config/aarch64/constraints.md (Umg): New memory constraint.
	(Uag): New constraint.
	(Ung): Likewise.
	* config/aarch64/predicates.md (aarch64_memtag_tag_offset):
	Refactor it.
	(aarch64_granule16_imm6): Rename from aarch64_granule16_uimm6 and
	refactor it.
	(aarch64_granule16_memory_operand): New predicate.
	* config/aarch64/iterators.md (MTE_PP): New code iterator to be
	used for mte instructions.
	(stg_ops): New code attributes.
	(st2g_ops): Likewise.
	(mte_name): Likewise.
	* config/aarch64/aarch64.opt (aarch64-tag-memory-loop-threshold):
	New parameter.
	* doc/invoke.texi: Update documentation.

gcc/testsuite:

	* gcc.target/aarch64/acle/memtag_1.c: Update test.

Co-authored-by: Indu Bhagat <[email protected]>
Signed-off-by: Claudiu Zissulescu <[email protected]>
Add basic tests for memtag-stack sanitizer.  Memtag stack sanitizer
uses target hooks to emit AArch64 specific MTE instructions.

gcc/testsuite:

	* gcc.target/aarch64/memtag/alloca-1.c: New test.
	* gcc.target/aarch64/memtag/alloca-2.c: New test.
	* gcc.target/aarch64/memtag/alloca-3.c: New test.
	* gcc.target/aarch64/memtag/arguments-1.c: New test.
	* gcc.target/aarch64/memtag/arguments-2.c: New test.
	* gcc.target/aarch64/memtag/arguments-3.c: New test.
	* gcc.target/aarch64/memtag/arguments-4.c: New test.
	* gcc.target/aarch64/memtag/arguments.c: New test.
	* gcc.target/aarch64/memtag/basic-1.c: New test.
	* gcc.target/aarch64/memtag/basic-3.c: New test.
	* gcc.target/aarch64/memtag/basic-struct.c: New test.
	* gcc.target/aarch64/memtag/large-array.c: New test.
	* gcc.target/aarch64/memtag/local-no-escape.c: New test.
	* gcc.target/aarch64/memtag/memtag.exp: New file.
	* gcc.target/aarch64/memtag/no-sanitize-attribute.c: New test.
	* gcc.target/aarch64/memtag/value-init.c: New test.
	* gcc.target/aarch64/memtag/vararray-gimple.c: New test.
	* gcc.target/aarch64/memtag/vararray.c: New test.
	* gcc.target/aarch64/memtag/zero-init.c: New test.
	* gcc.target/aarch64/memtag/texec-1.c: New test.
	* gcc.target/aarch64/memtag/texec-2.c: New test.
	* gcc.target/aarch64/memtag/texec-3.c: New test.
	* gcc.target/aarch64/memtag/vla-1.c: New test.
	* gcc.target/aarch64/memtag/vla-2.c: New test.
	* lib/target-supports.exp (check_effective_target_aarch64_mte):
	New function.

Co-authored-by: Indu Bhagat <[email protected]>
Signed-off-by: Claudiu Zissulescu <[email protected]>
Comment on lines +4613 to +4615
emit_insn (gen_mulsi3 (operands[4], operands[1], operands[2]));
emit_insn (gen_addsi3 (operands[0], operands[3], operands[4]));
Contributor Author

Ah, this is also wrong.

Member

Thanks. Fixed.

Member

Interesting: with this fix (b4ce3f9) I get a regression of 0.5%.

@luismgsilva luismgsilva force-pushed the michiel/fusion-trunk-3 branch from 89415d4 to b4ce3f9 Compare December 16, 2025 11:34
rorth and others added 8 commits December 16, 2025 13:02
…s with as

The gcc.target/i386/shift-gf2p8affine-2.c test FAILs on Solaris with the
native assembler:

FAIL: gcc.target/i386/shift-gf2p8affine-2.c (test for excess errors)
UNRESOLVED: gcc.target/i386/shift-gf2p8affine-2.c compilation failed to produce executable

Excess errors:
Assembler: shift-gf2p8affine-2.c
        "/var/tmp//ccZMQ1Ad.s", line 30 : Illegal mnemonic
        Near line: "    vgf2p8affineqb  $0, %zmm1, %zmm0, %zmm0"
        "/var/tmp//ccZMQ1Ad.s", line 30 : Syntax error

Thus this patch only runs the test when gas is in use.

Tested on i386-pc-solaris2.11 (as and gas) and x86_64-pc-linux-gnu.

2025-12-15  Rainer Orth  <[email protected]>

	gcc/testsuite:
	* gcc.target/i386/shift-gf2p8affine-2.c: Skip on Solaris
	without gas.
The following adds -msse -mfpmath=sse to work around SRA not being able
to decompose an aggregate copy of std::complex, because with x87 math
ld/st pairs are not bit-preserving.  This avoids spurious failures of
the testcase.

	PR testsuite/123137
	* g++.dg/vect/pr64410.cc: Add -mfpmath=sse -msse on x86.
As a result of the automatic replacement by commit 4dd1398,
there are several code fragments that receive the return value of
end_sequence() and immediately use it as the return value of the function
itself.

   rtx_insn *insn;
   ...
   insn = end_sequence ();
   return insn;

It is clear that in such cases, it would be more natural to pass the return
value of end_sequence() directly to the return statement without passing it
through a variable.

Applying this patch naturally does not change any functionality.

gcc/ChangeLog:

	* config/xtensa/xtensa.cc
	(xtensa_expand_block_set_libcall,
	xtensa_expand_block_set_unrolled_loop,
	xtensa_expand_block_set_small_loop, xtensa_call_tls_desc):
	Change the return statement to pass the return value of
	end_sequence() directly without going through a variable, and
	remove the definition of that variable.
In the expansion of cstoresi4 insn patterns, LT[U] comparisons where the
second operand is an integer constant are canonicalized to LE[U] ones
against a constant one less than the original.

     /* example */
     int test0(int a) {
       return a < 100;
     }
     unsigned int test1(unsigned int a) {
       return a <= 100u;
     }
     void test2(int a[], int b) {
       int i;
       for (i = 0; i < 16; ++i)
	a[i] = (a[i] <= b);
     }

     ;; before (TARGET_SALT)
     test0:
     	entry	sp, 32
     	movi	a8, 0x63
     	salt	a2, a8, a2
     	addi.n	a2, a2, -1	;; unwanted inverting
     	neg	a2, a2		;;
     	retw.n
     test1:
     	entry	sp, 32
     	movi	a8, 0x64
     	saltu	a2, a8, a2
     	addi.n	a2, a2, -1	;; unwanted inverting
     	neg	a2, a2		;;
     	retw.n
     test2:
     	entry	sp, 32
     	movi.n	a9, 0x10
     	loop	a9, .L5_LEND
     .L5:
     	l32i.n	a8, a2, 0
     	salt	a8, a3, a8
     	addi.n	a8, a8, -1	;; immediate cannot be hoisted out
     	neg	a8, a8
     	s32i.n	a8, a2, 0
     	addi.n	a2, a2, 4
     	.L5_LEND:
     	retw.n

This patch reverts such canonicalization by adding 1 to the comparison value
and then converting it back from LE[U] to LT[U], which better matches the
output machine instructions.  This patch also makes it easier to benefit
from other optimizations such as CSE, constant propagation, or loop-invariant
hoisting by XORing the result with a register that has a value of 1, rather
than subtracting 1 and then negating the sign to invert the truth of the
result.

     ;; after (TARGET_SALT)
     test0:
     	entry	sp, 32
     	movi	a8, 0x64
     	salt	a2, a2, a8
     	retw.n
     test1:
     	entry	sp, 32
     	movi	a8, 0x65
     	saltu	a2, a2, a8
     	retw.n
     test2:
     	entry	sp, 32
     	movi.n	a10, 1		;; hoisted out
     	movi.n	a9, 0x10
     	loop	a9, .L5_LEND
     .L5:
     	l32i.n	a8, a2, 0
     	salt	a8, a3, a8
     	xor	a8, a8, a10
     	s32i.n	a8, a2, 0
     	addi.n	a2, a2, 4
     	.L5_LEND:
     	retw.n
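
The two equivalences this change relies on can be checked in plain C.
This is a sketch with hypothetical helper names, not code from the
patch: the first pair shows that a < c and a <= c - 1 agree (as long as
c - 1 does not overflow), and the second pair shows that for a 0/1 value
the old "subtract 1, then negate" inversion gives the same result as
XORing with 1:

```c
/* The comparison as written in the source.  */
static int
lt_direct (int a, int c)
{
  return a < c;
}

/* The canonicalized form; assumes c - 1 does not overflow.  */
static int
le_canonical (int a, int c)
{
  return a <= c - 1;
}

/* Invert a 0/1 truth value the old way: addi -1, then neg.  */
static int
invert_sub_neg (int x)
{
  return -(x - 1);
}

/* Invert a 0/1 truth value by XORing with 1; the constant 1 is
   loop-invariant and can live in a register.  */
static int
invert_xor (int x)
{
  return x ^ 1;
}
```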

gcc/ChangeLog:

	* config/xtensa/xtensa.cc (xtensa_expand_scc_SALT):
	New sub-function that emits the SALT/SALTU instructions.
	(xtensa_expand_scc): Change the part related to the SALT/SALTU
	instructions to a call to the above sub-function.
Signed-off-by: Mohammad-Reza Nabipoor <[email protected]>

gcc/algol68/ChangeLog

	* a68-imports.cc (dump_encoded_mode): Replace "basic" with
	"string".
This commit introduces two new utility functions that replace some
ad-hoc infrastructure in the scanner.

Signed-off-by: Jose E. Marchesi <[email protected]>

gcc/algol68/ChangeLog

	* a68.h: Prototypes for a68_get_file_size and a68_file_read.
	* a68-parser-scanner.cc (a68_file_size): New function.
	(a68_file_read): Renamed from io_read.
	(get_source_size): Deleted function.
	(include_files): Use a68_file_size and a68_file_read.
This commit adds support for two new command-line options for the
Algol 68 front-end:

  -fmodules-map=<string>
  -fmodules-map-file=<filename>

These options are used in order to specify a mapping from module
indicants to file basenames.  The compiler will base its search for
the modules on these basenames rather than on the default scheme of
deriving the basename from the module indicant.

Signed-off-by: Jose E. Marchesi <[email protected]>

gcc/algol68/ChangeLog

	* lang.opt (-fmodules-map): New option.
	(-fmodules-map-file): Likewise.
	* a68.h: Add prototype for a68_process_module_map.
	* a68-imports.cc (SKIP_WHITESPACES): Define.
	(PARSE_BASENAME): Likewise.
	(PARSE_INDICANT): Likewise.
	(a68_process_module_map): New function.
	* a68-lang.cc (a68_init): Move initialization of
	A68_MODULE_FILES from there...
	(a68_init_options): to here.
	(a68_handle_option): Handle OPT_fmodules_map and
	OPT_fmodules_map_.
	* a68-parser-pragmat.cc (handle_access_in_pragmat): Normalize
	module indicants to upper case.
	* ga68.texi (Module search options): New section.
Signed-off-by: Jose E. Marchesi <[email protected]>

gcc/ChangeLog

	* common.opt.urls: Regenerate.

gcc/algol68/ChangeLog

	* lang.opt.urls: Regenerate.
@luismgsilva luismgsilva force-pushed the michiel/fusion-trunk-3 branch 4 times, most recently from 10f17dd to a33ab39 Compare December 16, 2025 14:15
artemiy-volkov and others added 6 commits December 16, 2025 15:28
This patch introduces the pipeline description for the Synopsys RMX-100
series processor to the RISC-V GCC backend.  The RMX-100 has a short,
three-stage, in-order execution pipeline with configurable multiply
unit options.

The option -mmpy-option was added to control which version of the MPY
unit the core has and what the latency of multiply instructions should
be, similar to ARCv2 cores (see gcc/config/arc/arc.opt:60).

gcc/ChangeLog:

        * config/riscv/riscv-cores.def (RISCV_TUNE): Add
	  arc-v-rmx-100-series.
        * config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
	  Add arcv_rmx100.
        (enum arcv_mpy_option_enum): New enum for ARC-V multiply
	options.
        * config/riscv/riscv-protos.h (arcv_mpy_1c_bypass_p): New
	  declaration.
        (arcv_mpy_2c_bypass_p): New declaration.
        (arcv_mpy_10c_bypass_p): New declaration.
        * config/riscv/riscv.cc (arcv_mpy_1c_bypass_p): New function.
        (arcv_mpy_2c_bypass_p): New function.
        (arcv_mpy_10c_bypass_p): New function.
        * config/riscv/riscv.md: Add arcv_rmx100.
        * config/riscv/riscv.opt: New option for RMX-100 multiply unit
	  configuration
        * doc/riscv-mtune.texi: Document arc-v-rmx-100-series.
        * config/riscv/arcv-rmx100.md: New file.

Authored-by: Artemiy Volkov <[email protected]>
Co-authored-by: Michiel Derhaeg <[email protected]>
Signed-off-by: Luis Silva <[email protected]>
This patch introduces the pipeline description for the Synopsys
RHX-100 series processor to the RISC-V GCC backend.  The RHX-100
features a 10-stage, dual-issue, in-order execution pipeline
architecture.

It has support for instruction fusion, which will be addressed by
subsequent patches.

gcc/ChangeLog:

        * config/riscv/riscv-cores.def (RISCV_TUNE): Add
	  arc-v-rhx-100-series.
        * config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
	  Add arcv_rhx100.
        * config/riscv/riscv.cc (enum riscv_fusion_pairs): Add
	  RISCV_FUSE_ARCV.
        * config/riscv/riscv.md: Add arcv_rhx100 to tune attribute.
        * doc/riscv-mtune.texi: Add RHX-100 documentation.
        * config/riscv/arcv-rhx100.md: New file.

Authored-by: Artemiy Volkov <[email protected]>
Co-authored-by: Michiel Derhaeg <[email protected]>
Signed-off-by: Luis Silva <[email protected]>
This patch implements instruction fusion support for the Synopsys
RHX-100 processor by adding the arcv_macro_fusion_pair_p function
and supporting infrastructure.

The implementation supports fusion of several instruction patterns:
multiply-add sequences, shift-based bit extraction, load-immediate with
conditional branches, adjacent memory operations, memory operations with
arithmetic instructions, memory operations with LUI instructions, and
load-immediate with store operations.

A new arcv.cc file is added to contain ARC-V specific optimizations,
and the existing multiply bypass functions are moved from riscv.cc to
this new file for better organization.

gcc/ChangeLog:

        * config.gcc: Add arcv.o to extra_objs.
        * config/riscv/riscv-protos.h (arcv_macro_fusion_pair_p): New
	  declaration.
        (arcv_sched_fusion_priority): New declaration.
        (arcv_can_issue_more_p): New declaration.
        (arcv_sched_variable_issue): New declaration.
        (arcv_sched_init): New declaration.
        (arcv_sched_reorder2): New declaration.
        (arcv_sched_adjust_priority): New declaration.
        (arcv_sched_adjust_cost): New declaration.
        * config/riscv/riscv.cc (arcv_mpy_1c_bypass_p): Move to arcv.cc.
        (arcv_mpy_2c_bypass_p): Move to arcv.cc.
        (arcv_mpy_10c_bypass_p): Move to arcv.cc.
        (riscv_macro_fusion_pair_p): New function.
        * config/riscv/t-riscv: Add arcv.o build rule.
        * config/riscv/arcv.cc: New file.

Authored-by: Artemiy Volkov <[email protected]>
Co-authored-by: Michiel Derhaeg <[email protected]>
Signed-off-by: Luis Silva <[email protected]>
…eries.

This patch implements the TARGET_SCHED_FUSION_PRIORITY hook for the
Synopsys RHX-100 processor to improve instruction scheduling by
prioritizing fusible memory operations.

The implementation analyzes load and store instructions to extract
base registers and offsets, then assigns scheduling priorities based
on several factors: access width (wider accesses get higher priority),
base register number, and memory offset values.  Instructions with
adjacent addresses are grouped together to enable better fusion
opportunities.
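
One way to picture such a priority is as a single sortable key.  The
sketch below is an assumption-laden illustration, not the logic in
arcv_sched_fusion_priority: the function name, the field widths and the
choice of "lower offset first" are all hypothetical.  It only shows how
width, base register and offset can be combined so that accesses of the
same width off the same base end up next to each other:

```c
#include <stdint.h>

/* Build a key where a larger value means "schedule earlier".
   Assumes base_regno < 32 and an offset that fits in 12 signed bits,
   as for RISC-V load/store immediates.  */
static int64_t
sketch_fusion_priority (unsigned width_bytes, unsigned base_regno, int offset)
{
  int64_t key = (int64_t) width_bytes << 40;   /* Wider accesses first.  */
  key -= (int64_t) base_regno << 24;           /* Group by base register.  */
  key -= offset;                               /* Ascending offsets.  */
  return key;
}
```

With this key, two 4-byte loads at offsets 0 and 4 from the same base
receive adjacent priorities, while any 8-byte access outranks both.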

gcc/ChangeLog:

        * config/riscv/arcv.cc (arcv_fusion_load_store): New function.
        (arcv_sched_fusion_priority): New function.
        * config/riscv/riscv.cc (riscv_sched_fusion_priority): New
	  function.
        (TARGET_SCHED_FUSION_PRIORITY): Define hook.

Authored-by: Artemiy Volkov <[email protected]>
Co-authored-by: Michiel Derhaeg <[email protected]>
Signed-off-by: Luis Silva <[email protected]>
This patch implements instruction scheduling support for the
dual-issue Synopsys RHX-100 processor by adding scheduler hooks and
state tracking for the two execution pipes.

The implementation tracks ALU pipe and memory pipe usage to maximize
dual-issue opportunities.  It includes reordering logic to promote
fusion of adjacent memory operations and other instruction pairs that
can execute simultaneously on the RHX-100's dual-issue architecture.

The scheduler prioritizes fused instruction pairs and adjusts costs
to improve scheduling decisions.  Memory operations are directed to
the appropriate pipe while arithmetic operations utilize the ALU pipe,
enabling optimal utilization of both execution units.
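
The pipe tracking can be pictured with a small state machine.  This is a
simplified sketch under stated assumptions, not the contents of struct
arcv_sched_state: the names are hypothetical, and it assumes at most one
instruction per pipe per cycle, which the actual model may not require:

```c
#include <stdbool.h>

/* The two execution pipes described above.  */
enum sketch_pipe
{
  SKETCH_PIPE_ALU,
  SKETCH_PIPE_MEM
};

/* Per-cycle record of which pipes are already taken.  */
struct sketch_sched_state
{
  bool alu_used;
  bool mem_used;
};

/* Reset the state at the start of a new cycle.  */
static void
sketch_cycle_start (struct sketch_sched_state *s)
{
  s->alu_used = false;
  s->mem_used = false;
}

/* Try to issue on PIPE; return false if it is already occupied.  */
static bool
sketch_try_issue (struct sketch_sched_state *s, enum sketch_pipe pipe)
{
  bool *slot = pipe == SKETCH_PIPE_MEM ? &s->mem_used : &s->alu_used;
  if (*slot)
    return false;
  *slot = true;
  return true;
}
```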

New TARGET_SCHED hooks are implemented including ADJUST_PRIORITY,
REORDER2, and enhanced VARIABLE_ISSUE handling specifically for
the RHX-100 microarchitecture.

gcc/ChangeLog:

        * config/riscv/arcv.cc (struct arcv_sched_state): New struct.
        (arcv_sched_init): New function.
        (arcv_sched_reorder2): New function.
        (arcv_sched_adjust_priority): New function.
        (arcv_sched_adjust_cost): New function.
        (arcv_can_issue_more_p): New function.
        (arcv_sched_variable_issue): New function.
        * config/riscv/riscv.cc (riscv_fusion_enabled_p): Add forward
	  declaration.
        (riscv_sched_init): Add call to arcv_sched_init.
        (riscv_sched_variable_issue): Add ARC-V-specific handling.
        (riscv_sched_adjust_cost): Add ARC-V-specific cost adjustment
	and fix parameter names.
        (riscv_sched_adjust_priority): New function.
        (riscv_sched_reorder2): New function.
        (TARGET_SCHED_ADJUST_PRIORITY): Define hook.
        (TARGET_SCHED_REORDER2): Define hook.
        * config/riscv/riscv.h (TARGET_ARCV_RHX100): New macro.

Authored-by: Artemiy Volkov <[email protected]>
Co-authored-by: Michiel Derhaeg <[email protected]>
Co-authored-by: Alex Turjan <[email protected]>
Signed-off-by: Luis Silva <[email protected]>
…axt fusion.

This patch adds instruction patterns to support fusion of multiply-add
sequences and bit extraction operations for the Synopsys RHX-100 processor.

The multiply-add fusion supports both signed and unsigned 16-bit operands
expanded to 32-bit multiply-accumulate operations.  The implementation
generates separate multiply and add instructions that can be fused by
the processor hardware.
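
In C terms, the source shapes these patterns target look roughly like
the following.  The function names are hypothetical; each body is one
multiply of two widened 16-bit operands followed by one 32-bit add, the
pair the hardware is expected to fuse:

```c
#include <stdint.h>

/* Signed 16x16 -> 32 multiply, then accumulate (wrapping add).  */
static int32_t
sketch_maddhisi (int16_t a, int16_t b, int32_t acc)
{
  int32_t product = (int32_t) a * (int32_t) b;            /* mul */
  return (int32_t) ((uint32_t) product + (uint32_t) acc); /* add */
}

/* Unsigned 16x16 -> 32 multiply, then accumulate.  */
static uint32_t
sketch_umaddhisi (uint16_t a, uint16_t b, uint32_t acc)
{
  uint32_t product = (uint32_t) a * (uint32_t) b;         /* mul */
  return product + acc;                                   /* add */
}
```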

The bit extraction fusion implements zero extraction using shift-left
followed by shift-right operations, which can be fused into a single
micro-operation.  New instruction types "imul_fused" and "alu_fused"
are added to the scheduling model to handle these fused operations.
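
The shift pair can be checked against a mask-based extraction in plain
C.  The sketch below follows the shift amounts computed in the
*zero_extract_fused pattern quoted earlier in this conversation (left by
32 - (pos + size), then right by 32 - size) for a 32-bit word; the
helper names are hypothetical, and both functions assume size >= 1 and
pos + size <= 32:

```c
#include <stdint.h>

/* Reference: extract SIZE bits starting at bit POS using a mask.  */
static uint32_t
zero_extract_mask (uint32_t x, unsigned size, unsigned pos)
{
  uint32_t mask = size == 32 ? 0xffffffffu : ((1u << size) - 1);
  return (x >> pos) & mask;
}

/* Same extraction as a shift-left / shift-right pair.  */
static uint32_t
zero_extract_shifts (uint32_t x, unsigned size, unsigned pos)
{
  unsigned end = pos + size;
  uint32_t t = x << (32 - end);   /* slli: drop the bits above the field.  */
  return t >> (32 - size);        /* srli: move the field down to bit 0.  */
}
```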

Test cases are included to verify the correct generation of fusible
instruction sequences for multiply-add, bit extraction, and
load-immediate with conditional branch patterns.

gcc/ChangeLog:

        * config/riscv/arcv-rhx100.md (arcv_rhx100_imul_fused): New
	  reservation.
	(arcv_rhx100_alu_fused): New reservation.
        * config/riscv/iterators.md (is_zero_extract): New code
	  attribute.
        * config/riscv/riscv.cc (riscv_rtx_costs): Add
	  TARGET_ARCV_RHX100 support for SIGN_EXTRACT.
        * config/riscv/riscv.md: Add imul_fused and alu_fused to type
	  attribute.
	(umaddhisi4): New expand.
        (madd_split): New insn_and_split.
        (madd_split_extended): New insn_and_split.
        (*zero_extract_fused): New insn.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/arcv-fusion-limm-condbr.c: New test.
        * gcc.target/riscv/arcv-fusion-madd.c: New test.
        * gcc.target/riscv/arcv-fusion-xbfu.c: New test.

Authored-by: Artemiy Volkov <[email protected]>
Co-authored-by: Michiel Derhaeg <[email protected]>
Signed-off-by: Luis Silva <[email protected]>
@luismgsilva luismgsilva force-pushed the michiel/fusion-trunk-3 branch from a33ab39 to 1f24597 Compare December 16, 2025 15:31