ARC-V RHX-100 upstream patch series #192
base: trunk
MichielDerhaeg
left a comment
I tried to split the commits into something sensible. I didn't check whether they can be built individually, though.
     case SIGN_EXTRACT:
-      if (TARGET_XTHEADBB && outer_code == SET
+      if ((TARGET_ARCV_RHX100 || TARGET_XTHEADBB)
+          && outer_code == SET
FYI, this was added for the bit-extract fusion.
(define_insn "*zero_extract_fused"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (zero_extract:SI (match_operand:SI 1 "register_operand" "r")
                         (match_operand 2 "const_int_operand")
                         (match_operand 3 "const_int_operand")))]
  "TARGET_ARCV_RHX100 && !TARGET_64BIT
   && (INTVAL (operands[2]) > 1 || !TARGET_ZBS)"
{
  int amount = INTVAL (operands[2]);
  int end = INTVAL (operands[3]) + amount;
  operands[2] = GEN_INT (BITS_PER_WORD - end);
  operands[3] = GEN_INT (BITS_PER_WORD - amount);
  return "slli\t%0,%1,%2\n\tsrli\t%0,%0,%3";
}
  [(set_attr "type" "alu_fused")]
)
As far as I can tell, this fusion was never implemented as a define_insn_and_split. Might not be trivial to force these exact instructions after a split.
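For readers checking the shift amounts in the `*zero_extract_fused` pattern, here is a minimal C sketch of the same arithmetic. It assumes a 32-bit word (the pattern requires `!TARGET_64BIT`) and little-endian bit numbering; the function names are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics assumed for (zero_extract:SI src width pos):
   WIDTH bits of SRC starting at bit POS, zero-extended.  */
uint32_t
sketch_zero_extract_ref (uint32_t src, unsigned width, unsigned pos)
{
  assert (width >= 1 && pos + width <= 32);
  uint32_t mask = width == 32 ? UINT32_MAX : (UINT32_C (1) << width) - 1;
  return (src >> pos) & mask;
}

/* The same extraction done with the shift pair the pattern prints:
   slli by (32 - end), then srli by (32 - width).  */
uint32_t
sketch_zero_extract_shifts (uint32_t src, unsigned width, unsigned pos)
{
  assert (width >= 1 && pos + width <= 32);
  unsigned end = pos + width;
  unsigned left = 32 - end;     /* slli amount, new operands[2] */
  unsigned right = 32 - width;  /* srli amount, new operands[3] */
  return (src << left) >> right;
}
```

Calling both helpers with the same `width` and `pos` should give identical results, which is the property the fused `slli`/`srli` sequence relies on.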
libstdc++-v3/ChangeLog:
* include/bits/atomic_wait.h (__detail::__atomic_eq): Use
std::addressof instead of &.
* include/std/atomic (atomic::wait, atomic::notify_one)
(atomic::notify_all): Likewise.
Reviewed-by: Patrick Palka <[email protected]>
gcc/algol68/ChangeLog
PR algol68/123007
* a68-lang.cc (a68_type_for_size): Handle intTI_type_node.
Implement the forwarding performed by std::bind via deducing this when
available, instead of needing 4 operator() overloads. Using deducing
this here is more complicated than in other standard call wrappers
because std::bind is not really "perfect forwarding": it doesn't
consider value category, and along with const-ness it also forwards
volatile-ness (until C++20).
The old implementation suffers from the same problem that other
pre-C++23 SFINAE-friendly call wrappers have which is solved by using
deducing this (see p5.5 of the deducing this paper P0847R7).
PR libstdc++/80564
libstdc++-v3/ChangeLog:
* include/std/functional (__cv_like): New.
(_Bind::_Res_type): Don't define when not needed.
(_Bind::__dependent): Likewise.
(_Bind::_Res_type_cv): Likewise.
(_Bind::operator()) [_GLIBCXX_EXPLICIT_THIS_PARAMETER]: Define
as two instead of four overloads using deducing this.
* testsuite/20_util/bind/cv_quals_2.cc: Ignore SFINAE diagnostics
inside headers.
* testsuite/20_util/bind/ref_neg.cc: Likewise.
* testsuite/20_util/bind/80564.cc: New test.
Reviewed-by: Tomasz Kamiński <[email protected]>
Reviewed-by: Jonathan Wakely <[email protected]>
Starting with r16-4438-ga93f80feeef744, the edge sorting order was
switched to lowest execution frequency first. But the "bbro"
optimization pass chooses the first edge as a fallthrough. Thus the
most unlikely branches were optimized to fallthroughs.
Fix by restoring the sorting order prior to r16-4438-ga93f80feeef744.
Now the branches most likely to be executed are picked as fallthroughs.
There are no regressions for C and C++ on x86_64-pc-linux-gnu. The new
tests fail for the respective targets without this patch, and pass
with it.
PR rtl-optimization/122675
gcc/ChangeLog:
* bb-reorder.cc (edge_order): Fix BB edge ordering to be descending.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr122675-1.c: New test.
* gcc.target/i386/pr122675-1.c: New test.
* gcc.target/riscv/pr122675-1.c: New test.
Signed-off-by: Dimitar Dimitrov <[email protected]>
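To illustrate why the sort direction matters when the first edge becomes the fallthrough, here is a minimal C sketch. It is not the code in bb-reorder.cc: `struct sketch_edge`, its fields, and both functions are hypothetical stand-ins, and GCC's real edges carry profile counts rather than a plain `long`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CFG edge.  */
struct sketch_edge
{
  int dest_index;  /* index of the destination block */
  long count;      /* estimated execution count */
};

/* qsort comparator: most frequently executed edge first (descending).  */
int
sketch_edge_order_desc (const void *pa, const void *pb)
{
  const struct sketch_edge *a = pa;
  const struct sketch_edge *b = pb;
  if (a->count > b->count)
    return -1;
  if (a->count < b->count)
    return 1;
  return 0;
}

/* Sort EDGES and return the destination of the first edge, which is
   the one a "first edge wins" consumer would pick as the fallthrough.  */
int
sketch_pick_fallthrough (struct sketch_edge *edges, size_t n)
{
  assert (n > 0);
  qsort (edges, n, sizeof *edges, sketch_edge_order_desc);
  return edges[0].dest_index;
}
```

With an ascending comparator the same consumer would select the coldest successor, which matches the regression described in the commit message.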
…fix option
From: Mark Zhuang <[email protected]>
The previous commit added --default-prefix to handle non-default git
prefix configurations, but this option is not available in older git
versions. This patch adds a compatibility check.
contrib/ChangeLog:
* prepare-commit-msg: check --default-prefix
2025-12-06 Paul Thomas <[email protected]>
gcc/fortran
PR fortran/122578
* primary.cc (gfc_match_varspec): Try to resolve a typebound
generic procedure selector expression to provide the associate
name with a type. Also, resolve component calls. In both cases,
make a copy of the selector expression to guard against changes
made by gfc_resolve_expr.
gcc/testsuite
PR fortran/122578
* gfortran.dg/pdt_72.f03: New test.
2025-12-06 Paul Thomas <[email protected]>
gcc/fortran
PR fortran/122669
* resolve.cc (resolve_allocate_deallocate): Mold expressions with
an array reference and a constant size must be resolved for each
allocate object.
gcc/testsuite
PR fortran/122669
* gfortran.dg/pdt_73.f03: New test.
2025-12-06 Paul Thomas <[email protected]>
gcc/fortran
PR fortran/122670
* decl.cc (gfc_get_pdt_instance): Ensure that, in an interface
body, PDT instances are imported implicitly if the template has
been explicitly imported.
* module.cc (read_module): If a PDT template appears in a use
only statement, implicitly add the instances as well.
gcc/testsuite
PR fortran/122670
* gfortran.dg/pdt_74.f03: New test.
2025-12-06 Paul Thomas <[email protected]>
gcc/fortran
PR fortran/122693
* array.cc (gfc_match_array_constructor): Stash and restore
gfc_current_ns after the call to 'gfc_match_type_spec'.
gcc/testsuite
PR fortran/122693
* gfortran.dg/pdt_75.f03: New test.
This has been discussed in the 1/9 Reflection thread, but doesn't depend on
reflection in any way.
cp_parser_std_attribute calls lookup_attribute_spec as:
const attribute_spec *as
= lookup_attribute_spec (TREE_PURPOSE (attribute));
so with TREE_LIST where TREE_VALUE is attribute name and TREE_PURPOSE
attribute ns. Similarly c_parser_std_attribute. And for
attribute_takes_identifier_p those do:
else if (attr_ns == gnu_identifier
&& attribute_takes_identifier_p (attr_id))
and
bool takes_identifier
= (ns != NULL_TREE
&& strcmp (IDENTIFIER_POINTER (ns), "gnu") == 0
&& attribute_takes_identifier_p (name));
when handling std attributes (for GNU attributes they just call those
with the IDENTIFIER_NODE name).
is_late_template_attribute and tsubst_attribute pass to these functions
just get_attribute_name though, so handle attributes in all namespaces
as GNU attributes only, which means that lookup_attribute_spec can
return NULL or find a different attribute if it is not from gnu:: or
say standard attribute mapped to gnu::, or attribute_takes_identifier_p
can return true even for attributes for which it shouldn't.
I thought about changing attribute_takes_identifier_p to take optionally
TREE_LIST, but that would mean handling it in the target hooks too and
they only care about GNU attributes right now, so given the above
parser.cc/c-parser.cc snippets, the following patch just follows
what they do.
2025-12-06 Jakub Jelinek <[email protected]>
* decl2.cc (is_late_template_attribute): Call lookup_attribute_spec
on TREE_PURPOSE (attr) rather than name. Only call
attribute_takes_identifier_p if get_attribute_namespace (attr) is
gnu_identifier.
* pt.cc (tsubst_attribute): Only call attribute_takes_identifier_p
if get_attribute_namespace (t) is gnu_identifier.
This is another thing discussed in the 1/9 Reflection thread, also not
dependent on reflection.
decl_attributes calls simple_cst_equal on TREE_VALUEs of the current
and preexisting attributes, but that is just a small part of how
attribute values should be compared.
The following patch fixes that.
2025-12-06 Jakub Jelinek <[email protected]>
* attribs.cc (decl_attributes): Use attribute_value_equal to
compare attribute values instead of simple_cst_equal.
compile-std1.C was breaking on arm-eabi because these interfaces aren't
declared. So for exporting let's check the same macros that control
declaring them.
libstdc++-v3/ChangeLog:
* src/c++23/std.cc.in: Add more #if.
2025-12-06 Paul Thomas <[email protected]>
gcc/testsuite
PR fortran/103414
* gfortran.dg/pdt_76.f03: New test.
Just a minor update to Dimitar's patch for the RISC-V testcase. The cfi
directives are not emitted for the -elf configurations causing the new
test to fail.
The cfi directives (and associated labels) don't seem relevant to the
test at hand, so this just drops them.
Pushing to the trunk.
PR rtl-optimization/122675
gcc/testsuite
* gcc.target/riscv/pr122675-1.c: Adjust expected output.
If the reducer is a function and the accumulator type isn't
constrained, at runtime the reduction will likely raise a
Constraint_Error since the reducer is repeatedly assigned to the
accumulator variable (likely changing its length).
However, if the reducer is a procedure, no such assignment occurs, and
thus the runtime error only depends on the reducer logic. This patch
prevents the spurious warning in that case.
gcc/ada/
* sem_attr.adb (Resolve_Attribute): Check if the reducer is a
procedure before giving the warning.
When computing an address plus a large offset on riscv64 with a
PC-relative sequence, we may hit the range limit for auipc and get a
relocation overflow, where on riscv32 the computation wraps around.
Since -mcmodel=medany requires the entire program to fit in a 2GiB
address range, a +/-1GiB+ offset added to an in-range symbol in a
barely-fitting program is more likely than not to be out-of-range.
Since such large constants are unlikely to come up by chance, separate
them from the symbol so as to avoid the relocation overflow.
for gcc/ChangeLog
PR target/91420
* config/riscv/riscv.cc (riscv_symbolic_constant_p): Require
offsets smaller than +/- 1GiB for PCREL symbols.
for gcc/testsuite/ChangeLog
PR target/91420
* gcc.target/riscv/pr91420.c: New.
Since we may delete stores that are found to be redundant in
postreload cse, we need cselib to invalidate argument stores at calls,
and to that end we need CALL_INSN_FUNCTION_USAGE to mention all MEM
stack space that may be legitimately modified by a const/pure callee,
i.e., all arguments passed to it on the stack.
When ACCUMULATE_OUTGOING_ARGS, each on-stack argument gets its own
usage information, but when it's not, each argument is pushed
incrementally, without precomputed stack slots.
Since we only mentioned such precomputed stack slots in
CALL_INSN_FUNCTION_USAGE, non-ACCUMULATE_OUTGOING_ARGS configurations
miss the stack usage data, and cselib fails to invalidate the stores.
Stores in such slots are anonymous, and they often invalidate other
anonymous slots, even part of the same object, but as the testcase
demonstrates, we may occasionally be unlucky that consecutive calls
have the stores to multi-word objects reordered by scheduling in such
a way that the last store for the first call survives the call in the
cselib tables, and then it is found to be redundant with the first
store for the subsequent call, as in the testcase.
So, if we haven't preallocated outgoing arguments for a call (which
would give us preassigned stack slots), and we have used any stack
space, add function call usage covering the entire stack range where
arguments were stored.
for gcc/ChangeLog
PR rtl-optimization/122947
* calls.cc (expand_call): Add stack function usage in
non-ACCUMULATE_OUTGOING_ARGS configurations.
for gcc/testsuite/ChangeLog
PR rtl-optimization/122947
* gcc.dg/pr122947.c: New.
Rework dump_cselib_table to not crash when cselib_preserved_hash_table
is not allocated, and to remove the extraneous indirection from
dump_cselib_val that made it inconvenient to call from a debugger.
for gcc/ChangeLog
* cselib.cc (dump_cselib_val): Split out of and rename to...
(dump_cselib_val_ptr): ... this.
(dump_cselib_table): Adjust. Skip cselib_preserved_hash_table when
not allocated.
Volatile memory can be used as source operand for any operations. Add
-ffuse-ops-with-volatile-access to fuse operations with volatile memory
reference and update simplify_binary_operation_1 to keep PLUS for 2
volatile memory references. On x86, this optimizes
extern volatile int bar;
int
foo (int z)
{
  z *= 123;
  return bar + z;
}
into
foo:
        imull   $123, %edi, %eax
        addl    bar(%rip), %eax
        ret
and compiles
extern volatile unsigned char u8;
void
test (void)
{
  u8 = u8 + u8;
  u8 = u8 - u8;
}
into
test:
        movzbl  u8(%rip), %eax
        addb    %al, u8(%rip)
        movzbl  u8(%rip), %eax
        subb    u8(%rip), %al
        movb    %al, u8(%rip)
        ret
Tested with Linux kernel 6.17.9 on Intel Core i7-1195G7.
gcc/
PR target/122343
* common.opt: Add -ffuse-ops-with-volatile-access.
* common.opt.urls: Regenerated.
* recog.cc (general_operand): Allow volatile memory reference if
-ffuse-ops-with-volatile-access is enabled.
* simplify-rtx.cc (simplify_binary_operation_1): Keep PLUS for 2
volatile memory references.
* doc/invoke.texi: Document -ffuse-ops-with-volatile-access.
gcc/testsuite/
PR target/122343
* gcc.target/i386/20040112-1.c: Add -fomit-frame-pointer and use
check-function-bodies to check for loop.
* gcc.target/i386/avx-ne-convert-1.c: Compile with
-fno-fuse-ops-with-volatile-access.
* gcc.target/i386/avx10_2-bf16-1.c: Likewise.
* gcc.target/i386/avx10_2-convert-1.c: Likewise.
* gcc.target/i386/avx10_2-satcvt-1.c: Likewise.
* gcc.target/i386/avx512bf16-vcvtneps2bf16-1.c: Likewise.
* gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1a.c: Likewise.
* gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1b.c: Likewise.
* gcc.target/i386/avx512bitalg-vpshufbitqmb.c: Likewise.
* gcc.target/i386/avx512bw-vpcmpb-1.c: Likewise.
* gcc.target/i386/avx512bw-vpcmpub-1.c: Likewise.
* gcc.target/i386/avx512bw-vpcmpuw-1.c: Likewise.
* gcc.target/i386/avx512bw-vpcmpw-1.c: Likewise.
* gcc.target/i386/avx512dq-vcvtps2qq-1.c: Likewise.
* gcc.target/i386/avx512dq-vcvtps2uqq-1.c: Likewise.
* gcc.target/i386/avx512dq-vcvtqq2pd-1.c: Likewise.
* gcc.target/i386/avx512dq-vcvtqq2ps-1.c: Likewise.
* gcc.target/i386/avx512dq-vcvttps2qq-1.c: Likewise.
* gcc.target/i386/avx512dq-vcvttps2uqq-1.c: Likewise.
* gcc.target/i386/avx512dq-vcvtuqq2pd-1.c: Likewise.
* gcc.target/i386/avx512dq-vcvtuqq2ps-1.c: Likewise.
* gcc.target/i386/avx512dq-vextractf32x8-1.c: Likewise.
* gcc.target/i386/avx512dq-vextractf64x2-1.c: Likewise.
* gcc.target/i386/avx512dq-vextracti64x2-1.c: Likewise.
* gcc.target/i386/avx512dq-vfpclasspd-1.c: Likewise.
* gcc.target/i386/avx512dq-vfpclassps-1.c: Likewise.
* gcc.target/i386/avx512dq-vfpclasssd-1.c: Likewise.
* gcc.target/i386/avx512dq-vfpclassss-1.c: Likewise.
* gcc.target/i386/avx512dq-vpmullq-1.c: Likewise.
* gcc.target/i386/avx512dq-vpmullq-3.c: Likewise.
* gcc.target/i386/avx512f-pr100267-1.c: Likewise.
* gcc.target/i386/avx512f-vcmppd-1.c: Likewise.
* gcc.target/i386/avx512f-vcmpps-1.c: Likewise.
* gcc.target/i386/avx512f-vcvtps2pd-1.c: Likewise.
* gcc.target/i386/avx512f-vcvtsd2si-1.c: Likewise.
* gcc.target/i386/avx512f-vcvtsd2si64-1.c: Likewise.
* gcc.target/i386/avx512f-vcvtsd2usi-1.c: Likewise.
* gcc.target/i386/avx512f-vcvtsd2usi64-1.c: Likewise.
* gcc.target/i386/avx512f-vcvtsi2ss-1.c: Likewise.
* gcc.target/i386/avx512f-vcvtss2si-1.c: Likewise.
* gcc.target/i386/avx512f-vcvtss2si64-1.c: Likewise.
* gcc.target/i386/avx512f-vcvtss2usi-1.c: Likewise.
* gcc.target/i386/avx512f-vcvtss2usi64-1.c: Likewise.
* gcc.target/i386/avx512f-vcvttsd2si-1.c: Likewise.
* gcc.target/i386/avx512f-vcvttsd2si64-1.c: Likewise.
* gcc.target/i386/avx512f-vcvttsd2usi-1.c: Likewise.
* gcc.target/i386/avx512f-vcvttsd2usi64-1.c: Likewise.
* gcc.target/i386/avx512f-vcvttss2si-1.c: Likewise.
* gcc.target/i386/avx512f-vcvttss2si64-1.c: Likewise.
* gcc.target/i386/avx512f-vcvttss2usi-1.c: Likewise.
* gcc.target/i386/avx512f-vcvttss2usi64-1.c: Likewise.
* gcc.target/i386/avx512f-vextractf32x4-1.c: Likewise.
* gcc.target/i386/avx512f-vextractf64x4-1.c: Likewise.
* gcc.target/i386/avx512f-vextracti64x4-1.c: Likewise.
* gcc.target/i386/avx512f-vmovapd-1.c: Likewise.
* gcc.target/i386/avx512f-vmovaps-1.c: Likewise.
* gcc.target/i386/avx512f-vmovdqa64-1.c: Likewise.
* gcc.target/i386/avx512f-vpandnq-1.c: Likewise.
* gcc.target/i386/avx512f-vpbroadcastd-1.c: Likewise.
* gcc.target/i386/avx512f-vpbroadcastq-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpd-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpeqq-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpequq-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpged-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpgeq-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpgeud-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpgeuq-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpled-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpleq-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpleud-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpleuq-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpltd-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpltq-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpltud-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpltuq-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpneqd-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpneqq-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpnequd-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpnequq-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpq-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpud-1.c: Likewise.
* gcc.target/i386/avx512f-vpcmpuq-1.c: Likewise.
* gcc.target/i386/avx512f-vrndscalepd-1.c: Likewise.
* gcc.target/i386/avx512f-vrndscaleps-1.c: Likewise.
* gcc.target/i386/avx512fp16-complex-fma.c: Likewise.
* gcc.target/i386/avx512fp16-vaddph-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvtpd2ph-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvtph2dq-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvtph2pd-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvtph2psx-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvtph2qq-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvtph2udq-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvtph2uqq-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvtph2uw-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvtph2w-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvtps2ph-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvtqq2ph-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvttph2dq-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvttph2qq-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvttph2udq-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvttph2uqq-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvttph2uw-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvttph2w-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vcvtuqq2ph-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vfcmaddcph-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vfcmulcph-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vfmaddcph-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vfmulcph-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vfpclassph-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vfpclasssh-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vmulph-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vrcpph-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vrsqrtph-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vsqrtph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vaddph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vcvtpd2ph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vcvtph2dq-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vcvtph2psx-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vcvtph2qq-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vcvtph2udq-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vcvtph2uqq-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vcvtph2uw-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vcvtph2w-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vcvtps2ph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vcvtqq2ph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vcvttph2dq-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vcvttph2udq-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vcvttph2uw-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vcvttph2w-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vcvtuqq2ph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vfcmulcph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vfmulcph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vfpclassph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vmulph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vrcpph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vrsqrtph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vsqrtph-1a.c: Likewise.
* gcc.target/i386/avx512vl-pr100267-1.c: Likewise.
* gcc.target/i386/avx512vl-vcmppd-1.c: Likewise.
* gcc.target/i386/avx512vl-vcmpps-1.c: Likewise.
* gcc.target/i386/avx512vl-vcvtpd2ps-1.c: Likewise.
* gcc.target/i386/avx512vl-vcvtpd2udq-1.c: Likewise.
* gcc.target/i386/avx512vl-vcvttpd2udq-1.c: Likewise.
* gcc.target/i386/avx512vl-vcvttps2udq-1.c: Likewise.
* gcc.target/i386/avx512vl-vextractf32x4-1.c: Likewise.
* gcc.target/i386/avx512vl-vmovapd-1.c: Likewise.
* gcc.target/i386/avx512vl-vmovaps-1.c: Likewise.
* gcc.target/i386/avx512vl-vmovdqa64-1.c: Likewise.
* gcc.target/i386/avx512vl-vpcmpd-1.c: Likewise.
* gcc.target/i386/avx512vl-vpcmpeqq-1.c: Likewise.
* gcc.target/i386/avx512vl-vpcmpequq-1.c: Likewise.
* gcc.target/i386/avx512vl-vpcmpq-1.c: Likewise.
* gcc.target/i386/avx512vl-vpcmpud-1.c: Likewise.
* gcc.target/i386/avx512vl-vpcmpuq-1.c: Likewise.
* gcc.target/i386/pr122343-1a.c: New test.
* gcc.target/i386/pr122343-1b.c: Likewise.
* gcc.target/i386/pr122343-2a.c: Likewise.
* gcc.target/i386/pr122343-2b.c: Likewise.
* gcc.target/i386/pr122343-3.c: Likewise.
* gcc.target/i386/pr122343-4a.c: Likewise.
* gcc.target/i386/pr122343-4b.c: Likewise.
* gcc.target/i386/pr122343-5a.c: Likewise.
* gcc.target/i386/pr122343-5b.c: Likewise.
* gcc.target/i386/pr122343-6a.c: Likewise.
* gcc.target/i386/pr122343-6b.c: Likewise.
* gcc.target/i386/pr122343-7.c: Likewise.
Signed-off-by: H.J. Lu <[email protected]>
Back in r78875 mrs added cpp_get_path/dir accessors for _cpp_file in
order to interface with the darwin framework system. But now I notice
that the latter duplicates the better-named _cpp_get_file_dir, and I'm
inclined to rename the former to match.
Perhaps we should drop the initial underscore since these are no longer
internal interfaces; OTOH, _cpp_hashnode_value and _cpp_backup_tokens
still have the initial underscore in cpplib.h.
libcpp/ChangeLog:
* include/cpplib.h (cpp_get_path, cpp_get_dir): Remove.
(_cpp_get_file_path, _cpp_get_file_name, _cpp_get_file_stat)
(_cpp_get_file_dir): Move prototypes from...
* internal.h: ...here.
* files.cc (_cpp_get_file_path): Rename from...
(cpp_get_path): ...this.
(cpp_get_dir): Remove.
gcc/ChangeLog:
* config/darwin-c.cc (find_subframework_header): Use
_cpp_get_file_*.
gcc/analyzer/ChangeLog:
* kf.cc (register_known_functions): Remove duplicate calls to
register_atomic_builtins and register_varargs_builtins.
Signed-off-by: David Malcolm <[email protected]>
This was reported as a regression in GCC 14: the compiler resolves
Accum_Type to Positive for a reduction expression whose "expected
subtype" is Positive, which means that 0 cannot be used as initial
value in the expression:
Sum : Positive := V'Reduce ("+", 0);
without always raising Constraint_Error at run time. That's not the
intent according to T. Taft in
https://forum.ada-lang.io/t/regression-in-gnat-14/890
so this changes the resolution to use the base type (Integer) instead.
gcc/ada/
PR ada/115349
* sem_attr.adb (Resolve_Attribute) <Attribute_Reduce>: Use the base
type as Accum_Type if the reducer is an operator from Standard and
the type is numeric. Use the type of the first operand for other
operators. Streamline the error message given for limited types.
gcc/testsuite/
* gnat.dg/reduce3.adb: New test.
Don't allow 2 volatile memory references in *<avx512>_cmp<mode>3_dup_op
so that gcc.target/i386/avx2-vpcmpeqq-1.c will generate 2 loads when
-march=cascadelake is used.
PR target/122343
* config/i386/sse.md (*<avx512>_cmp<mode>3_dup_op): Don't allow
2 volatile memory references.
Signed-off-by: H.J. Lu <[email protected]>
When -march=cascadelake is added, we generate
        vmovdqa x(%rip), %ymm0
        vpcmpq $1, x(%rip), %ymm0, %k0
        vpmovm2q %k0, %ymm0
        vmovdqa %ymm0, x(%rip)
instead of
        vmovdqa x(%rip), %ymm1
        vmovdqa x(%rip), %ymm0
        vpcmpgtq %ymm1, %ymm0, %ymm0
        vmovdqa %ymm0, x(%rip)
Compile avx2-vpcmpgtq-1.c with -fno-fuse-ops-with-volatile-access to
generate vpcmpgtq instead of vpcmpq.
PR target/122343
* gcc.target/i386/avx2-vpcmpgtq-1.c: Compile with
-fno-fuse-ops-with-volatile-access.
Signed-off-by: H.J. Lu <[email protected]>
…d [PR122868]
As Richi suggested this moves the check into the loop so we check every
load. I had initially not done this because I figured the loads would
be treated as a group anyway and the group would be valid or not as a
whole.
But for invariants they could be a group, but not all the loads within
range of a known bounds.
gcc/ChangeLog:
PR tree-optimization/122868
* tree-vect-stmts.cc (vectorizable_load): Move check for invariant
loads down into the loop.
The Adv. SIMD boolean reduction patterns were accidentally overriding
one of the input arguments.
This fixes it and removes unneeded intermediate moves around the subreg
type castings.
gcc/ChangeLog:
PR target/123026
* config/aarch64/aarch64-simd.md (reduc_sbool_ior_scal_<mode>,
reduc_sbool_and_scal_<mode>): Fix tmp operands[1] override.
gcc/testsuite/ChangeLog:
PR target/123026
* gcc.target/aarch64/pr123026.c: New test.
When we have a speculated edge but we folded the call to
__builtin_unreachable () then trying to update the cgraph ICEs in
resolve_speculation because there's no symtab node for
__builtin_unreachable (). Reject this resolution attempt, as is done
when the callee's decl is NULL or is not semantically equivalent.
I only have an LTRANS unit as a testcase.
PR ipa/122456
* cgraph.cc (cgraph_edge::resolve_speculation): Handle a NULL
symtab_node::get (callee_decl).
…ling
gcc/Changelog
* haifa-sched.cc (choose_ready): Don't require dfa_lookahead <= 0
to schedule SCHED_GROUP_P insns first.
This patch enables dispatch scheduling for the NVIDIA Olympus core.
The dispatch constraints are based on the Olympus CPU Core Software
Optimization Guide
(https://docs.nvidia.com/olympus-cpu-core-software-optimization-guide-dp12531-001v0-7.pdf).
The patch was bootstrapped and tested on aarch64-linux-gnu, no
regression.
OK for trunk?
Signed-off-by: Jennifer Schmitz <[email protected]>
gcc/
* config/aarch64/aarch64.md: Include olympus.md.
* config/aarch64/olympus.md: New file.
* config/aarch64/tuning_models/olympus.h: Add dispatch constraints
and enable dispatch scheduling.
Add a new target instruction. Hardware-assisted sanitizers on
architectures providing instructions to tag/untag memory can then
make use of this new instruction pattern. For example, the
memtag-stack sanitizer uses these instructions to tag and untag a
memory granule.
gcc/
* target-insns.def (tag_memory): New target instruction.
* doc/md.texi (tag_memory): Add documentation.
Signed-off-by: Claudiu Zissulescu <[email protected]>
Add a new target instruction used by hardware-assisted sanitizers on
architectures providing memory-tagging instructions. This instruction
is used to compute and assign tags at a fixed offset from a tagged
address base. For example, on AArch64, this pattern instantiates the
`addg` instruction.
gcc/
* target-insns.def (compose_tag): New target instruction.
* doc/md.texi (compose_tag): Add documentation.
Signed-off-by: Claudiu Zissulescu <[email protected]>
Add new command line option -fsanitize=memtag-stack with the following
new params:
--param memtag-instrument-alloca [0,1] (default 1) to use MTE insns
for enabling dynamic checking of stack allocas.
Along with the new SANITIZE_MEMTAG_STACK, define a SANITIZE_MEMTAG
which will be set if any kind of memtag sanitizer is in effect (e.g.,
later we may add -fsanitize=memtag-globals). Add errors to convey
that memtag sanitizer does not work with hwaddress and address
sanitizers. Also error out if memtag ISA extension is not enabled.
MEMTAG sanitizer will use the HWASAN machinery, but with a few
differences:
- The tags are always generated at runtime by the hardware, so
-fsanitize=memtag-stack enforces a --param hwasan-random-frame-tag=1
Add documentation in gcc/doc/invoke.texi.
gcc/
* builtins.def: Adjust the macro to include the new
SANITIZE_MEMTAG_STACK.
* flag-types.h (enum sanitize_code): Add new enumerator for
SANITIZE_MEMTAG and SANITIZE_MEMTAG_STACK.
* opts.cc (finish_options): memtag-stack sanitizer conflicts with
hwaddress and address sanitizers.
(sanitizer_opts): Add new memtag-stack sanitizer.
(parse_sanitizer_options): memtag-stack sanitizer cannot recover.
* params.opt: Add new params for memtag-stack sanitizer.
* doc/invoke.texi: Update documentation.
Signed-off-by: Claudiu Zissulescu <[email protected]>
Co-authored-by: Claudiu Zissulescu <[email protected]>
Memory tagging is used for detecting memory safety bugs. On AArch64, the
memory tagging extension (MTE) helps in reducing the overheads of memory
tagging:
- CPU: MTE instructions for efficiently tagging and untagging memory.
- Memory: New memory type, Normal Tagged Memory, added to the Arm
Architecture.
The MEMory TAGging (MEMTAG) sanitizer uses the same infrastructure as
HWASAN. MEMTAG and HWASAN are both hardware-assisted solutions, and
rely on the same sanitizer machinery in parts. So, define new
constructs that allow MEMTAG and HWASAN to share the infrastructure:
- hwassist_sanitize_p () is true when either SANITIZE_MEMTAG or
SANITIZE_HWASAN is true.
- hwassist_sanitize_stack_p () is true when hwassist_sanitize_p () is
true and stack variables are to be sanitized.
MEMTAG and HWASAN do have differences, however, and hence, the need to
conditionalize using memtag_sanitize_p () in the relevant places. E.g.,
- Instead of generating the libcall __hwasan_tag_memory, MEMTAG needs
to invoke the target-specific hook TARGET_MEMTAG_TAG_MEMORY to tag
memory. A similar approach is used in handle_builtin_alloca, where
target hooks are used instead of the gimple transformations.
- Add a new internal function HWASAN_ALLOCA_POISON to handle
dynamically allocated stack when MEMTAG sanitizer is enabled. At
expansion, this in turn allows invoking target hooks to increment the
tag, and using the generated tag to finally tag the dynamically
allocated memory.
The usual pattern:
irg x0, x0, x0
subg x0, x0, #16, #0
creates a tag in x0 and so on. For alloca, we need to apply the
generated tag to the new sp. In the absence of an extract tag insn, the
implementation in expand_HWASAN_ALLOCA_POISON resorts to invoking irg
again.
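The relationship between the shared predicates described above can be sketched in C as follows. This is only an illustration under stated assumptions: the flag values, the `sketch_` names, and the explicit `sanitize_stack` parameter are hypothetical and do not reflect the actual definitions in asan.cc or flag-types.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; the values are not GCC's.  */
enum
{
  SKETCH_SANITIZE_HWASAN = 1 << 0,
  SKETCH_SANITIZE_MEMTAG = 1 << 1
};

/* MEMTAG and HWASAN are described as mutually exclusive options.  */
bool
sketch_flags_valid (unsigned flags)
{
  return !((flags & SKETCH_SANITIZE_HWASAN)
           && (flags & SKETCH_SANITIZE_MEMTAG));
}

/* True only for the MEMTAG flavour; used where behaviour differs.  */
bool
sketch_memtag_sanitize_p (unsigned flags)
{
  return (flags & SKETCH_SANITIZE_MEMTAG) != 0;
}

/* True when either hardware-assisted sanitizer is active.  */
bool
sketch_hwassist_sanitize_p (unsigned flags)
{
  assert (sketch_flags_valid (flags));
  return (flags & (SKETCH_SANITIZE_MEMTAG | SKETCH_SANITIZE_HWASAN)) != 0;
}

/* True when a hardware-assisted sanitizer is active and stack
   variables are to be sanitized.  */
bool
sketch_hwassist_sanitize_stack_p (unsigned flags, bool sanitize_stack)
{
  return sketch_hwassist_sanitize_p (flags) && sanitize_stack;
}
```

Shared code paths would test the `hwassist` predicates, while the places that must emit target hooks instead of libcalls would additionally test the `memtag` predicate.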
gcc/
* asan.cc (handle_builtin_stack_restore): Accommodate MEMTAG
sanitizer.
(handle_builtin_alloca): Expand differently if MEMTAG sanitizer.
(get_mem_refs_of_builtin_call): Include MEMTAG along with
HWASAN.
(memtag_sanitize_stack_p): New definition.
(memtag_sanitize_allocas_p): Likewise.
(memtag_memintrin): Likewise.
(hwassist_sanitize_p): Likewise.
(hwassist_sanitize_stack_p): Likewise.
(report_error_func): Include MEMTAG along with HWASAN.
(build_check_stmt): Likewise.
(instrument_derefs): MEMTAG too does not deal with globals yet.
(instrument_builtin_call): Include MEMTAG along with HWASAN.
(maybe_instrument_call): Likewise.
(asan_expand_mark_ifn): Likewise.
(asan_expand_check_ifn): Likewise.
(asan_expand_poison_ifn): Expand differently if MEMTAG sanitizer.
(asan_instrument): Include MEMTAG along with HWASAN.
(hwasan_emit_prologue): Expand differently if MEMTAG sanitizer.
(hwasan_emit_untag_frame): Likewise.
* asan.h (memtag_sanitize_stack_p): New declaration.
(memtag_sanitize_allocas_p): Likewise.
(hwassist_sanitize_p): Likewise.
(hwassist_sanitize_stack_p): Likewise.
(asan_sanitize_use_after_scope): Include MEMTAG along with
HWASAN.
* cfgexpand.cc (align_local_variable): Likewise.
(expand_one_stack_var_at): Likewise.
(expand_stack_vars): Likewise.
(expand_one_stack_var_1): Likewise.
(init_vars_expansion): Likewise.
(expand_used_vars): Likewise.
(pass_expand::execute): Likewise.
* gimplify.cc (asan_poison_variable): Likewise.
* internal-fn.cc (expand_HWASAN_ALLOCA_POISON): New definition.
(expand_HWASAN_ALLOCA_UNPOISON): Expand differently if MEMTAG
sanitizer.
(expand_HWASAN_MARK): Likewise.
* internal-fn.def (HWASAN_ALLOCA_POISON): Define new.
* params.opt: Document new param.
* sanopt.cc (pass_sanopt::execute): Include MEMTAG along with
HWASAN.
* gcc.cc (sanitize_spec_function): Add check for memtag-stack.
* doc/tm.texi: Regenerate.
* target.def (extract_tag): Update documentation.
(add_tag): Likewise.
(insert_random_tag): Likewise.
Co-authored-by: Indu Bhagat <[email protected]>
Signed-off-by: Claudiu Zissulescu <[email protected]>
MEMTAG sanitizer, which is based on the HWASAN sanitizer, will invoke
the target-specific hooks to create a random tag, add tag to memory
address, and finally tag and untag memory.
Implement the target hooks to emit MTE instructions if MEMTAG sanitizer
is in effect. Continue to use the default target hook if HWASAN is
being used. The following target hooks are implemented:
- TARGET_MEMTAG_INSERT_RANDOM_TAG
- TARGET_MEMTAG_ADD_TAG
- TARGET_MEMTAG_EXTRACT_TAG
Apart from the target-specific hooks, set the following to values
defined by the Memory Tagging Extension (MTE) in aarch64:
- TARGET_MEMTAG_TAG_BITSIZE
- TARGET_MEMTAG_GRANULE_SIZE
The following instructions were (re-)defined:
- addg/subg (used by TARGET_MEMTAG_ADD_TAG and
TARGET_MEMTAG_COMPOSE_OFFSET_TAG hooks)
- stg/st2g Used to tag/untag a memory granule.
- tag_memory A target specific instruction; it will emit MTE
instructions to tag/untag memory of a given size.
- compose_tag A target specific instruction that computes a tagged
address as an offset from a base (tagged) address.
- gmi Used for randomizing the inserted tag.
- irg Likewise.
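The following C sketch illustrates how a region of a given size could be covered with tag stores. It assumes the MTE granule of 16 bytes, with stg tagging one granule and st2g tagging two; the planning function and its struct are hypothetical and do not describe how aarch64_expand_tag_memory actually chooses instructions.

```c
#include <stddef.h>

#define GRANULE_SIZE 16   /* MTE tag granule, in bytes.  */

/* Round SIZE up to a whole number of granules.  */
static size_t
round_up_to_granule (size_t size)
{
  return (size + GRANULE_SIZE - 1) / GRANULE_SIZE * GRANULE_SIZE;
}

/* Hypothetical plan: number of st2g and stg stores to emit.  */
struct tag_plan
{
  size_t st2g;
  size_t stg;
};

/* Cover SIZE bytes with as many two-granule st2g stores as possible,
   followed by a single stg if one granule is left over.  */
static struct tag_plan
plan_tag_stores (size_t size)
{
  size_t granules = round_up_to_granule (size) / GRANULE_SIZE;
  struct tag_plan plan = { granules / 2, granules % 2 };
  return plan;
}
```

For example, a 48-byte object would need one st2g and one stg under this plan. For large sizes a loop is presumably preferable to a long straight-line sequence, which is what a threshold parameter would control.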
gcc/
* config/aarch64/aarch64.md (addg): Update pattern to use
addg/subg instructions.
(stg): Update pattern.
(st2g): New pattern.
(tag_memory): Likewise.
(compose_tag): Likewise.
(irg): Update pattern to accept xzr register.
(gmi): Likewise.
(UNSPECV_TAG_SPACE): Define.
* config/aarch64/aarch64.cc (AARCH64_MEMTAG_GRANULE_SIZE):
Define.
(AARCH64_MEMTAG_TAG_BITSIZE): Likewise.
(aarch64_override_options_internal): Error out if MTE instructions
are not available.
(aarch64_post_cfi_startproc): Emit .cfi_mte_tagged_frame.
(aarch64_can_tag_addresses): Add MEMTAG specific handling.
(aarch64_memtag_tag_bitsize): New function.
(aarch64_memtag_granule_size): Likewise.
(aarch64_memtag_insert_random_tag): Likewise.
(aarch64_memtag_add_tag): Likewise.
(aarch64_memtag_extract_tag): Likewise.
(aarch64_granule16_memory_address_p): Likewise.
(aarch64_emit_stxg_insn): Likewise.
(aarch64_memtag_tag_memory_via_loop): New definition.
(aarch64_expand_tag_memory): Likewise.
(aarch64_check_memtag_ops): Likewise.
(TARGET_MEMTAG_TAG_BITSIZE): Likewise.
(TARGET_MEMTAG_GRANULE_SIZE): Likewise.
(TARGET_MEMTAG_INSERT_RANDOM_TAG): Likewise.
(TARGET_MEMTAG_ADD_TAG): Likewise.
(TARGET_MEMTAG_EXTRACT_TAG): Likewise.
* config/aarch64/aarch64-builtins.cc
(aarch64_expand_builtin_memtag): Update set tag builtin logic.
* config/aarch64/aarch64-linux.h: Pass memtag-stack sanitizer
specific options to the linker.
* config/aarch64/aarch64-protos.h
(aarch64_granule16_memory_address_p): New prototype.
(aarch64_check_memtag_ops): Likewise.
(aarch64_expand_tag_memory): Likewise.
* config/aarch64/constraints.md (Umg): New memory constraint.
(Uag): New constraint.
(Ung): Likewise.
* config/aarch64/predicates.md (aarch64_memtag_tag_offset):
Refactor it.
(aarch64_granule16_imm6): Rename from aarch64_granule16_uimm6 and
refactor it.
(aarch64_granule16_memory_operand): New constraint.
* config/aarch64/iterators.md (MTE_PP): New code iterator to be
used for mte instructions.
(stg_ops): New code attributes.
(st2g_ops): Likewise.
(mte_name): Likewise.
* config/aarch64/aarch64.opt (aarch64-tag-memory-loop-threshold):
New parameter.
* doc/invoke.texi: Update documentation.
gcc/testsuite:
* gcc.target/aarch64/acle/memtag_1.c: Update test.
Co-authored-by: Indu Bhagat <[email protected]>
Signed-off-by: Claudiu Zissulescu <[email protected]>
Add basic tests for memtag-stack sanitizer. Memtag stack sanitizer
uses target hooks to emit AArch64 specific MTE instructions.
gcc/testsuite:
* gcc.target/aarch64/memtag/alloca-1.c: New test.
* gcc.target/aarch64/memtag/alloca-2.c: New test.
* gcc.target/aarch64/memtag/alloca-3.c: New test.
* gcc.target/aarch64/memtag/arguments-1.c: New test.
* gcc.target/aarch64/memtag/arguments-2.c: New test.
* gcc.target/aarch64/memtag/arguments-3.c: New test.
* gcc.target/aarch64/memtag/arguments-4.c: New test.
* gcc.target/aarch64/memtag/arguments.c: New test.
* gcc.target/aarch64/memtag/basic-1.c: New test.
* gcc.target/aarch64/memtag/basic-3.c: New test.
* gcc.target/aarch64/memtag/basic-struct.c: New test.
* gcc.target/aarch64/memtag/large-array.c: New test.
* gcc.target/aarch64/memtag/local-no-escape.c: New test.
* gcc.target/aarch64/memtag/memtag.exp: New file.
* gcc.target/aarch64/memtag/no-sanitize-attribute.c: New test.
* gcc.target/aarch64/memtag/value-init.c: New test.
* gcc.target/aarch64/memtag/vararray-gimple.c: New test.
* gcc.target/aarch64/memtag/vararray.c: New test.
* gcc.target/aarch64/memtag/zero-init.c: New test.
* gcc.target/aarch64/memtag/texec-1.c: New test.
* gcc.target/aarch64/memtag/texec-2.c: New test.
* gcc.target/aarch64/memtag/texec-3.c: New test.
* gcc.target/aarch64/memtag/vla-1.c: New test.
* gcc.target/aarch64/memtag/vla-2.c: New test.
* lib/target-supports.exp (check_effective_target_aarch64_mte):
New function.
Co-authored-by: Indu Bhagat <[email protected]>
Signed-off-by: Claudiu Zissulescu <[email protected]>
| emit_insn (gen_mulsi3 (operands[4], operands[1], operands[2])); | ||
| emit_insn (gen_addsi3 (operands[0], operands[3], operands[4])); |
Ah, this is also wrong.
Thanks. Fixed.
Interesting: with this fix (b4ce3f9) I get a regression of 0.5%.
89415d4 to
b4ce3f9
…s with as
The gcc.target/i386/shift-gf2p8affine-2.c test FAILs on Solaris with the
native assembler:
FAIL: gcc.target/i386/shift-gf2p8affine-2.c (test for excess errors)
UNRESOLVED: gcc.target/i386/shift-gf2p8affine-2.c compilation failed to produce executable
Excess errors:
Assembler: shift-gf2p8affine-2.c
"/var/tmp//ccZMQ1Ad.s", line 30 : Illegal mnemonic
Near line: " vgf2p8affineqb $0, %zmm1, %zmm0, %zmm0"
"/var/tmp//ccZMQ1Ad.s", line 30 : Syntax error
Thus this patch only runs the test when gas is in use.
Tested on i386-pc-solaris2.11 (as and gas) and x86_64-pc-linux-gnu.
2025-12-15 Rainer Orth <[email protected]>
gcc/testsuite:
* gcc.target/i386/shift-gf2p8affine-2.c: Skip on Solaris
without gas.
The following works around SRA not being able to decompose an aggregate
copy of std::complex because with x87 math ld/st pairs are not
bit-preserving by adding -msse -mfpmath=sse. This avoids spurious
failures of the testcase.
PR testsuite/123137
* g++.dg/vect/pr64410.cc: Add -mfpmath=sse -msse on x86.
As a result of the automatic replacement by commit 4dd1398, there are
several code fragments that receive the return value of end_sequence()
and immediately use it as the return value of the function itself.
rtx_insn *insn;
...
insn = end_sequence ();
return insn;
It is clear that in such cases, it would be more natural to pass the
return value of end_sequence() directly to the return statement without
passing it through a variable.
Applying this patch naturally does not change any functionality.
gcc/ChangeLog:
* config/xtensa/xtensa.cc
(xtensa_expand_block_set_libcall,
xtensa_expand_block_set_unrolled_loop,
xtensa_expand_block_set_small_loop, xtensa_call_tls_desc):
Change the return statement to pass the return value of
end_sequence() directly without going through a variable, and
remove the definition of that variable.
In the expansion of cstoresi4 insn patterns, LT[U] comparisons where the
second operand is an integer constant are canonicalized to LE[U] ones with
a constant one less than the original.
/* example */
int test0(int a) {
  return a < 100;
}
unsigned int test1(unsigned int a) {
  return a <= 100u;
}
void test2(int a[], int b) {
  int i;
  for (i = 0; i < 16; ++i)
    a[i] = (a[i] <= b);
}
;; before (TARGET_SALT)
test0:
entry sp, 32
movi a8, 0x63
salt a2, a8, a2
addi.n a2, a2, -1 ;; unwanted inverting
neg a2, a2 ;;
retw.n
test1:
entry sp, 32
movi a8, 0x64
saltu a2, a8, a2
addi.n a2, a2, -1 ;; unwanted inverting
neg a2, a2 ;;
retw.n
test2:
entry sp, 32
movi.n a9, 0x10
loop a9, .L5_LEND
.L5:
l32i.n a8, a2, 0
salt a8, a3, a8
addi.n a8, a8, -1 ;; immediate cannot be hoisted out
neg a8, a8
s32i.n a8, a2, 0
addi.n a2, a2, 4
.L5_LEND:
retw.n
This patch reverts such canonicalization by adding 1 to the comparison value
and then converting it back from LE[U] to LT[U], which better matches the
output machine instructions. This patch also makes it easier to benefit
from other optimizations such as CSE, constant propagation, or loop-invariant
hoisting by XORing the result with a register that has a value of 1, rather
than subtracting 1 and then negating the sign to invert the truth of the
result.
;; after (TARGET_SALT)
test0:
entry sp, 32
movi a8, 0x64
salt a2, a2, a8
retw.n
test1:
entry sp, 32
movi a8, 0x65
saltu a2, a2, a8
retw.n
test2:
entry sp, 32
movi.n a10, 1 ;; hoisted out
movi.n a9, 0x10
loop a9, .L5_LEND
.L5:
l32i.n a8, a2, 0
salt a8, a3, a8
xor a8, a8, a10
s32i.n a8, a2, 0
addi.n a2, a2, 4
.L5_LEND:
retw.n
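The equivalences behind the listings above can be checked with a small C model. It assumes that salt computes a signed "set if less than" of its two source operands, as the listings suggest; the function names are invented for this sketch.

```c
/* Model of "salt ar, as, at": 1 if AS < AT (signed), else 0.  */
static int
salt (int as, int at)
{
  return as < at;
}

/* Inversion used before the patch: addi -1, then neg.  */
static int
invert_sub_neg (int flag)
{
  return -(flag - 1);
}

/* Inversion used after the patch: xor with a register holding 1.  */
static int
invert_xor (int flag)
{
  return flag ^ 1;
}

/* test0 before the patch: a < 100 canonicalized to a <= 99, computed
   as the inverse of 99 < a.  */
static int
lt100_before (int a)
{
  return invert_sub_neg (salt (99, a));
}

/* test0 after the patch: a < 100 computed directly.  */
static int
lt100_after (int a)
{
  return salt (a, 100);
}
```

Both inversions map 0 to 1 and 1 to 0, but the xor form takes its constant from a register, which is what allows the loop in test2 to hoist it.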
gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_expand_scc_SALT):
New sub-function that emits the SALT/SALTU instructions.
(xtensa_expand_scc): Change the part related to the SALT/SALTU
instructions to a call to the above sub-function.
Signed-off-by: Mohammad-Reza Nabipoor <[email protected]>
gcc/algol68/ChangeLog
* a68-imports.cc (dump_encoded_mode): Replace "basic" with "string".
This commit introduces two new utility functions that replace some
ad-hoc infrastructure in the scanner.
Signed-off-by: Jose E. Marchesi <[email protected]>
gcc/algol68/ChangeLog
* a68.h: Prototypes for a68_get_file_size and a68_file_read.
* a68-parser-scanner.cc (a68_file_size): New function.
(a68_file_read): Renamed from io_read.
(get_source_size): Deleted function.
(include_files): Use a68_file_size and a68_file_read.
This commit adds support for two new command-line options for the
Algol 68 front-end:
-fmodules-map=<string>
-fmodules-map-file=<filename>
These options are used in order to specify a mapping from module
indicants to file basenames. The compiler will base its search for the
modules on these basenames rather than on the default schema of
deriving the basename from the module indicant.
Signed-off-by: Jose E. Marchesi <[email protected]>
gcc/algol68/ChangeLog
* lang.opt (-fmodules-map): New option.
(-fmodules-map-file): Likewise.
* a68.h: Add prototype for a68_process_module_map.
* a68-imports.cc (SKIP_WHITESPACES): Define.
(PARSE_BASENAME): Likewise.
(PARSE_INDICANT): Likewise.
(a68_process_module_map): New function.
* a68-lang.cc: (a68_init): Move initialization of
A68_MODULE_FILES from there...
(a68_init_options): to here.
(a68_handle_option): Handle OPT_fmodules_map and
OPT_fmodules_map_.
* a68-parser-pragmat.cc (handle_access_in_pragmat): Normalize
module indicants to upper case.
* ga68.texi (Module search options): New section.
Signed-off-by: Jose E. Marchesi <[email protected]>
gcc/ChangeLog
* common.opt.urls: Regenerate.
gcc/algol68/ChangeLog
* lang.opt.urls: Regenerate.
10f17dd to
a33ab39
This patch introduces the pipeline description for the Synopsys RMX-100
series processor to the RISC-V GCC backend. The RMX-100 has a short,
three-stage, in-order execution pipeline with configurable multiply
unit options.
The option -mmpy-option was added to control which version of the MPY
unit the core has and what the latency of multiply instructions should
be, similar to ARCv2 cores (see gcc/config/arc/arc.opt:60).
gcc/ChangeLog:
* config/riscv/riscv-cores.def (RISCV_TUNE): Add
arc-v-rmx-100-series.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
Add arcv_rmx100.
(enum arcv_mpy_option_enum): New enum for ARC-V multiply
options.
* config/riscv/riscv-protos.h (arcv_mpy_1c_bypass_p): New
declaration.
(arcv_mpy_2c_bypass_p): New declaration.
(arcv_mpy_10c_bypass_p): New declaration.
* config/riscv/riscv.cc (arcv_mpy_1c_bypass_p): New function.
(arcv_mpy_2c_bypass_p): New function.
(arcv_mpy_10c_bypass_p): New function.
* config/riscv/riscv.md: Add arcv_rmx100.
* config/riscv/riscv.opt: New option for RMX-100 multiply unit
configuration.
* doc/riscv-mtune.texi: Document arc-v-rmx-100-series.
* config/riscv/arcv-rmx100.md: New file.
Authored-by: Artemiy Volkov <[email protected]>
Co-authored-by: Michiel Derhaeg <[email protected]>
Signed-off-by: Luis Silva <[email protected]>
This patch introduces the pipeline description for the Synopsys
RHX-100 series processor to the RISC-V GCC backend. The RHX-100
features a 10-stage, dual-issue, in-order execution pipeline
architecture.
It has support for instruction fusion, which will be addressed by
subsequent patches.
gcc/ChangeLog:
* config/riscv/riscv-cores.def (RISCV_TUNE): Add
arc-v-rhx-100-series.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
Add arcv_rhx100.
* config/riscv/riscv.cc (enum riscv_fusion_pairs): Add
RISCV_FUSE_ARCV.
* config/riscv/riscv.md: Add arcv_rhx100 to tune attribute.
* doc/riscv-mtune.texi: Add RHX-100 documentation.
* config/riscv/arcv-rhx100.md: New file.
Authored-by: Artemiy Volkov <[email protected]>
Co-authored-by: Michiel Derhaeg <[email protected]>
Signed-off-by: Luis Silva <[email protected]>
This patch implements instruction fusion support for the Synopsys
RHX-100 processor by adding the arcv_macro_fusion_pair_p function
and supporting infrastructure.
The implementation supports fusion of several instruction patterns:
multiply-add sequences, shift-based bit extraction, load-immediate with
conditional branches, adjacent memory operations, memory operations with
arithmetic instructions, memory operations with LUI instructions, and
load-immediate with store operations.
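As an illustration of one of these patterns, the C sketch below shows a possible adjacency test for two memory operations. The struct, the function name, and the exact conditions (same base register, same width, second access starting where the first ends) are assumptions made for this example; the checks in arcv_macro_fusion_pair_p operate on RTL and may differ.

```c
#include <stdbool.h>

/* Hypothetical summary of a load or store: base register number,
   byte offset from that base, and access width in bytes.  */
struct mem_access
{
  unsigned base_reg;
  long offset;
  unsigned width;
};

/* Return true if SECOND accesses the bytes immediately following
   FIRST, using the same base register and the same access width.  */
static bool
adjacent_accesses_p (struct mem_access first, struct mem_access second)
{
  return first.base_reg == second.base_reg
	 && first.width == second.width
	 && second.offset == first.offset + (long) first.width;
}
```

Under these assumptions, a 4-byte load at offset 0 from a register followed by a 4-byte load at offset 4 from the same register would qualify, while loads from different base registers would not.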
A new arcv.cc file is added to contain ARC-V specific optimizations,
and the existing multiply bypass functions are moved from riscv.cc to
this new file for better organization.
gcc/ChangeLog:
* config.gcc: Add arcv.o to extra_objs.
* config/riscv/riscv-protos.h (arcv_macro_fusion_pair_p): New
declaration.
(arcv_sched_fusion_priority): New declaration.
(arcv_can_issue_more_p): New declaration.
(arcv_sched_variable_issue): New declaration.
(arcv_sched_init): New declaration.
(arcv_sched_reorder2): New declaration.
(arcv_sched_adjust_priority): New declaration.
(arcv_sched_adjust_cost): New declaration.
* config/riscv/riscv.cc (arcv_mpy_1c_bypass_p): Move to arcv.cc.
(arcv_mpy_2c_bypass_p): Move to arcv.cc.
(arcv_mpy_10c_bypass_p): Move to arcv.cc.
(riscv_macro_fusion_pair_p): New function.
* config/riscv/t-riscv: Add arcv.o build rule.
* config/riscv/arcv.cc: New file.
Authored-by: Artemiy Volkov <[email protected]>
Co-authored-by: Michiel Derhaeg <[email protected]>
Signed-off-by: Luis Silva <[email protected]>
…eries.
This patch implements the TARGET_SCHED_FUSION_PRIORITY hook for the
Synopsys RHX-100 processor to improve instruction scheduling by
prioritizing fusible memory operations.
The implementation analyzes load and store instructions to extract
base registers and offsets, then assigns scheduling priorities based
on several factors: access width (wider accesses get higher priority),
base register number, and memory offset values. Instructions with
adjacent addresses are grouped together to enable better fusion
opportunities.
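A minimal sketch of such a priority computation is shown below. The key layout, the bias of 2048 for 12-bit signed offsets, and all names are assumptions chosen for illustration; they are not taken from arcv_sched_fusion_priority.

```c
/* Hypothetical summary of a load or store.  */
struct mem_access
{
  unsigned base_reg;   /* Base register number, 0..31.  */
  long offset;         /* Byte offset, assumed within -2048..2047.  */
  unsigned width;      /* Access width in bytes.  */
};

/* The two keys a fusion-priority hook reports to the scheduler.  */
struct sched_keys
{
  int fusion_pri;      /* Equal for instructions that may fuse.  */
  int pri;             /* Order within one fusion group.  */
};

static struct sched_keys
sketch_fusion_priority (struct mem_access m, int max_pri)
{
  struct sched_keys k;
  /* Group key: wider accesses rank higher, ties are split by base
     register so accesses through the same base sort together.  */
  k.fusion_pri = (int) m.width * 32 + (31 - (int) m.base_reg);
  /* Within a group, lower offsets get higher priority so that
     neighbouring addresses end up next to each other.  */
  k.pri = max_pri - 1 - (int) (m.offset + 2048);
  return k;
}
```

Two 4-byte accesses through the same base register receive the same fusion_pri here, and the one with the lower offset receives the higher pri.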
gcc/ChangeLog:
* config/riscv/arcv.cc (arcv_fusion_load_store): New function.
(arcv_sched_fusion_priority): New function.
* config/riscv/riscv.cc (riscv_sched_fusion_priority): New
function.
(TARGET_SCHED_FUSION_PRIORITY): Define hook.
Authored-by: Artemiy Volkov <[email protected]>
Co-authored-by: Michiel Derhaeg <[email protected]>
Signed-off-by: Luis Silva <[email protected]>
This patch implements instruction scheduling support for the
dual-issue Synopsys RHX-100 processor by adding scheduler hooks and
state tracking for the two execution pipes.
The implementation tracks ALU pipe and memory pipe usage to maximize
dual-issue opportunities. It includes reordering logic to promote
fusion of adjacent memory operations and other instruction pairs that
can execute simultaneously on the RHX-100's dual-issue architecture.
The scheduler prioritizes fused instruction pairs and adjusts costs
to improve scheduling decisions. Memory operations are directed to
the appropriate pipe while arithmetic operations utilize the ALU pipe,
enabling optimal utilization of both execution units.
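The pipe tracking can be pictured with the toy in-order model below. It assumes exactly one ALU pipe and one memory pipe, each accepting one instruction per cycle, and ignores latencies, dependencies, and fusion; the types and function are hypothetical and much simpler than struct arcv_sched_state.

```c
#include <stdbool.h>

/* Which pipe an instruction needs in this toy model.  */
enum pipe
{
  PIPE_ALU,
  PIPE_MEM
};

/* Per-cycle occupancy of the two pipes.  */
struct issue_state
{
  bool alu_busy;
  bool mem_busy;
};

/* Count the cycles needed to issue N instructions in program order
   when each pipe accepts at most one instruction per cycle.  */
static int
cycles_needed (const enum pipe *insns, int n)
{
  struct issue_state s = { false, false };
  int cycles = 0;
  for (int i = 0; i < n; i++)
    {
      bool *busy = insns[i] == PIPE_ALU ? &s.alu_busy : &s.mem_busy;
      if (cycles == 0 || *busy)
	{
	  /* Start a new cycle: both pipes become free again.  */
	  s.alu_busy = s.mem_busy = false;
	  cycles++;
	}
      *busy = true;
    }
  return cycles;
}
```

In this model an alternating ALU/memory sequence issues two instructions per cycle, while two consecutive ALU instructions need two cycles, which is the kind of imbalance the reordering logic tries to avoid.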
New TARGET_SCHED hooks are implemented including ADJUST_PRIORITY,
REORDER2, and enhanced VARIABLE_ISSUE handling specifically for
the RHX-100 microarchitecture.
gcc/ChangeLog:
* config/riscv/arcv.cc (struct arcv_sched_state): New struct.
(arcv_sched_init): New function.
(arcv_sched_reorder2): New function.
(arcv_sched_adjust_priority): New function.
(arcv_sched_adjust_cost): New function.
(arcv_can_issue_more_p): New function.
(arcv_sched_variable_issue): New function.
* config/riscv/riscv.cc (riscv_fusion_enabled_p): Add forward
declaration.
(riscv_sched_init): Add call to arcv_sched_init.
(riscv_sched_variable_issue): Add ARC-V-specific handling.
(riscv_sched_adjust_cost): Add ARC-V-specific cost adjustment
and fix parameter names.
(riscv_sched_adjust_priority): New function.
(riscv_sched_reorder2): New function.
(TARGET_SCHED_ADJUST_PRIORITY): Define hook.
(TARGET_SCHED_REORDER2): Define hook.
* config/riscv/riscv.h (TARGET_ARCV_RHX100): New macro.
Authored-by: Artemiy Volkov <[email protected]>
Co-authored-by: Michiel Derhaeg <[email protected]>
Co-authored-by: Alex Turjan <[email protected]>
Signed-off-by: Luis Silva <[email protected]>
…axt fusion.
This patch adds instruction patterns to support fusion of multiply-add
sequences and bit extraction operations for the Synopsys RHX-100 processor.
The multiply-add fusion supports both signed and unsigned 16-bit operands
expanded to 32-bit multiply-accumulate operations. The implementation
generates separate multiply and add instructions that can be fused by
the processor hardware.
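In C terms, the operation being matched looks roughly like the following. The function names are made up for this sketch, and the comments only indicate which half of the computation corresponds to the separate mul and add instructions.

```c
#include <stdint.h>

/* Signed variant: both 16-bit operands are sign-extended to 32 bits,
   multiplied, and the product is added to a 32-bit accumulator.  */
static int32_t
madd_s16 (int16_t a, int16_t b, int32_t acc)
{
  int32_t product = (int32_t) a * (int32_t) b;              /* mul */
  /* Add in unsigned arithmetic so that overflow wraps.  */
  return (int32_t) ((uint32_t) product + (uint32_t) acc);   /* add */
}

/* Unsigned variant: operands are zero-extended instead.  */
static uint32_t
madd_u16 (uint16_t a, uint16_t b, uint32_t acc)
{
  uint32_t product = (uint32_t) a * (uint32_t) b;           /* mul */
  return product + acc;                                     /* add */
}
```

Source code of this shape is what the new patterns are meant to turn into an adjacent multiply and add pair.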
The bit extraction fusion implements zero extraction using shift-left
followed by shift-right operations, which can be fused into a single
micro-operation. New instruction types "imul_fused" and "alu_fused"
are added to the scheduling model to handle these fused operations.
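The shift pair can be modelled in C as follows, for a 32-bit word. The function name is invented for this sketch; the shift amounts follow the usual identity of shifting left by 32 - (pos + len) and then right by 32 - len.

```c
#include <stdint.h>

/* Extract LEN bits of X starting at bit POS (bit 0 is the least
   significant), zero-extended.  Requires len >= 1 and
   pos + len <= 32.  */
static uint32_t
zero_extract_shifts (uint32_t x, unsigned pos, unsigned len)
{
  /* slli: discard the bits above the field.  */
  uint32_t t = x << (32 - (pos + len));
  /* srli: discard the bits below the field and zero-fill the top.  */
  return t >> (32 - len);
}
```

For instance, extracting 8 bits at position 8 from 0xABCD1234 shifts left by 16 and then right by 24, leaving 0x12.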
Test cases are included to verify the correct generation of fusible
instruction sequences for multiply-add, bit extraction, and
load-immediate with conditional branch patterns.
gcc/ChangeLog:
* config/riscv/arcv-rhx100.md (arcv_rhx100_imul_fused): New
reservation.
(arcv_rhx100_alu_fused): New reservation.
* config/riscv/iterators.md (is_zero_extract): New code
attribute.
* config/riscv/riscv.cc (riscv_rtx_costs): Add
TARGET_ARCV_RHX100 support for SIGN_EXTRACT.
* config/riscv/riscv.md: Add imul_fused and alu_fused to type
attribute.
(umaddhisi4): New expand.
(madd_split): New insn_and_split.
(madd_split_extended): New insn_and_split.
(*zero_extract_fused): New insn.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arcv-fusion-limm-condbr.c: New test.
* gcc.target/riscv/arcv-fusion-madd.c: New test.
* gcc.target/riscv/arcv-fusion-xbfu.c: New test.
Authored-by: Artemiy Volkov <[email protected]>
Co-authored-by: Michiel Derhaeg <[email protected]>
Signed-off-by: Luis Silva <[email protected]>
a33ab39 to
1f24597
No description provided.