Merge upstream/llvm into amd-debug#3063
Open
mariusz-sikora-at-amd wants to merge 150 commits into
mariusz-sikora-at-amd
commented
Jun 25, 2026
…OR(X,C1),C0) with nonzero indices (llvm#204533) Removed equivalent fold from x86 and added generic DAG fold to replace it - net zero test changes Refactored version of llvm#200935
…llvm#204144) * Remove X86ISD::PDEP/PEXT and use ISD::PDEP/PEXT instead * AutoUpgrade x86 pdep/pext intrinsics to llvm.pdep/pext generics * Move X86 DAG knownbits/demandedbits handling to generic (unchanged) * Move X86 InstCombine folds to generic (unchanged) * Add memory sanitizer handling for generic pdep/pext intrinsics * Updated clang builtins to emit generics Fixes llvm#204537
This replaces the previously removed xnack-any-only feature, with the inversion xnack-on-off-modes. All pre-gfx12.5 xnack targets support the controllable mode. Ignore explicitly set xnack settings the same way as is done for xnack requests on other unsupported targets.
Programs that do not use any memory (e.g., no mappings) were failing because we were trying to execute zero-size transfers. This commit adds handling for this case.
…lvm#204733) LLJITWithSymbolAliases shows how the symbolAliases function can be used to introduce aliases for both JIT'd and precompiled symbols.
This removes the corresponding handwritten C++ combine handling from the AArch64 prelegalizer combiners. Assisted-by: codex
I just wasted way too long trying to figure out why my newly added RUN lines were randomly broken or not. Stop using absolute line numbers.
…04739) There was a typo in the type-first syntax code example.
Follow up on hoisting replicate loads in VPlan-licm to also sink replicate stores.
The lit internal shell chains together the contents of multiple RUN: lines by connecting them with implicit && nodes, forming a binary tree structure which is then executed recursively by `_executeShCommand`. However the tree structure is constructed in a very simple way which makes it effectively just a linked list, so `_executeShCommand` must recurse to a depth equal to the number of commands. If a test file contains more than 1000 RUN: lines (e.g. running the clang driver only, with lots of different options), then this causes a RecursionError exception, which did not happen using the external shell. Failures of this kind can be avoided by instead connecting the commands together in a _balanced_ binary tree, which has equivalent behaviour, since the && shell operator is associative.
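The difference between the two tree shapes can be sketched in a few lines of Python. This is only an illustration, not lit's actual code: `Cmd`, `AndNode`, `build_linear`, `build_balanced`, `run`, and `depth` are hypothetical names, and `execute` stands in for whatever runs a single command and returns its exit code.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Cmd:
    text: str


@dataclass
class AndNode:
    lhs: "Node"
    rhs: "Node"


Node = Union[Cmd, AndNode]


def build_linear(cmds: List[Cmd]) -> Node:
    """Left-leaning chain ((c0 && c1) && c2) && ...: depth grows with len(cmds)."""
    node: Node = cmds[0]
    for cmd in cmds[1:]:
        node = AndNode(node, cmd)
    return node


def build_balanced(cmds: List[Cmd]) -> Node:
    """Split in the middle at every level: depth grows with log2(len(cmds))."""
    if len(cmds) == 1:
        return cmds[0]
    mid = len(cmds) // 2
    return AndNode(build_balanced(cmds[:mid]), build_balanced(cmds[mid:]))


def run(node: Node, execute: Callable[[str], int]) -> int:
    """Recursive evaluation with && semantics: stop at the first non-zero exit code."""
    if isinstance(node, Cmd):
        return execute(node.text)
    rc = run(node.lhs, execute)
    if rc != 0:
        return rc
    return run(node.rhs, execute)


def depth(node: Node) -> int:
    """Tree depth, computed iteratively so it also works on the linear shape."""
    best, stack = 0, [(node, 1)]
    while stack:
        cur, d = stack.pop()
        best = max(best, d)
        if isinstance(cur, AndNode):
            stack.append((cur.lhs, d + 1))
            stack.append((cur.rhs, d + 1))
    return best
```

Both shapes visit the commands left to right and stop at the first failure, so the observable behaviour is the same; only the recursion depth needed by `run` differs.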
Optimize AArch64 local-exec TLS relocation handling by replacing a self-add R_AARCH64_TLSLE_ADD_TPREL_HI12 instruction with nop when the high 12 bits are zero. The optimization is disabled by --no-relax and avoids non-equivalent forms such as non-self-adds and 32-bit destination registers.
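The relaxation condition can be illustrated with a small Python model. This is a sketch under stated assumptions, not the linker's implementation: `relax_tlsle_add_hi12` is a hypothetical helper, "high 12 bits" is taken to mean bits [23:12] of the thread-pointer-relative offset, and the field positions follow the standard A64 `ADD (immediate)` encoding.

```python
NOP = 0xD503201F  # A64 NOP encoding


def is_add_imm(insn: int) -> bool:
    """True for ADD (immediate), 32- or 64-bit, with or without LSL #12."""
    return (insn & 0x7F800000) == 0x11000000


def relax_tlsle_add_hi12(insn: int, tprel: int, relax: bool = True) -> int:
    """Apply a TLSLE_ADD_TPREL_HI12-style fixup to `insn` (hypothetical helper).

    Returns NOP when the add would be `add xN, xN, #0, lsl #12`, otherwise
    the instruction with its imm12 field set to bits [23:12] of `tprel`.
    """
    if not is_add_imm(insn):
        raise ValueError("expected an ADD (immediate) instruction")
    hi12 = (tprel >> 12) & 0xFFF
    is_64bit = (insn >> 31) & 1 == 1   # sf bit: a 32-bit add also clears the upper half
    rd = insn & 0x1F
    rn = (insn >> 5) & 0x1F
    if relax and hi12 == 0 and is_64bit and rd == rn:
        return NOP                     # adding zero to itself changes nothing
    return (insn & ~(0xFFF << 10)) | (hi12 << 10)
```

The 32-bit and non-self-add cases are left alone because they are not no-ops even with a zero immediate: the former zeroes the upper 32 bits of the register, and the latter copies one register to another.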
llvm#204591) VisitCastExpr dropped several borrow-carrying cast kinds into its default case. Propagate the borrow through `__builtin_bit_cast`/`std::bit_cast` of a pointer and through wrapping/unwrapping `_Atomic(T*)`, so a stack address laundered through either is caught (matching reinterpret_cast). hasOrigins and buildListForType now see through AtomicType, which is transparent for lifetimes. Assisted-by: Claude Opus 4.8 Co-authored-by: Gabor Horvath <gaborh@apple.com>
In llvm#156467, we switched to using `getMCAsmInfo()->usesWindowsCFI()` to recognize "Windows". This does not include Windows triples with ELF binary formats. So, for aarch64-pc-windows-msvc-elf we would use the Windows callee-save list in `AArch64RegisterInfo::getCalleeSavedRegs()`, but FrameLowering would handle this like Linux, and fail to invalidate the (x29, x28) pairing. This patch switches back to using AArch64Subtarget::isTargetWindows(), which aligns with getCalleeSavedRegs(). Note: We were using `usesWindowsCFI()` to include UEFI targets; however, there do not seem to be tests or support for UEFI triples on AArch64 (basic examples that compile for x86 fail: https://godbolt.org/z/dPWdTrEG7). So, this has been moved to a TODO. Fixes llvm#204060
These annotations were mistakenly set up as LLVM_ABI_FOR_TEST. Since these are public headers, they should be using LLVM_ABI. The effort to build LLVM as a dylib is tracked in llvm#109483.
…#203993) `G_PHI` on vectors wider than the SPIR-V max vector size previously failed legalization. This PR adds a `fewerElementsIf` rule that splits them down to `MaxVectorSize`, matching how other vector ops are handled in `SPIRVLegalizerInfo.cpp`. Added the following test `llvm/test/CodeGen/SPIRV/instructions/phi-large-vector.ll` covering spirv32 and spirv64.
This updates WidenVecRes_MGATHER and WidenVecOp_MSCATTER to support scalable vector types.
…ns. (llvm#204201) Builtins that only care about the size of the element type but not its format (e.g. loads, stores and shuffles) do not require any special instructions for code generation beyond those already available to +neon. Fixes llvm#203159
…x) to 0 (llvm#204783) As noted on llvm#204144
…een builtins and libc"" (llvm#204728) Reverts llvm#203152
…m#204388) removeAggregateTypesFromCalls named the call to key the type-restoration metadata, which asserts for void-returning calls. Key the metadata via instruction metadata on the call instead, which works for void results.
…lvm#204524) `check_cxx_compiler_flag` stores its result in `CXX_SUPPORTS_NO_CXX98_COMPAT_EXTRA_SEMI_FLAG`, but the guarding `if()` checked `CXX_SUPPORTS_CXX98_COMPAT_EXTRA_SEMI_FLAG` (without `_NO_`), which is never set. The condition was therefore always false and the `-Wno-c++98-compat-extra-semi` suppression for `mlir_rocm_runtime` was never applied. The sibling flag checks in the same block (`-Wno-return-type-c-linkage`, `-Wno-nested-anon-types`, `-Wno-gnu-anonymous-struct`) already use matching variable names, so this aligns the typo'd guard with the established pattern. No test is included, this is a build-system-only (CMake) change to a warning-suppression guard and is not unit-testable. Signed-off-by: bogdan-petkovic <bpetkovi@amd.com>
In parametric delinearization, it collects subexpressions whose SCEV type is `SCEVUnknown` and uses them as candidates for the array dimensions. When traversing these subexpressions, it may follow any kind of expression. For example, if it follows a `sext` expression, this can lead to type inconsistencies among the collected terms. This patch fixes this issue by preventing traversal into subexpressions other than `SCEVAddExpr` or `SCEVAddRecExpr`. Note: I tried to minimize the test case, but this seems to be as far as it can go. Fix llvm#204066.
Add a pass to perform VLA shuffle optimizations for SVE. First up is using tbl to replace deinterleave4+uunpk+zext/uitofp by generating shuffle masks with index, exploiting the fact that out-of-range indices in the mask produce zeroes in the result vector. That way, we can easily zero-extend smaller elements by using the destination type when generating the mask, and having one index in range with several out-of-range for each destination element.
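The zero-extension trick can be modelled at byte granularity in Python. This is a rough sketch, not the pass itself: `tbl_bytes`, `zext_mask`, and `deinterleave_zext` are hypothetical names, the mask layout (one in-range byte index followed by three `0xFF` bytes per 32-bit destination element) is an assumption chosen for illustration, and the table is assumed to be shorter than 255 bytes so that `0xFF` is out of range.

```python
import struct
from typing import List


def tbl_bytes(table: bytes, indices: bytes) -> bytes:
    """Byte-wise table lookup: an out-of-range index yields 0."""
    return bytes(table[i] if i < len(table) else 0 for i in indices)


def zext_mask(lane: int, stride: int, num_dst: int) -> bytes:
    """Mask built as 32-bit elements, then viewed as little-endian bytes.

    Element j is 0xFFFFFF00 + lane + stride * j, so its low byte selects one
    source byte and its three upper bytes (0xFF) are out of range.
    """
    words = [0xFFFFFF00 + lane + stride * j for j in range(num_dst)]
    return struct.pack(f"<{num_dst}I", *words)


def deinterleave_zext(src: bytes, lane: int, stride: int = 4) -> List[int]:
    """Pick every `stride`-th byte starting at `lane`, zero-extended to 32 bits."""
    num_dst = len(src) // stride
    out = tbl_bytes(src, zext_mask(lane, stride, num_dst))
    return list(struct.unpack(f"<{num_dst}I", out))
```

For a 16-byte source, `deinterleave_zext(src, 1)` selects bytes 1, 5, 9, and 13 and returns them as four 32-bit values, which is the combined effect of the deinterleave, unpack, and zero-extend steps in a single lookup.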
We used to only have a list of blocks under construction, but now we have a list of pointers, which gives us more information. Use this new list to diagnose a case we couldn't previously diagnose. The test case is from `constant-expression-cxx14.cpp` and shows that a write to a const member is invalid, even if the parent object is being constructed right now.
…atent test bugs (llvm#203876) `libcxx` tests gate `_BitInt` blocks on `TEST_HAS_EXTENSION(bit_int)`, which is not a recognized Clang extension and returns 0 in every language mode. The blocks have been compiling as dead code, hiding latent bugs across 23 files. Migrate to a `TEST_HAS_BITINT` helper backed by the standard `__BITINT_MAXWIDTH__`. The latent bugs the activation surfaces are fixed in the same commit: - overflow-safe `min`; - post-P4052R0 saturating-arithmetic renames plus a `clang-21`/`apple-clang-21` skip for `saturating.bitint.pass.cpp` (Clang 21 asserts in constexpr eval on non-byte-aligned `_BitInt`); - an `intcmp` syntax fix; - `byteswap.verify` directive tightening; - a missing `<climits>` include in `byteswap.pass` (only visible under `-fmodules`); - C++03-compatible `static_assert` form in `digits10`; gating `digits`/`digits10` `_BitInt` blocks behind `!_LIBCPP_USE_FROZEN_CXX03_HEADERS` since the fix from llvm#193002 was not backported to the frozen snapshot; and - `make_format_args` reduced to a placeholder pending a SFINAE-friendly rejection path. Discussion: https://discourse.llvm.org/t/implementing-p3666r4-bit-precise-integers-in-libc/91070 Assisted-by: Claude (Anthropic) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…tion (llvm#204196) The current implementation, which uses std::optional<bool>, captures cl::BOU_FALSE (for example -global-isel=0) as true. Explicitly setting the option to 0 should be false, forced option not set. This could be fixed, but I find it cleaner to use boolOrDefault directly and use the same logic as in TargetPassConfig. The options EnableIPRA and EnableGlobalISelAbort are left as optional, since for them it is explicitly checked whether they are set using getNumOccurrences. boolOrDefault already encodes the unset state.
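The tri-state idea can be sketched in Python as follows. This is a simplified model, not the LLVM code: `BoolOrDefault` mirrors the three `cl::boolOrDefault` states, and `resolve` is a hypothetical helper showing how an explicit "false" stays distinct from "not given".

```python
from enum import Enum


class BoolOrDefault(Enum):
    UNSET = 0   # option not given on the command line
    TRUE = 1    # e.g. -global-isel=1
    FALSE = 2   # e.g. -global-isel=0


def resolve(option: BoolOrDefault, default: bool) -> bool:
    """Use the target default only when the option was not given at all."""
    if option is BoolOrDefault.UNSET:
        return default
    return option is BoolOrDefault.TRUE
```

With this shape, an explicit `FALSE` overrides a default of `True`, while `UNSET` leaves the default in place, so no separate "was it set" flag is needed.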
Adds a default COFF/x86_64 JITLink pass that synthesizes `__imp_` Import Address Table (IAT) entries for dllimport references. This allows COFF objects using dllimport to be JIT-linked without a hand-built import library or a special generator. On COFF, `__declspec(dllimport)` codegen emits indirect accesses through a named `__imp_X` symbol (`callq *__imp_bar(%rip)`; `movq __imp_g(%rip)` for data), with `__imp_X` left undefined. JITLink had no handling for this. The new pass, the COFF counterpart of the ELF/Mach-O GOT builder, defines each undefined external `__imp_X` over an 8-byte slot holding the address of `X`, and leaves `X` as an ordinary external to be resolved normally (import library, dynamic-library search generator, etc.). Both the call and data-access forms then resolve indirectly through the slot. Rather than the `GOTTableManager` pattern (anonymous entry + edge redirection), the pass defines the *named* `__imp_X` symbol over the slot. ELF GOT references are nameless edge kinds, so that builder must create an anonymous entry and redirect edges; COFF references `__imp_X` by name, so defining it is simpler: no edge rewriting, no orphaned-external cleanup, sharing is automatic, and the call/data-access forms are handled identically. x86_64 only (runs in the COFF/x86_64 backend's default pass pipeline). New lit test `COFF_dllimport_iat.s`: assembles an object referencing `__imp_bar` (call) and `__imp_foo` (data load), supplies `foo`/`bar` via `-abs`, links with `-noexec`, and uses `jitlink-check` to verify each `__imp_` slot holds the target's address and that the references resolve through the slot. Partly implements GitHub issue llvm#190122. Of the points listed in a comment on that issue (llvm#190122 (comment)), this PR implements point 2, "Synthesis IAT entries".
…vm#198554) (llvm#204978) This reverts commit 91edd87. It was causing CI failures for Linux.
…205004) Add a getNumOperandsWithoutMask helper to VPReplicateRecipe, mirroring the existing VPInstruction::getNumOperandsWithoutMask, and use it to replace some hand-rolled code.
…205008) Replace the hand-written check for a VPReplicateRecipe load/store using the value as its address with VPlan pattern matching via m_Unary/m_Binary, which also handle masked recipes uniformly.
Adds a Session::ControllerAccess implementation for in-process JIT setups, where the controller (LLVM-side) and the executor (orc-rt) live in the same address space. The two sides communicate through a refcounted C-ABI struct (Connection) of function pointers. The C-only interface avoids assuming a common C++ ABI between the two sides and supports symmetric, graceful disconnect: when either side calls Connection::Disconnect, in-flight cross-calls are drained and pending continuations are surfaced as out-of-band errors, after which further cross-calls fail cleanly. This is intended to be paired with a new ExecutorProcessControl implementation (llvm::orc::InProcessEPC) on the LLVM side, landing in a follow-up commit. Unit tests are included covering construction without connect, attach via Session, OnConnect-failure detach, successful and out-of-band-error call cases, and the disconnect-drains-pending behavior.
…197862) AtomicExpand fails for aligned `store atomic <n x T>` because it does not find a compatible library call. This change adds appropriate ptrtoint + bitcast so that the call can be lowered, mirroring the load-side handling. Store-side counterpart to llvm#148900. Stacked on top of llvm#201566.
…llvm#205020) This renames the orc_rt::detail::ScopeExitRunner class to orc_rt::scope_exit and adds a class template argument deduction guide.
…ons (llvm#193125) llvm#188400 regressed data-section folding under --icf=safe{,_thunks}: no-addrsig fallback, and over-broad compiler-emitted addrsig entries covering data symbols, both caused markSymAsAddrSig to set keepUnique on data sections, after which foldIdenticalSections refused to fold them. ld64 coalesces __cfstring, __objc_classrefs and __objc_selrefs unconditionally regardless of addrsig, so ignore keepUnique for them as a workaround for the imprecise addrsig payload.
…te type (llvm#203898) We handled this for pure vector types before but missed aggregate types. This patch tries to apply the same mechanism to them, where unsupported vector types are converted to same-size i8 vector types.
This patch does two things: 1. Changes the matmul interface to use the newly defined OFP8 RVV types. 2. Changes all of the overloaded matmul interfaces to keep only widen information and eliminate type information.
…e-aligned offset (llvm#204320) Fix llvm#184959.
…vm#201506) fixes llvm#201490 It would be possible to have `PrevClassTemplate == false` when `SS` was invalid. Since it is already invalid, it would be safe to skip `setMemberSpecialization` for `NewTemplate`. When the qualified scope specifier is invalid, Sema may have already diagnosed the declaration and marked it invalid. In that case there may be no previous class template declaration, so the assertion is too strong. Avoid marking the new declaration as a member specialization unless the previous class template exists.
When reading extensible binary format profiles with fixed-length MD5 name tables, the reader eagerly allocates and populates a std::vector<FunctionId> to store the name table. This eager loading is particularly wasteful when ProfileIsCS is false, as we populate the entire name table just to support lookups during profile ingestion, even though we may only use a subset of the profile. Since FunctionId is 16 bytes on 64-bit systems, a name table containing 10 million MD5 hash values would consume 160MB of heap memory. This patch implements lazy loading for the name table in extensible binary format profiles when the fixed-length MD5 layout is used. Specifically, this patch introduces SampleProfileNameTable to encapsulate the name table representation, supporting both lazy loading (pointing directly to the memory-mapped buffer) and eager loading (using a vector). Eager loading is retained as a fallback for layouts that do not support O(1) random access (such as variable-length string tables). The reader transitions between these modes using setLazy and resetToEager. The getNameTable interface is updated to return an iterator_range of SampleProfileNameTable::iterator, which reads the MD5 values directly from the buffer on-demand when lazy-loaded. - Heap consumption: Saves 16 bytes of heap memory for each name table entry by avoiding the std::vector allocation. - Compilation performance: Saves about 4% on ThinLTO pre-link and 10% on ThinLTO backend on shared and non-shared profiles.
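The lazy mode can be sketched in Python as a thin view over the profile buffer. This is an illustration only, not the `SampleProfileNameTable` implementation: `LazyMD5NameTable` is a hypothetical class, and the layout (consecutive 8-byte little-endian MD5 values starting at a known offset) is an assumption about the fixed-length format.

```python
import struct
from typing import Iterator

ENTRY_SIZE = 8  # assumed: one 64-bit MD5 value per name-table entry


class LazyMD5NameTable:
    """Reads name-table entries from the buffer on demand instead of copying them."""

    def __init__(self, buf: bytes, offset: int, count: int) -> None:
        if offset + count * ENTRY_SIZE > len(buf):
            raise ValueError("name table extends past the end of the buffer")
        self._buf = memoryview(buf)   # no copy of the underlying data
        self._offset = offset
        self._count = count

    def __len__(self) -> int:
        return self._count

    def __getitem__(self, index: int) -> int:
        # O(1) random access is what makes the lazy mode possible;
        # a variable-length string table would need a scan instead.
        if not 0 <= index < self._count:
            raise IndexError(index)
        pos = self._offset + index * ENTRY_SIZE
        return struct.unpack_from("<Q", self._buf, pos)[0]

    def __iter__(self) -> Iterator[int]:
        return (self[i] for i in range(self._count))


def eager_table_bytes(num_entries: int, entry_bytes: int = 16) -> int:
    """Heap used by an eagerly populated vector of 16-byte entries."""
    return num_entries * entry_bytes
```

`eager_table_bytes(10_000_000)` gives 160,000,000 bytes, matching the 160MB figure above; the lazy view allocates nothing per entry.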
- The semantics of asyncmarks is now defined purely in terms of sequences, without referring to the implementation. - The examples incorrectly used (post)dominance. Fixed that with wording in terms of asyncmark sequences.
…ions (llvm#200901) Using `vftintrz.lu.d` for converting scalar double/float values to unsigned 64-bit integers, and `vffint.d.lu` vice versa.
Replaces class definitions with decls for tag types that don't need a body, and moves the SPSError tag down to just above its serialization-traits class.
…#203394) Replace a `load <N x i1>` under a sext/zext with a scalar load + bitcast, so the `combineToExtendBoolVectorInReg` helper can apply, avoiding scalarization. Optimisation for the SVE case with a predicate load to be added in a follow up. Fixes llvm#200325
Allows SPS serialization to/from ExecutorAddrRange. This will be used in upcoming patches for compact-unwind registration support.
Seems like llvm#199396 had no effect at all, even though the patch itself seems pretty obvious. Change the semantics of the command-line option to support `-fno-experimental-constant-interpreter` as well. This way, the cmake option can be used to set the default and the `-f`/`-fno-` command-line options can be used to override the default behavior.
Remove everything that has to do with named barriers and put it in a series of model extensions specific to /sbarrier/named-barriers. I had to change a few things to make it fit, in summary: Base Model: - (~) Stylistic changes that make it easier to refer to specific rules. Each rule is in a rubric instead of a bullet point. - (-) No longer defines `barrier-mutually-exclusive` - (-) No longer defines barrier `join` and any associated rule. New named barrier extensions - (+) Define "named barrier" as a sub-type of barrier objects. This makes barrier-mutually-exclusive redundant. - (+) Define barrier join as an op that can exclusively be done on `named barrier objects`. - (+) Define rules relating to join and its ordering with other barrier operations Following these changes, the target tables changed a bit as well. Motive: Barrier _join_ + `barrier-mutually-exclusive` only ever makes sense when considering named barriers in the ISA. They are an alien concept to higher-level barrier abstractions. _Join_ has especially been a pain to deal with and explain in the general, high-level execution model. Kicking it down into an extension allows to keep the base model much more concise. As the model extension is defined w.r.t. the ISA, it's an appropriate place to surface ISA-specific restrictions. For example we don't need to dance around the concept of "each thread is a member of at most one named barrier" with `barrier-mutually-exclusive`. We can just say it straight away when describing the behavior of _join_ in the model extension.
PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/6/builds/216