Merge upstream/llvm into amd-debug #3063

Open
mariusz-sikora-at-amd wants to merge 150 commits into amd-debug from amd/dev/masikora/amd-debug-merge-candidate

Conversation

@mariusz-sikora-at-amd

diff --git a/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp b/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
remerge CONFLICT (content): Merge conflict in llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
index 4d45462b4b2e..1dddd9592d6a 100644
--- a/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
+++ b/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
@@ -747,43 +747,7 @@ static TargetCallbacks getCallbacks(ObjectFile &Obj, const Twine &Filename) {

 static bool dumpObjectFile(ObjectFile &Obj, DWARFContext &DICtx,
                            const Twine &Filename, raw_ostream &OS) {
-<<<<<<< a0d8334fa9c3 (Merge llvm/main into amd-debug)
   TargetCallbacks Callbacks = getCallbacks(Obj, Filename);
-||||||| 21622397c16f
-
-  auto MCRegInfo = createRegInfo(Obj);
-  if (!MCRegInfo)
-    logAllUnhandledErrors(createStringError(inconvertibleErrorCode(),
-                                            "Error in creating MCRegInfo"),
-                          errs(), Filename.str() + ": ");
-
-  auto GetRegName = [&MCRegInfo](uint64_t DwarfRegNum, bool IsEH) -> StringRef {
-    if (!MCRegInfo)
-      return {};
-    if (std::optional<MCRegister> LLVMRegNum =
-            MCRegInfo->getLLVMRegNum(DwarfRegNum, IsEH))
-      if (const char *RegName = MCRegInfo->getName(*LLVMRegNum))
-        return StringRef(RegName);
-    return {};
-  };
-=======
-
-  auto MCRegInfo = createRegInfo(Obj);
-  if (!MCRegInfo)
-    logAllUnhandledErrors(createStringError(inconvertibleErrorCode(),
-                                            "Error in creating MCRegInfo"),
-                          errs(), Filename + ": ");
-
-  auto GetRegName = [&MCRegInfo](uint64_t DwarfRegNum, bool IsEH) -> StringRef {
-    if (!MCRegInfo)
-      return {};
-    if (std::optional<MCRegister> LLVMRegNum =
-            MCRegInfo->getLLVMRegNum(DwarfRegNum, IsEH))
-      if (const char *RegName = MCRegInfo->getName(*LLVMRegNum))
-        return StringRef(RegName);
-    return {};
-  };
->>>>>>> 64ad10fcda69 ([AMDGPU][doc] Refactor Barrier Execution Model (#204566))

RKSimon and others added 30 commits June 19, 2026 06:24
…OR(X,C1),C0) with nonzero indices (llvm#204533)

Removed equivalent fold from x86 and added generic DAG fold to replace
it - net zero test changes

Refactored version of llvm#200935
…llvm#204144)

* Remove X86ISD::PDEP/PEXT and use ISD::PDEP/PEXT instead
* AutoUpgrade x86 pdep/pext intrinsics to llvm.pdep/pext generics
* Move X86 DAG knownbits/demandedbits handling to generic (unchanged)
* Move X86 InstCombine folds to generic (unchanged)
* Add memory sanitizer handling for generic pdep/pext intrinsics
* Updated clang builtins to emit generics

Fixes llvm#204537
This replaces the previously removed xnack-any-only feature,
with the inversion xnack-on-off-modes. All pre-gfx12.5 xnack
targets support the controllable mode. Ignore explicitly
set xnack settings the same way as is done for xnack requests
on other unsupported targets.
Programs that do not use any memory (e.g., no mappings) were failing
because we were trying to execute zero-size transfers. This commit adds
handling for this case.
…lvm#204733)

LLJITWithSymbolAliases shows how the symbolAliases function can be used
to introduce aliases for both JIT'd and precompiled symbols.
This removes the corresponding handwritten C++ combine handling from the
AArch64 prelegalizer combiners.

Assisted-by: codex
I just wasted way too long trying to figure out why my newly added RUN
lines were randomly broken or not.

Stop using absolute line numbers.
…04739)

There was a typo in the type-first syntax code example.
Follow up on hoisting replicate loads in VPlan-licm to also sink
replicate stores.
The lit internal shell chains together the contents of multiple RUN:
lines by connecting them with implicit && nodes, forming a binary tree
structure which is then executed recursively by `_executeShCommand`.
However the tree structure is constructed in a very simple way which
makes it effectively just a linked list, so `_executeShCommand` must
recurse to a depth equal to the number of commands.

If a test file contains more than 1000 RUN: lines (e.g. running the
clang driver only, with lots of different options), then this causes a
RecursionError exception, which did not happen using the external shell.
Failures of this kind can be avoided by instead connecting the commands
together in a _balanced_ binary tree, which has equivalent behaviour,
since the && shell operator is associative.
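
A minimal sketch of the idea, not lit's actual code: `Cmd`, `And`, `chain_linear` and `chain_balanced` are hypothetical names introduced here for illustration. It contrasts the linked-list-shaped tree with a balanced one and measures the depth a recursive executor would have to reach.

```python
from dataclasses import dataclass


@dataclass
class Cmd:
    """A single RUN: command (hypothetical stand-in for a parsed command)."""
    text: str


@dataclass
class And:
    """An implicit `lhs && rhs` node joining two subtrees."""
    lhs: object
    rhs: object


def chain_linear(cmds):
    """Left-nested chain: ((c0 && c1) && c2) && ... Depth grows with len(cmds)."""
    node = cmds[0]
    for cmd in cmds[1:]:
        node = And(node, cmd)
    return node


def chain_balanced(cmds):
    """Balanced tree over the same commands, keeping their left-to-right order."""
    if len(cmds) == 1:
        return cmds[0]
    mid = len(cmds) // 2
    return And(chain_balanced(cmds[:mid]), chain_balanced(cmds[mid:]))


def depth(node):
    """Tree depth, computed iteratively so the linear tree cannot overflow the stack."""
    deepest = 0
    stack = [(node, 1)]
    while stack:
        current, level = stack.pop()
        deepest = max(deepest, level)
        if isinstance(current, And):
            stack.append((current.lhs, level + 1))
            stack.append((current.rhs, level + 1))
    return deepest


def leaves(node):
    """Commands in execution order (left to right), computed iteratively."""
    order = []
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, And):
            stack.append(current.rhs)
            stack.append(current.lhs)
        else:
            order.append(current.text)
    return order
```

Both shapes visit the commands in the same order, so short-circuiting on the first failure behaves the same; only the nesting depth differs, growing linearly in one case and logarithmically in the other.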
Optimize AArch64 local-exec TLS relocation handling by replacing a
self-add R_AARCH64_TLSLE_ADD_TPREL_HI12 instruction with nop when the
high 12 bits are zero.

The optimization is disabled by --no-relax and avoids non-equivalent
forms such as non-self-adds and 32-bit destination registers.
llvm#204591)

VisitCastExpr dropped several borrow-carrying cast kinds into its
default case. Propagate the borrow through
`__builtin_bit_cast`/`std::bit_cast` of a pointer and through
wrapping/unwrapping `_Atomic(T*)`, so a stack address laundered through
either is caught (matching reinterpret_cast). hasOrigins and
buildListForType now see through AtomicType, which is transparent for
lifetimes.

Assisted-by: Claude Opus 4.8

Co-authored-by: Gabor Horvath <gaborh@apple.com>
In llvm#156467, we switched to using `getMCAsmInfo()->usesWindowsCFI()` to
recognize "Windows". This does not include Windows triples with ELF
binary formats.

So, for aarch64-pc-windows-msvc-elf we would use the Windows callee-save
list in `AArch64RegisterInfo::getCalleeSavedRegs()`, but FrameLowering
would handle this like Linux, and fail to invalidate the (x29, x28)
pairing.

This patch switches back to using AArch64Subtarget::isTargetWindows(),
which aligns with getCalleeSavedRegs().

Note: We were using `usesWindowsCFI()` to include UEFI targets; however,
there do not seem to be tests or support for UEFI triples on AArch64
(basic examples that compile for x86 fail: https://godbolt.org/z/dPWdTrEG7).
So, this has been moved to a TODO.

Fixes llvm#204060
These annotations were mistakenly set up as LLVM_ABI_FOR_TEST. Since
these are public headers, they should be using LLVM_ABI.

The effort to build LLVM as a dylib is tracked in llvm#109483.
…#203993)

`G_PHI` on vectors wider than the SPIR-V max vector size previously
failed legalization. This PR adds a `fewerElementsIf` rule that splits
them down to `MaxVectorSize`, matching how other vector ops are handled
in `SPIRVLegalizerInfo.cpp`.


Added the test
`llvm/test/CodeGen/SPIRV/instructions/phi-large-vector.ll`, covering
spirv32 and spirv64.
This updates WidenVecRes_MGATHER and WidenVecOp_MSCATTER to support
scalable vector types.
…ns. (llvm#204201)

Builtins that only care about the size of the element type but not its
format (e.g. loads, stores and shuffles) do not require any special
instructions to code generate beyond those already available to +neon.

Fixes llvm#203159
…m#204388)

removeAggregateTypesFromCalls named the call to key the type-restoration
metadata, which asserts for void-returning calls. Key the metadata via
instruction metadata on the call instead, which works for void results.
…lvm#204524)

`check_cxx_compiler_flag` stores its result in
`CXX_SUPPORTS_NO_CXX98_COMPAT_EXTRA_SEMI_FLAG`, but the guarding `if()`
checked `CXX_SUPPORTS_CXX98_COMPAT_EXTRA_SEMI_FLAG` (without `_NO_`),
which is never set. The condition was therefore always false and the
`-Wno-c++98-compat-extra-semi` suppression for `mlir_rocm_runtime` was
never applied.

The sibling flag checks in the same block (`-Wno-return-type-c-linkage`,
`-Wno-nested-anon-types`, `-Wno-gnu-anonymous-struct`) already use
matching variable names, so this aligns the typo'd guard with the
established pattern.

No test is included: this is a build-system-only (CMake) change to a
warning-suppression guard and is not unit-testable.

Signed-off-by: bogdan-petkovic <bpetkovi@amd.com>
Parametric delinearization collects subexpressions whose SCEV
type is `SCEVUnknown` and uses them as candidates for the array
dimensions. When traversing these subexpressions, it may follow any kind
of expression. For example, if it follows a `sext` expression, this can
lead to type inconsistencies among the collected terms.
This patch fixes this issue by preventing traversal into subexpressions
other than `SCEVAddExpr` or `SCEVAddRecExpr`.

Note: I tried to minimize the test case, but this seems to be as far as
it can go.

Fix llvm#204066.
Add a pass to perform VLA shuffle optimizations for SVE.

First up is using tbl to replace deinterleave4+uunpk+zext/uitofp
by generating shuffle masks with index, exploiting the fact that
out-of-range indices in the mask produce zeroes in the result
vector. That way, we can easily zero-extend smaller elements
by using the destination type when generating the mask, and
having one index in range with several out-of-range for each
destination element.
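
A rough byte-level simulation of the mask trick described above, under stated assumptions: `tbl`, `build_zext_mask` and `deinterleave_zext` are hypothetical helpers written for this illustration, lanes are modelled as a plain list of bytes, and destination elements are assumed to be little-endian. It is not the pass's implementation.

```python
def tbl(table, indices):
    """Byte-wise table lookup where any out-of-range index yields 0."""
    return [table[i] if 0 <= i < len(table) else 0 for i in indices]


def build_zext_mask(num_dst_elts, dst_bytes, stride, offset, out_of_range):
    """One in-range index per destination element, padded with out-of-range ones."""
    mask = []
    for i in range(num_dst_elts):
        mask.append(offset + i * stride)                 # low byte: the source element
        mask.extend([out_of_range] * (dst_bytes - 1))    # high bytes: forced to zero
    return mask


def deinterleave_zext(src, dst_bytes=4, stride=4, offset=0):
    """Pick every `stride`-th byte starting at `offset` and zero-extend it."""
    num_dst_elts = len(src) // stride
    mask = build_zext_mask(num_dst_elts, dst_bytes, stride, offset,
                           out_of_range=len(src))
    lanes = tbl(src, mask)
    return [
        int.from_bytes(bytes(lanes[k:k + dst_bytes]), "little")
        for k in range(0, len(lanes), dst_bytes)
    ]
```

With a stride of 4 and 4-byte destination elements, a single lookup stands in for the deinterleave, unpack and extend steps in this model, because the zero bytes come for free from the out-of-range indices.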
We used to only have a list of blocks under construction, but now we
have a list of pointers, which gives us more information.

Use this new list to diagnose a case we couldn't previously diagnose.
The test case is from `constant-expression-cxx14.cpp` and shows that a
write to a const member is invalid, even if the parent object is being
constructed right now.
…atent test bugs (llvm#203876)

`libcxx` tests gate `_BitInt` blocks on `TEST_HAS_EXTENSION(bit_int)`,
which is not a recognized Clang extension and returns 0 in every
language mode. The blocks have been compiling as dead code, hiding
latent bugs across 23 files.

Migrate to a `TEST_HAS_BITINT` helper backed by the standard
`__BITINT_MAXWIDTH__`. The latent bugs the activation surfaces are fixed
in the same commit:
- overflow-safe `min`;
- post-P4052R0 saturating-arithmetic renames plus a
`clang-21`/`apple-clang-21` skip for `saturating.bitint.pass.cpp` (Clang
21 asserts in constexpr eval on non-byte-aligned `_BitInt`);
- an `intcmp` syntax fix;
- `byteswap.verify` directive tightening;
- a missing `<climits>` include in `byteswap.pass` (only visible under
`-fmodules`);
- C++03-compatible `static_assert` form in `digits10`; gating
`digits`/`digits10` `_BitInt` blocks behind
`!_LIBCPP_USE_FROZEN_CXX03_HEADERS` since the fix from llvm#193002 was not
backported to the frozen snapshot; and
- `make_format_args` reduced to a placeholder pending a SFINAE-friendly
rejection path.

Discussion:
https://discourse.llvm.org/t/implementing-p3666r4-bit-precise-integers-in-libc/91070

Assisted-by: Claude (Anthropic)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…tion (llvm#204196)

The current implementation, which uses std::optional<bool>, captures cl::BOU_FALSE
(for example -global-isel=0) as true. Explicitly setting the option to 0 should be
false, forced option not set.
This could be fixed, but I find it cleaner to use boolOrDefault directly and
use the same logic as in TargetPassConfig.
The options EnableIPRA and EnableGlobalISelAbort are left as optional, since for
them getNumOccurrences is used to check explicitly whether they are set.
boolOrDefault already encodes the unset state.
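
The pitfall can be illustrated with a small model. This is a sketch under assumptions, not the LLVM code: `BoolOrDefault` mirrors a three-state enum with the values unset = 0, true = 1, false = 2 (assumed here), and `lossy_to_optional` and `resolve` are hypothetical helpers. It shows one way a tri-state value collapsed into an optional boolean can turn an explicit "false" into true.

```python
from enum import IntEnum
from typing import Optional


class BoolOrDefault(IntEnum):
    """Tri-state option value; the numeric values are an assumption of this sketch."""
    UNSET = 0
    TRUE = 1
    FALSE = 2


def lossy_to_optional(value: BoolOrDefault) -> Optional[bool]:
    """Buggy conversion: any non-zero state, including FALSE, becomes True."""
    if value == BoolOrDefault.UNSET:
        return None
    return bool(value)


def resolve(value: BoolOrDefault, default: bool) -> bool:
    """Use the tri-state directly: explicit values win, UNSET falls back."""
    if value == BoolOrDefault.TRUE:
        return True
    if value == BoolOrDefault.FALSE:
        return False
    return default
```

Keeping the three states until the point of use avoids the lossy conversion and leaves "unset" distinguishable from both explicit settings.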
Adds a default COFF/x86_64 JITLink pass that synthesizes `__imp_` Import
Address Table (IAT) entries for dllimport references. This allows COFF
objects using dllimport to be JIT-linked without a hand-built import library or
a special generator.

On COFF, `__declspec(dllimport)` codegen emits indirect accesses through a named
`__imp_X` symbol (`callq *__imp_bar(%rip)`; `movq __imp_g(%rip)` for data),
with `__imp_X` left undefined. JITLink had no handling for this. The new pass —
the COFF counterpart of the ELF/Mach-O GOT builder — defines each undefined
external `__imp_X` over an 8-byte slot holding the address of `X`, and leaves `X`
as an ordinary external to be resolved normally (import library, dynamic-library
search generator, etc.). Both the call and data-access forms then resolve
indirectly through the slot.

Rather than the `GOTTableManager` pattern (anonymous entry + edge redirection),
the pass defines the *named* `__imp_X` symbol over the slot. ELF GOT references
are nameless edge kinds, so that builder must create an anonymous entry and
redirect edges; COFF references `__imp_X` by name, so defining it is simpler:
no edge rewriting, no orphaned-external cleanup, sharing is automatic,
and the call/data-access forms are handled identically.

x86_64 only (runs in the COFF/x86_64 backend's default pass pipeline). New lit
test `COFF_dllimport_iat.s`: assembles an object referencing `__imp_bar` (call)
and `__imp_foo` (data load), supplies `foo`/`bar` via `-abs`, links with
`-noexec`, and uses `jitlink-check` to verify each `__imp_` slot holds the
target's address and that the references resolve through the slot.

Partly implements GitHub issue
llvm#190122.
The comment section of that issue contains this comment:
llvm#190122 (comment)
This PR implements point 2 of it, synthesizing IAT entries.
AZero13 and others added 23 commits June 21, 2026 17:15
…205004)

Add a getNumOperandsWithoutMask helper to VPReplicateRecipe, mirroring
the existing VPInstruction::getNumOperandsWithoutMask, and use it to
replace some hand-rolled code.
…205008)

Replace the hand-written check for a VPReplicateRecipe load/store using
the value as its address with VPlan pattern matching via
m_Unary/m_Binary, which also handle masked recipes uniformly.
Adds a Session::ControllerAccess implementation for in-process JIT
setups, where the controller (LLVM-side) and the executor (orc-rt) live
in the same address space.

The two sides communicate through a refcounted C-ABI struct (Connection)
of function pointers. The C-only interface avoids assuming a common C++
ABI between the two sides and supports symmetric, graceful disconnect:
when either side calls Connection::Disconnect, in-flight cross-calls are
drained and pending continuations are surfaced as out-of-band errors,
after which further cross-calls fail cleanly.

This is intended to be paired with a new ExecutorProcessControl
implementation (llvm::orc::InProcessEPC) on the LLVM side, landing in a
follow-up commit. Unit tests are included covering construction without
connect, attach via Session, OnConnect-failure detach, successful and
out-of-band-error call cases, and the disconnect-drains-pending
behavior.
…197862)

AtomicExpand fails for aligned `store atomic <n x T>` because it
does not find a compatible library call. This change adds appropriate
ptrtoint + bitcast so that the call can be lowered, mirroring the
load-side handling.

Store-side counterpart to llvm#148900. Stacked on top of
llvm#201566.
…llvm#205020)

This renames the orc_rt::detail::ScopeExitRunner class to
orc_rt::scope_exit and adds a class template argument deduction guide.
…ons (llvm#193125)

llvm#188400 regressed data-section folding under --icf=safe{,_thunks}:
no-addrsig fallback, and over-broad compiler-emitted addrsig entries
covering data symbols, both caused markSymAsAddrSig to set keepUnique on
data sections, after which foldIdenticalSections refused to fold them.

ld64 coalesces __cfstring, __objc_classrefs and __objc_selrefs
unconditionally regardless of addrsig, so ignore keepUnique for them as
a workaround for the imprecise addrsig payload.
…te type (llvm#203898)

We handled this for pure vector types before but missed the aggregate
types. This patch applies the same mechanism to them: unsupported
vector types are converted to i8 vector types of the same size.
This patch does 2 things:
1. Change the matmul interface to use the newly defined OFP8 RVV types.
2. Change all overloaded matmul interfaces to keep only the widening
information and eliminate the type information.
…vm#201506)

Fixes llvm#201490

It would be possible to have `PrevClassTemplate == false` when `SS` was
invalid.

Since it is already invalid, it would be safe to skip
`setMemberSpecialization` for `NewTemplate`. When the qualified scope
specifier is invalid, Sema may have already diagnosed the declaration
and marked it invalid. In that case there may be no previous class
template declaration, so the assertion is too strong. Avoid marking the
new declaration as a member specialization unless the previous class
template exists.
)

The rename brings the scope_exit type's header name into alignment with
other ORC runtime snake_case types.

The [[nodiscard]] attribute should help to prevent accidental misuse of
the type.
When reading extensible binary format profiles with fixed-length MD5
name tables, the reader eagerly allocates and populates a
std::vector<FunctionId> to store the name table.  This eager loading
is particularly wasteful when ProfileIsCS is false, as we populate the
entire name table just to support lookups during profile ingestion,
even though we may only use a subset of the profile.  Since FunctionId
is 16 bytes on 64-bit systems, a name table containing 10 million MD5
hash values would consume 160MB of heap memory.

This patch implements lazy loading for the name table in extensible
binary format profiles when the fixed-length MD5 layout is used.

Specifically, this patch introduces SampleProfileNameTable to
encapsulate the name table representation, supporting both lazy
loading (pointing directly to the memory-mapped buffer) and eager
loading (using a vector).  Eager loading is retained as a fallback for
layouts that do not support O(1) random access (such as
variable-length string tables).

The reader transitions between these modes using setLazy and
resetToEager.  The getNameTable interface is updated to return an
iterator_range of SampleProfileNameTable::iterator, which reads the
MD5 values directly from the buffer on-demand when lazy-loaded.

- Heap consumption: Saves 16 bytes of heap memory for each name table
  entry by avoiding the std::vector allocation.

- Compilation performance: Saves about 4% on ThinLTO pre-link and 10%
  on ThinLTO backend on shared and non-shared profiles.
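
To make the trade-off concrete, here is a small sketch of the lazy approach. It is an illustration under assumptions, not the reader's code: `LazyNameTable` and `eager_heap_bytes` are hypothetical names, and each fixed-length entry is assumed to be a 64-bit little-endian value stored contiguously in the buffer.

```python
import struct

ENTRY_SIZE = 8  # assumption: one fixed-length entry is a 64-bit little-endian value


def eager_heap_bytes(num_entries, bytes_per_element=16):
    """Heap used by an eagerly populated vector with 16-byte elements."""
    return num_entries * bytes_per_element


class LazyNameTable:
    """Reads entries straight from the underlying buffer on demand."""

    def __init__(self, buffer, offset, count):
        self._buf = memoryview(buffer)   # no copy of the underlying bytes
        self._offset = offset
        self._count = count

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        if not 0 <= index < self._count:
            raise IndexError(index)
        # O(1) random access is possible only because every entry has the same size.
        position = self._offset + index * ENTRY_SIZE
        return struct.unpack_from("<Q", self._buf, position)[0]

    def __iter__(self):
        return (self[i] for i in range(self._count))
```

The 160MB figure quoted above follows directly from `eager_heap_bytes(10_000_000)`. Variable-length string tables have no such constant-time indexing, which is why an eager fallback is still needed for them.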
- The semantics of asyncmarks is now defined purely in terms of
sequences, without referring to the implementation.
- The examples incorrectly used (post)dominance. Fixed that with wording
in terms of asyncmark sequences.
…ions (llvm#200901)

Use `vftintrz.lu.d` for converting scalar double/float values to
unsigned 64-bit integers, and `vffint.d.lu` for the reverse conversion.
Replaces class definitions with decls for tag types that don't need a
body, and moves the SPSError tag down to just above its
serialization-traits class.
…#203394)

Replace a `load <N x i1>` under a sext/zext with a scalar load +
bitcast, so the `combineToExtendBoolVectorInReg` helper can apply,
avoiding scalarization.

Optimisation for the SVE case with a predicate load to be added in a
follow up.

Fixes llvm#200325
Allows SPS serialization to/from ExecutorAddrRange. This will be used in
upcoming patches for compact-unwind registration support.
Seems like llvm#199396 had no effect at all, even though the patch itself
seems pretty obvious.


Change the semantics of the command-line option to support
`-fno-experimental-constant-interpreter` as well. This way, the cmake
option can be used to set the default and the `-f`/`-fno-` command-line
options can be used to override the default behavior.
Remove everything that has to do with named barriers and put it in a
series of model extensions specific to /sbarrier/named-barriers.

I had to change a few things to make it fit, in summary:

Base Model:

- (~) Stylistic changes that make it easier to refer to specific rules.
Each rule is in a rubric instead of a bullet point.
- (-) No longer defines `barrier-mutually-exclusive`
- (-) No longer defines barrier `join` and any associated rule.

New named barrier extensions

- (+) Define "named barrier" as a sub-type of barrier objects. This
makes barrier-mutually-exclusive redundant.
- (+) Define barrier join as an op that can exclusively be done on
`named barrier objects`.
- (+) Define rules relating to join and its ordering with other barrier
operations

Following these changes, the target tables changed a bit as well.

Motive: Barrier _join_ + `barrier-mutually-exclusive` only ever makes
sense when considering named barriers in the ISA. They are an alien
concept to higher-level barrier abstractions.

_Join_ has especially been a pain to deal with and explain in the
general, high-level execution model. Kicking it down into an extension
keeps the base model much more concise. As the model extension
is defined w.r.t. the ISA, it's an appropriate place to surface
ISA-specific restrictions. For example we don't need to dance around the
concept of "each thread is a member of at most one named barrier" with
`barrier-mutually-exclusive`. We can just say it straight away when
describing the behavior of _join_ in the model extension.
@rocm-cciapp

rocm-cciapp Bot commented Jun 25, 2026
