resistor and others added 28 commits October 24, 2025 20:21
…ode (#113151)

Since pauthtest is a Linux-specific ABI, it should not be handled in
common driver code.
According to the manual, bits 3...0 should be 1101. (1100 is `movi.n`.)
This commit allows const and cast ops with MXFP datatypes through the
validation pass when specification version 1.1.draft is selected.

Note: it doesn't include support for the mxint8 datatype. This will be
added in a separate commit.

Note: this commit adds support as defined in the spec in
arm/tosa-specification@063846a.
EXT_MXFP extension is considered experimental and subject to breaking
change.
All three implementations just check whether this has the Windows check function, so merge that as the only implementation.
…svc (#164133)

The predicate system is currently primitive, and alternative call predicates should be mutually exclusive.
/usr/bin/ld: unittests/ExecutionEngine/Orc/CMakeFiles/OrcJITTests.dir/LibraryResolverTest.cpp.o: undefined reference to symbol '_ZN4llvm4yaml11convertYAMLERNS0_5InputERNS_11raw_ostreamENS_12function_refIFvRKNS_5TwineEEEEjm'
/usr/bin/ld: lib/libLLVMObjectYAML.so.22.0git: error adding symbols: DSO missing from command line
Add GlobalISel lowering of G_FMINIMUM and G_FMAXIMUM following the same
logic as in SDag's expandFMINIMUM_FMAXIMUM.
Update AMDGPU legalization rules: pre-GFX12 now uses the new lowering method, and G_FMINNUM_IEEE and G_FMAXNUM_IEEE are made legal to match SDag.
Dead code, probably left over from some PR refactoring.
Eventually this should be program state, and not part of TargetLowering
so avoid direct references to the libcall functions in it.

The usage of RuntimeLibcallsInfo here is not good though, particularly
the use through TargetTransformInfo. It would be better if the IR
attributes were directly encoded in the libcall definition (or at least made
consistent elsewhere). The parsing of the attributes should also not be responsible for doing the libcall recognition, which is the only part pulling in the dependency.
This patch adds HVX vgather/vscatter generation for i16, i32, and i8. It also adds a flag to control generation of scatter/gather instructions for HVX. The default is "disable".

Co-authored-by: Sergei Larin <[email protected]>
Co-authored-by: Maxime Schmitt <[email protected]>
#163067)

Loads the data into the SIMD register, thus sparing a physical register
and a potentially costly movement of data.
This commit adds support for the cast_from/to_block_scaled operations
from the ext-mxfp extension. This includes:
- Operation definition in TosaOps.td
- Micro-scaling supported types definition
- Shape inference and verifiers
- Validation pass checks to ensure usage is only valid when the target
environment includes ext-mxfp and at least v1.1.draft of the
specification.

Note: currently it excludes support for mxint8. This will be added in a
later commit.

Note: this commit adds support as defined in the spec in
arm/tosa-specification@063846a.
EXT_MXFP extension is considered experimental and subject to breaking
change.

Co-authored-by: Tat Wai Chong <[email protected]>
Fix 3 cases in the scheduler tables to match the current SWOG, 
in section 3.29 SVE Store: change pipeline V to V01.
Clean up AnalysisConsumer code from the timer-related branches that are
not used most of the time, and move this logic to Timer.cpp, which is a
more relevant place and allows for a cleaner implementation.
If we only deal with one part of a 64-bit value, we can just generate merge and unmerge, which will either be combined away or selected into a copy / mov_b32.
Post-cleanup for #164534. Also picks up a suggestion by nikic: remove redundant attributes.
This patch fixes the computation of the absolute value for SCEV.
Previously, it was calculated as `AbsX = SE->isKnownNonNegative(X) ? X :
-X`, which would incorrectly assume that `!isKnownNonNegative` implies
`isKnownNegative`. This assumption does not hold in general, for
example, when `X` is a `SCEVUnknown` and it can be an arbitrary value.
To compute the absolute value properly, we should use
ScalarEvolution::getAbsExpr instead.

Fix #149977
…sert (#164991)

Clang always duplicates SME attributes to each callsite, which means
removing "ZA_State_Agnostic" from CalledFn before the assert resulted in
the assertion failing for IR emitted by clang.

I've updated the existing test to match the form emitted by clang (which
previously hit the assert).
This patch introduces support for the Hexagon V81 architecture. It
includes instruction formats, definitions, encodings, scheduling
classes, and builtins/intrinsics.
nbpatel and others added 30 commits October 27, 2025 09:10
Reviewers: jvoung, Xazax-hun

Reviewed By: jvoung

Pull Request: #163894
This is important if the first use of a StatusOr (or Status) is in a conditional statement: we need a stable value for `ok` from outside of the conditional statement to make sure we don't use a different variable in every branch.

Reviewers: jvoung, Xazax-hun

Reviewed By: jvoung

Pull Request: #163898
This tool provides a harness for implementing different strategies that
summarize many remarks (possibly from multiple translation units) into
new summary remarks. The remark summaries can then be viewed using tools
like `opt-viewer`.

The first summary strategy is `--inline-callees`, which generates remarks that summarize the per-callee inline statistics for functions that appear in inlining remarks. This is useful for troubleshooting inlining issues/regressions on large codebases.

Pull Request: #160549
…Us (#164761)

Temps needed for the allocatable reduction/privatization init regions are now allocated on the heap all the time. However, this is a performance killer for GPUs since malloc calls are prohibitively expensive. Therefore, we should do these allocations on the stack for GPU reductions.

This is similar to what we do for arrays. Additionally, I am working on getting reductions-by-ref to work on GPUs, which is a bit of a challenge given the many involved steps (e.g. intra-warp and inter-warp reductions, shuffling data from remote lanes, ...). But this is a prerequisite step.
Back in #69493 the
`-debug-info-correlate` LLVM flag was deprecated in favor of
`-profile-correlate=debug-info`. Update all tests to use this new flag.
Linux kernel build fails for SystemZ, as the output of INLINEASM was a GR32Bit general-purpose register instead of SystemZ::CC.

---------

Co-authored-by: anoopkg6 <[email protected]>
Co-authored-by: Ulrich Weigand <[email protected]>
Some machines have read-only vtables, but this test expects to overwrite them. Use -no_data_const to ensure the vtable is writable.
These are the other options used in compiler-rt that we also need to
support.

Reviewers: arichardson, petrhosek, ilovepi

Reviewed By: ilovepi, arichardson

Pull Request: #165122
)

Add implementation and encoding tests for:
- tlbiep
- tlbieio
- tlbsyncio
- ptesyncio
When lowering spills / restores, we may end up partially lowering the spill via copies and the remaining portion with loads/stores. In this partial lowering case, the implicit-def operands added to the restore load clobber the preceding copies -- telling MachineCopyPropagation to delete them. By also attaching an implicit operand to the load, the COPYs have an artificial use and thus will not be deleted; this is the same strategy taken in #115285

I'm not sure that we need implicit-def operands on any load restore, but
I guess it may make sense if it needs to be split into multiple loads
and some have been optimized out as containing undef elements.

These implicit / implicit-def operands continue to cause correctness
issues. A previous / ongoing long term plan to remove them is being
addressed via:


https://discourse.llvm.org/t/llvm-codegen-rfc-add-mo-lanemask-type-and-a-new-copy-lanemask-instruction/88021
#151123
#151124
This PR passes the VFS to LLVM's sanitizer passes from Clang, so that
the configuration files can be loaded in the same way all other compiler
inputs are.
The options -fbuiltin and -fno-builtin are not valid for Fortran. However, they are accepted by gfortran, which emits a warning message but continues to compile the code. Both -fbuiltin and -fno-builtin have been enabled for flang. Specifying either will result in a warning message being shown but no other effects; compilation will proceed normally after these warnings are shown. This brings flang's behavior in line with gfortran for these options.

Fixes #164766
…164905)

Turns out there's a bug in the current lldb sources that if you fork,
set the stdio file handles to close on exec and then exec lldb with some
commands and the `--batch` flag, lldb will stall on exit. The first
cause of the bug is that the Python session handler - and probably other
places in lldb - think 0, 1, and 2 HAVE TO BE the stdio file handles,
and open and close and dup them as needed. NB: I am NOT trying to fix
that bug. I'm not convinced running the lldb driver headless is worth a
lot of effort, it's just as easy to redirect them to /dev/null, which
does work.

But I would like to keep lldb from stalling on the way out when this
happens. The reason we stall is that we have a MainLoop waiting for
signals, and we try to Interrupt it, but because stdio was closed, the
interrupt pipe for the MainLoop gets the file descriptor 0, which gets
closed by the Python session handler if you run some script command. So
the Interrupt fails.

We were running the Write to the interrupt pipe wrapped in `llvm::cantFail`, but in a no-asserts build that just drops the error on the floor. So then lldb went on to call std::thread::join on the still active MainLoop, and that stalls.

I made Interrupt (and AddCallback & AddPendingCallback) return a bool for "interrupt success" instead. In all the places where code was requesting termination, I added checks for that failure and skip the std::thread::join call on the MainLoop thread, since that is almost certainly going to stall at this point.

I didn't do the same for the Windows MainLoop, as I don't know if/when the WSASetEvent call can fail, so I always return true here. I also didn't turn the test off for Windows. According to the Python docs, all the APIs I used should work on Windows... If that turns out not to be true, I'll make the test Darwin/Unix only.
)

new

```C++
auto aaaaaaaaaaaaaaaaaaaaa = {};  //
auto b                     = [] { //
  return;                         //
};

auto aaaaaaaaaaaaaaaaaaaaa = {};  //
auto b                     = [] { //
  return aaaaaaaaaaaaaaaaaaaaa;   //
};
```

old

```C++
auto aaaaaaaaaaaaaaaaaaaaa = {};  //
auto b                     = [] { //
  return;     //
};

auto aaaaaaaaaaaaaaaaaaaaa = {};                    //
auto b                     = [] {                   //
  return aaaaaaaaaaaaaaaaaaaaa; //
};
```

Aligning a line to another line involves keeping track of the tokens'
positions. Previously the shift was incorrectly added to some tokens
that did not move. Then the comments would end up in the wrong places.
…en size dimension value is 0 (#164878)

Previously, the runtime verification pass would insert assertion
statements with conditions that always evaluate to false for
semantically valid `tensor.extract_slice` operations where one of the
dimensions had a size of 0.

The `tensor.extract_slice` runtime verification logic was
unconditionally generating checks for the position of the last element
(`offset + (size - 1) * stride`). When `size` is 0, this causes the
assertion condition to always be false, leading to runtime failures even
though the operation is semantically valid.

This patch fixes the issue by making the `lastPos` check conditional.
The offset is always verified, but the endpoint check is only performed
when `size > 0` to avoid generating spurious assert statements.

This issue was discovered through a LiteRT model, where a dynamic shape calculation resulted in a zero-sized dimension being passed to `tensor.extract_slice`.

The following is a simplified IR snippet from the model. After running
the runtime verification pass, an assertion that always fails is
generated because the SSA value `%3` becomes 0.

```mlir
func.func @simple_repro_from_liteRT_model(%arg0: tensor<10x4x1xf32>) -> tensor<?x?x?xf32> {
  %cst = arith.constant dense<0> : tensor<1xi32>
  %cst_0 = arith.constant dense<-1> : tensor<2xi32>
  %c-1 = arith.constant -1 : index
  %c0 = arith.constant 0 : index
  %c10 = arith.constant 10 : index
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  %c2 = arith.constant 2 : index
  %0 = tensor.empty() : tensor<3xi32>
  %inserted_slice = tensor.insert_slice %cst into %0[0] [1] [1] : tensor<1xi32> into tensor<3xi32>
  %inserted_slice_1 = tensor.insert_slice %cst_0 into %inserted_slice[1] [2] [1] : tensor<2xi32> into tensor<3xi32>
  %extracted = tensor.extract %inserted_slice_1[%c0] : tensor<3xi32>
  %1 = index.casts %extracted : i32 to index
  %2 = arith.cmpi eq, %1, %c-1 : index
  %3 = arith.select %2, %c10, %1 : index
  %extracted_2 = tensor.extract %inserted_slice_1[%c1] : tensor<3xi32>
  %4 = index.casts %extracted_2 : i32 to index
  %5 = arith.cmpi eq, %4, %c-1 : index
  %6 = arith.select %5, %c4, %4 : index
  %extracted_3 = tensor.extract %inserted_slice_1[%c2] : tensor<3xi32>
  %7 = index.casts %extracted_3 : i32 to index
  %8 = arith.cmpi eq, %7, %c-1 : index
  %9 = arith.select %8, %c1, %7 : index
  %extracted_slice = tensor.extract_slice %arg0[0, 0, 0] [%3, %6, %9] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x?x?xf32>
  return %extracted_slice : tensor<?x?x?xf32>
}
```

The issue can be reproduced more simply with the following test case,
where `dim_0` is `0`. When the runtime verification pass is applied to
this code with `dim_0 = 0`, it generates an assertion that will always
fail at runtime.

```mlir
func.func @extract_slice_zero_size_dim(%arg0: tensor<10x4x1xf32>,
                                      %dim_0: index,
                                      %dim_1: index,
                                      %dim_2: index) {
  %slice = tensor.extract_slice %arg0[0, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1]
    : tensor<10x4x1xf32> to tensor<?x?x?xf32>
  return
}

func.func @test_zero_size_extraction() {
  %input = arith.constant dense<1.0> : tensor<10x4x1xf32>
  // Define slice dimensions: 0x4x1 (zero-size in first dimension)
  %dim_0 = arith.constant 0 : index
  %dim_1 = arith.constant 4 : index
  %dim_2 = arith.constant 1 : index
  func.call @extract_slice_zero_size_dim(%input, %dim_0, %dim_1, %dim_2)
    : (tensor<10x4x1xf32>, index, index, index) -> ()
  return
}
```

P.S. We probably have a similar issue with `memref.subview`. I will
check this and send a separate PR for the issue.

---------

Co-authored-by: Hanumanth Hanumantharayappa <[email protected]>
…dimension value is 0 (#164897)

Previously, the runtime verification pass would insert assertion
statements with conditions that always evaluate to false for
semantically valid `memref.subview` operations where one of the
dimensions had a size of 0.

The `memref.subview` runtime verification logic was unconditionally
generating checks for the position of the last element (`offset + (size
- 1) * stride`). When `size` is 0, this causes the assertion condition
to always be false, leading to runtime failures even though the
operation is semantically valid.

This patch fixes the issue by making the `lastPos` check conditional.
The offset is always verified, but the endpoint check is only performed
when `size > 0` to avoid generating spurious assert statements.

This issue was discovered through a LiteRT model, where a dynamic shape
calculation resulted in a zero-sized dimension being passed to
`memref.subview`. The following is a simplified IR snippet from the
model. After running the runtime verification pass, an assertion that
always fails is generated because the SSA value `%5` becomes 0.

```mlir
module {
  memref.global "private" constant @__constant_2xi32 : memref<2xi32> = dense<-1> {alignment = 64 : i64}
  memref.global "private" constant @__constant_1xi32 : memref<1xi32> = dense<0> {alignment = 64 : i64}
  func.func @simpleRepro(%arg0: memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>>) -> memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> {
    %c2 = arith.constant 2 : index
    %c4 = arith.constant 4 : index
    %c1 = arith.constant 1 : index
    %c10 = arith.constant 10 : index
    %c0 = arith.constant 0 : index
    %c-1 = arith.constant -1 : index
    %0 = memref.get_global @__constant_1xi32 : memref<1xi32>
    %1 = memref.get_global @__constant_2xi32 : memref<2xi32>
    %alloca = memref.alloca() {alignment = 64 : i64} : memref<3xi32>
    %subview = memref.subview %alloca[0] [1] [1] : memref<3xi32> to memref<1xi32, strided<[1]>>
    memref.copy %0, %subview : memref<1xi32> to memref<1xi32, strided<[1]>>
    %subview_0 = memref.subview %alloca[1] [2] [1] : memref<3xi32> to memref<2xi32, strided<[1], offset: 1>>
    memref.copy %1, %subview_0 : memref<2xi32> to memref<2xi32, strided<[1], offset: 1>>
    %2 = memref.load %alloca[%c0] : memref<3xi32>
    %3 = index.casts %2 : i32 to index
    %4 = arith.cmpi eq, %3, %c-1 : index
    %5 = arith.select %4, %c10, %3 : index
    %6 = memref.load %alloca[%c1] : memref<3xi32>
    %7 = index.casts %6 : i32 to index
    %8 = arith.cmpi eq, %7, %c-1 : index
    %9 = arith.select %8, %c4, %7 : index
    %10 = memref.load %alloca[%c2] : memref<3xi32>
    %11 = index.casts %10 : i32 to index
    %12 = arith.cmpi eq, %11, %c-1 : index
    %13 = arith.select %12, %c1, %11 : index
    %subview_1 = memref.subview %arg0[0, 0, 0] [%5, %9, %13] [1, 1, 1] : memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>> to memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
    return %subview_1 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
  }
}
```

P.S. This is a similar issue to the one fixed for `tensor.extract_slice`
in #164878

---------

Co-authored-by: Hanumanth Hanumantharayappa <[email protected]>
…LVM IR (#165286)

There's a couple of tests like this.

This patch series renames these to something more descriptive and adjusts the tests to check IR. Currently the tests check raw assembly output (not even dwarfdump), which most likely hid some bugs around property debug-info.
These tests are not supported on AIX and z/OS; disable them to get clang-ppc64-aix green.
…165021)

The type sizes of backedge taken counts for two loops can be different; this fixes the crash in haveSameSD (#165014).

---------

Co-authored-by: Shimin Cui <[email protected]>
Changes test name to something more meaningful. In preparation to
refactoring the test to check LLVM IR instead of assembly.
Currently ExecutionEngine tries to dump all functions declared in the
module, even those which are "external" (i.e., linked/loaded at
runtime). E.g.

```mlir
func.func private @printF32(f32)
func.func @supported_arg_types(%arg0: i32, %arg1: f32) {
  call @printF32(%arg1) : (f32) -> ()
  return
}
```
fails with
```
Could not compile printF32:
  Symbols not found: [ __mlir_printF32 ]
Program aborted due to an unhandled Error:
Symbols not found: [ __mlir_printF32 ]
```
even though `printF32` can be provided at final build time (i.e., when the object file is linked into some executable or shlib), e.g. if our own `libmlir_c_runner_utils` is linked.

So just skip functions which have no bodies during dump (i.e., are decls
without defns).
Adds `arm64-apple-darwin` support to `asm.py` matching and removes the now-invalidated `target-triple-mismatch` test (I don't have another triple supported by llc but not the autogenerator that would make this test useful).
Don't rely on comparison to a singular iterator; it's UB.

Fixes bot crashes after
#164524.
Check for lack of `setter` and `getter` attributes on `DIObjCProperty`