forked from llvm/llvm-project
-
Couldn't load subscription status.
- Fork 2
[pull] main from llvm:main #5643
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
pull
wants to merge
627
commits into
Ericsson:main
Choose a base branch
from
llvm:main
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
+97,693
−26,846
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ode (#113151) Since pauthtest is a Linux-specific ABI, it should not be handled in common driver code.
Fixes test issue introduced in 357b030
This commit allows const and cast ops with MXFP datatypes through the validation pass when specification version 1.1.draft is selected. Note: it doesn't include support for the mxint8 datatype. This will be added in a separate commit. Note: this commit adds support as defined in the spec in arm/tosa-specification@063846a. EXT_MXFP extension is considered experimental and subject to breaking change.
All 3 implementations are just checking if this has the windows check function, so merge that as the only implementation.
…svc (#164133) The predicate system is currently primitive and alternative call predicates should be mutually exclusive.
/usr/bin/ld: unittests/ExecutionEngine/Orc/CMakeFiles/OrcJITTests.dir/L ibraryResolverTest.cpp.o: undefined reference to symbol '_ZN4llvm4yaml1 1convertYAMLERNS0_5InputERNS_11raw_ostreamENS_12function_refIFvRKNS_5Tw ineEEEEjm' /usr/bin/ld: lib/libLLVMObjectYAML.so.22.0git: error adding symbols: DS O missing from command line
Add GlobalISel lowering of G_FMINIMUM and G_FMAXIMUM following the same logic as in SDag's expandFMINIMUM_FMAXIMUM. Update AMDGPU legalization rules: Pre GFX12 now uses new lowering method and make G_FMINNUM_IEEE and G_FMAXNUM_IEEE legal to match SDag.
This has been dead since 97bfb93
Dead code probably left-over of some PR refactoring.
…er (#164613) Follow-up from #162652 --------- Co-authored-by: Michael Klemm <[email protected]>
Eventually this should be program state, and not part of TargetLowering so avoid direct references to the libcall functions in it. The usage of RuntimeLibcallsInfo here is not good though, particularly the use through TargetTransformInfo. It would be better if the IR attributes were directly encoded in the libcall definition (or at least made consistent elsewhere). The parsing of the attributes should not also be responsible for doing the libcall recognition, which is the only part pulling in the dependency.
This patch adds HVX vgather/vscatter genertion for i16, i32, and i8. It also adds a flag to control generation of scatter/gather instructions for HVX. Default to "disable". Co-authored-by: Sergei Larin <[email protected]> Co-authored-by: Sergei Larin <[email protected]> Co-authored-by: Maxime Schmitt <[email protected]>
…nnotations' crashes (#164819)
#163067) Loads the data into the SIMD register, thus sparing a physical register and a potentially costly movement of data.
This commit adds support for the cast_from/to_block_scaled operations from the ext-mxfp extension. This includes: - Operation definition in TosaOps.td - Micro-scaling supported types definition - Shape inference and verifiers - Validation pass checks to ensure usage is only valid when the target environment includes ext-mxfp and at least v1.1.draft of the specification. Note: currently it excludes support for mxint8. This will be added in a later commit. Note: this commit adds support as defined in the spec in arm/tosa-specification@063846a. EXT_MXFP extension is considered experimental and subject to breaking change. Co-authored-by: Tat Wai Chong <[email protected]>
Fix 3 cases in the scheduler tables to match the current SWOG, in section 3.29 SVE Store: change pipeline V to V01.
Clean up AnalysisConsumer code from the timer-related branches that are not used most of the time, and move this logic to Timer.cpp, which is a more relevant place and allows for a cleaner implementation.
If we only deal with a one part of 64bit value we can just generate merge and unmerge which will be either combined away or selected into copy / mov_b32.
Post cleanup for #164534. Also pick suggestion by nikic, remove redundant attributes.
This patch fixes the computation of the absolute value for SCEV. Previously, it was calculated as `AbsX = SE->isKnownNonNegative(X) ? X : -X`, which would incorrectly assume that `!isKnownNonNegative` implies `isKnownNegative`. This assumption does not hold in general, for example, when `X` is a `SCEVUnknown` and it can be an arbitrary value. To compute the absolute value properly, we should use ScalarEvolution::getAbsExpr instead. Fix #149977
…sert (#164991) Clang always duplicates SME attributes to each callsite, which means removing "ZA_State_Agnostic" from CalledFn before the assert resulted in the assertion failing for IR emitted by clang. I've updated the existing test to match the form emitted by clang (which previously hit the assert).
This patch introduces support for the Hexagon V81 architecture. It includes instruction formats, definitions, encodings, scheduling classes, and builtins/intrinsics.
Reviewers: jvoung, Xazax-hun Reviewed By: jvoung Pull Request: #163894
This is important if the first use of a StatusOr (or Status) is in a conditional statement, we need a stable value for `ok` from outside of the conditional statement to make sure we don't use a different variable in every branch. Reviewers: jvoung, Xazax-hun Reviewed By: jvoung Pull Request: #163898
This tool provides a harness for implementing different strategies that summarize many remarks (possibly from multiple translation units) into new summary remarks. The remark summaries can then be viewed using tools like `opt-viewer`. The first summary strategy is `--inline-callees`, which generates remarks that summarize the per-callee inline statistics for functions that appear in inling remarks. This is useful for troubleshooting inlining issues/regressions on large codebases. Pull Request: #160549
…Us (#164761) Temps needed for the allocatable reduction/privatization init regions are now allocated on the heap all the time. However, this is performance killer for GPUs since malloc calls are prohibitively expensive. Therefore, we should do these allocations on the stack for GPU reductions. This is similar to what we do for arrays. Additionally, I am working on getting reductions-by-ref to work on GPUs which is a bit of a challenge given the many involved steps (e.g. intra-warp and inter-warp reuctions, shuffling data from remote lanes, ...). But this is a prerequisite step.
Back in #69493 the `-debug-info-correlate` LLVM flag was deprecated in favor of `-profile-correlate=debug-info`. Update all tests to use this new flag.
Linux kernel build fails for SystemZ as output of INLINEASM was GR32Bit general-purpose register instead of SystemZ::CC. --------- Co-authored-by: anoopkg6 <[email protected]> Co-authored-by: Ulrich Weigand <[email protected]>
Some machines have read-only vtables but this test expects to overwrite them. Use -no_data_const to ensure the vtable is writable
This reverts commit 9a0aa92.
These are the other options used in compiler-rt that we also need to support. Reviewers: arichardson, petrhosek, ilovepi Reviewed By: ilovepi, arichardson Pull Request: #165122
Pre-commit test for PR: #162580
When lowering spills / restores, we may end up partially lowering the spill via copies and the remaining portion with loads/stores. In this partial lowering case,the implicit-def operands added to the restore load clobber the preceding copies -- telling MachineCopyPropagation to delete them. By also attaching an implicit operand to the load, the COPYs have an artificial use and thus will not be deleted - this is the same strategy taken in #115285 I'm not sure that we need implicit-def operands on any load restore, but I guess it may make sense if it needs to be split into multiple loads and some have been optimized out as containing undef elements. These implicit / implicit-def operands continue to cause correctness issues. A previous / ongoing long term plan to remove them is being addressed via: https://discourse.llvm.org/t/llvm-codegen-rfc-add-mo-lanemask-type-and-a-new-copy-lanemask-instruction/88021 #151123 #151124
This PR passes the VFS to LLVM's sanitizer passes from Clang, so that the configuration files can be loaded in the same way all other compiler inputs are.
The options -fbuiltin and -fno-builtin are not valid for Fortran. However, they are accepted by gfortran which emits a warning message but continues to compile the code. Both -fbuiltin and -fno-builtin have been enabled for flang. Specifying either will result in a warning message being shown but no other effects. Compilation will proceed normally after these warnings are shown. This brings flang's behavior in line with gfortran for these options. Fixes #164766
…164905) Turns out there's a bug in the current lldb sources that if you fork, set the stdio file handles to close on exec and then exec lldb with some commands and the `--batch` flag, lldb will stall on exit. The first cause of the bug is that the Python session handler - and probably other places in lldb - think 0, 1, and 2 HAVE TO BE the stdio file handles, and open and close and dup them as needed. NB: I am NOT trying to fix that bug. I'm not convinced running the lldb driver headless is worth a lot of effort, it's just as easy to redirect them to /dev/null, which does work. But I would like to keep lldb from stalling on the way out when this happens. The reason we stall is that we have a MainLoop waiting for signals, and we try to Interrupt it, but because stdio was closed, the interrupt pipe for the MainLoop gets the file descriptor 0, which gets closed by the Python session handler if you run some script command. So the Interrupt fails. We were running the Write to the interrupt pipe wrapped in `llvm::cantFail`, but in a no asserts build that just drops the error on the floor. So then lldb went on to call std::thread::join on the still active MainLoop, and that stalls I made Interrupt (and AddCallback & AddPendingCallback) return a bool for "interrupt success" instead. All the places where code was requesting termination, I added checks for that failure, and skip the std::thread::join call on the MainLoop thread, since that is almost certainly going to stall at this point. I didn't do the same for the Windows MainLoop, as I don't know if/when the WSASetEvent call can fail, so I always return true here. I also didn't turn the test off for Windows. According to the Python docs all the API's I used should work on Windows... If that turns out not to be true I'll make the test Darwin/Unix only.
) new ```C++ auto aaaaaaaaaaaaaaaaaaaaa = {}; // auto b = [] { // return; // }; auto aaaaaaaaaaaaaaaaaaaaa = {}; // auto b = [] { // return aaaaaaaaaaaaaaaaaaaaa; // }; ``` old ```C++ auto aaaaaaaaaaaaaaaaaaaaa = {}; // auto b = [] { // return; // }; auto aaaaaaaaaaaaaaaaaaaaa = {}; // auto b = [] { // return aaaaaaaaaaaaaaaaaaaaa; // }; ``` Aligning a line to another line involves keeping track of the tokens' positions. Previously the shift was incorrectly added to some tokens that did not move. Then the comments would end up in the wrong places.
…en size dimension value is 0 (#164878) Previously, the runtime verification pass would insert assertion statements with conditions that always evaluate to false for semantically valid `tensor.extract_slice` operations where one of the dimensions had a size of 0. The `tensor.extract_slice` runtime verification logic was unconditionally generating checks for the position of the last element (`offset + (size - 1) * stride`). When `size` is 0, this causes the assertion condition to always be false, leading to runtime failures even though the operation is semantically valid. This patch fixes the issue by making the `lastPos` check conditional. The offset is always verified, but the endpoint check is only performed when `size > 0` to avoid generating spurious assert statements. This issue was discovered through LiteRT model, where a dynamic shape calculation resulted in a zero-sized dimension being passed to `tensor.extract_slice`. The following is a simplified IR snippet from the model. After running the runtime verification pass, an assertion that always fails is generated because the SSA value `%3` becomes 0. ```mlir func.func @simple_repro_from_liteRT_model(%arg0: tensor<10x4x1xf32>) -> tensor<?x?x?xf32> { %cst = arith.constant dense<0> : tensor<1xi32> %cst_0 = arith.constant dense<-1> : tensor<2xi32> %c-1 = arith.constant -1 : index %c0 = arith.constant 0 : index %c10 = arith.constant 10 : index %c1 = arith.constant 1 : index %c4 = arith.constant 4 : index %c2 = arith.constant 2 : index %0 = tensor.empty() : tensor<3xi32> %inserted_slice = tensor.insert_slice %cst into %0[0] [1] [1] : tensor<1xi32> into tensor<3xi32> %inserted_slice_1 = tensor.insert_slice %cst_0 into %inserted_slice[1] [2] [1] : tensor<2xi32> into tensor<3xi32> %extracted = tensor.extract %inserted_slice_1[%c0] : tensor<3xi32> %1 = index.casts %extracted : i32 to index %2 = arith.cmpi eq, %1, %c-1 : index %3 = arith.select %2, %c10, %1 : index %extracted_2 = tensor.extract %inserted_slice_1[%c1] : tensor<3xi32> %4 = index.casts %extracted_2 : i32 to index %5 = arith.cmpi eq, %4, %c-1 : index %6 = arith.select %5, %c4, %4 : index %extracted_3 = tensor.extract %inserted_slice_1[%c2] : tensor<3xi32> %7 = index.casts %extracted_3 : i32 to index %8 = arith.cmpi eq, %7, %c-1 : index %9 = arith.select %8, %c1, %7 : index %extracted_slice = tensor.extract_slice %arg0[0, 0, 0] [%3, %6, %9] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x?x?xf32> return %extracted_slice : tensor<?x?x?xf32> } ``` The issue can be reproduced more simply with the following test case, where `dim_0` is `0`. When the runtime verification pass is applied to this code with `dim_0 = 0`, it generates an assertion that will always fail at runtime. ```mlir func.func @extract_slice_zero_size_dim(%arg0: tensor<10x4x1xf32>, %dim_0: index, %dim_1: index, %dim_2: index) { %slice = tensor.extract_slice %arg0[0, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x?x?xf32> return } func.func @test_zero_size_extraction() { %input = arith.constant dense<1.0> : tensor<10x4x1xf32> // Define slice dimensions: 0x4x1 (zero-size in first dimension) %dim_0 = arith.constant 0 : index %dim_1 = arith.constant 4 : index %dim_2 = arith.constant 1 : index func.call @extract_slice_zero_size_dim(%input, %dim_0, %dim_1, %dim_2) : (tensor<10x4x1xf32>, index, index, index) -> () return } ``` P.S. We probably have a similar issue with `memref.subview`. I will check this and send a separate PR for the issue. --------- Co-authored-by: Hanumanth Hanumantharayappa <[email protected]>
…dimension value is 0 (#164897) Previously, the runtime verification pass would insert assertion statements with conditions that always evaluate to false for semantically valid `memref.subview` operations where one of the dimensions had a size of 0. The `memref.subview` runtime verification logic was unconditionally generating checks for the position of the last element (`offset + (size - 1) * stride`). When `size` is 0, this causes the assertion condition to always be false, leading to runtime failures even though the operation is semantically valid. This patch fixes the issue by making the `lastPos` check conditional. The offset is always verified, but the endpoint check is only performed when `size > 0` to avoid generating spurious assert statements. This issue was discovered through a LiteRT model, where a dynamic shape calculation resulted in a zero-sized dimension being passed to `memref.subview`. The following is a simplified IR snippet from the model. After running the runtime verification pass, an assertion that always fails is generated because the SSA value `%5` becomes 0. ```mlir module { memref.global "private" constant @__constant_2xi32 : memref<2xi32> = dense<-1> {alignment = 64 : i64} memref.global "private" constant @__constant_1xi32 : memref<1xi32> = dense<0> {alignment = 64 : i64} func.func @simpleRepro(%arg0: memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>>) -> memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> { %c2 = arith.constant 2 : index %c4 = arith.constant 4 : index %c1 = arith.constant 1 : index %c10 = arith.constant 10 : index %c0 = arith.constant 0 : index %c-1 = arith.constant -1 : index %0 = memref.get_global @__constant_1xi32 : memref<1xi32> %1 = memref.get_global @__constant_2xi32 : memref<2xi32> %alloca = memref.alloca() {alignment = 64 : i64} : memref<3xi32> %subview = memref.subview %alloca[0] [1] [1] : memref<3xi32> to memref<1xi32, strided<[1]>> memref.copy %0, %subview : memref<1xi32> to memref<1xi32, strided<[1]>> %subview_0 = memref.subview %alloca[1] [2] [1] : memref<3xi32> to memref<2xi32, strided<[1], offset: 1>> memref.copy %1, %subview_0 : memref<2xi32> to memref<2xi32, strided<[1], offset: 1>> %2 = memref.load %alloca[%c0] : memref<3xi32> %3 = index.casts %2 : i32 to index %4 = arith.cmpi eq, %3, %c-1 : index %5 = arith.select %4, %c10, %3 : index %6 = memref.load %alloca[%c1] : memref<3xi32> %7 = index.casts %6 : i32 to index %8 = arith.cmpi eq, %7, %c-1 : index %9 = arith.select %8, %c4, %7 : index %10 = memref.load %alloca[%c2] : memref<3xi32> %11 = index.casts %10 : i32 to index %12 = arith.cmpi eq, %11, %c-1 : index %13 = arith.select %12, %c1, %11 : index %subview_1 = memref.subview %arg0[0, 0, 0] [%5, %9, %13] [1, 1, 1] : memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>> to memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> return %subview_1 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> } } ``` P.S. This is a similar issue to the one fixed for `tensor.extract_slice` in #164878 --------- Co-authored-by: Hanumanth Hanumantharayappa <[email protected]>
…LVM IR (#165286) There's a couple of tests like this. This patch series renames these to something more descriptive and adjusts the tests to check IR. Currently the tests check raw assembly output (not even dwarfdump). Which most likely hid some bugs around property debug-info.
These tests not supported on AIX and z/OS, disable them to get the clang-ppc64-aix green
…165021) The type sizes of backedge taken counts for two loops can be different and this is to fix the crash in haveSameSD (#165014). --------- Co-authored-by: Shimin Cui <[email protected]>
Changes test name to something more meaningful. In preparation to refactoring the test to check LLVM IR instead of assembly.
Currently ExecutionEngine tries to dump all functions declared in the
module, even those which are "external" (i.e., linked/loaded at
runtime). E.g.
```mlir
func.func private @printF32(f32)
func.func @supported_arg_types(%arg0: i32, %arg1: f32) {
call @printF32(%arg1) : (f32) -> ()
return
}
```
fails with
```
Could not compile printF32:
Symbols not found: [ __mlir_printF32 ]
Program aborted due to an unhandled Error:
Symbols not found: [ __mlir_printF32 ]
```
even though `printF32` can be provided at final build time (i.e., when
the object file is linked to some executable or shlib). E.g, if our own
`libmlir_c_runner_utils` is linked.
So just skip functions which have no bodies during dump (i.e., are decls
without defns).
Adds `arm64-apple-darwin` support to `asm.py` matching and removes now invalidated `target-triple-mismatch` test (I dont have another triple supported by llc but not the autogenerator that make this test useful).
Don't rely on comparison to singular iterator, it's UB. Fixes bot crashes after #164524.
Check for lack of `setter` and `getter` attributes on `DIObjCProperty`
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )