Skip to content

[EXPERIMENT] Replace numba-cuda with numba-cuda-mlir#9421

Open
shwina wants to merge 4 commits into
NVIDIA:mainfrom
shwina:python/numba-mlir-3-gpu-struct
Open

[EXPERIMENT] Replace numba-cuda with numba-cuda-mlir#9421
shwina wants to merge 4 commits into
NVIDIA:mainfrom
shwina:python/numba-mlir-3-gpu-struct

Conversation

@shwina

@shwina shwina commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Description

closes

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

shwina and others added 4 commits June 12, 2026 06:26
cuda.compute is migrating its JIT/struct machinery from numba-cuda to
numba-cuda-mlir (the MLIR-based successor). This first step adds the
dependency and a single import surface; no behavior changes yet.

- pyproject: add numba-cuda-mlir[cu12]/[cu13] to the cu12/cu13/sysctk
  runtime extras, alongside numba-cuda (which is still needed by
  _compile_op_to_llvm_bitcode on the v2/HostJIT path). Ignore
  numba_cuda_mlir.* in mypy like numba.*.
- _mlir.py: central re-export of the numba-cuda-mlir symbols the
  migration uses (cuda.compile, type system, typing/lowering extension
  API, data models, MLIR builder + llvm/arith dialects), plus small
  from_numpy_dtype/as_numpy_dtype/struct_field_position helpers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move the user-operator compilation path off numba-cuda and onto
numba-cuda-mlir (the MLIR-based successor). The gpu_struct typing/
lowering machinery still uses numba-cuda and is migrated separately.

_odr_helpers.py: the void* operator wrappers are now ordinary Python
device functions compiled with abi="c", instead of hand-written
@intrinsic LLVM-IR codegen. A void* argument is a typed CPointer
parameter (ABI-identical to void*); loads/stores are ptr[0] indexing;
numba-cuda-mlir inlines the user op into the wrapper. The unused
iterator advance/dereference wrappers are dropped (iterators compile
their device code via C++, not numba). Stateful state is unpacked from
the packed void* via a CPointer(CPointer(dtype)) view; heterogeneous
state dtypes are rejected (no pure-Python int->typed-pointer cast).

_compile_op_to_llvm_bitcode: numba-cuda-mlir's cuda.compile only emits
ptx/ltoir, so the v2 (HostJIT) LLVM bitcode is produced by extracting
LLVM IR from its internal MLIR -> LLVM translation (one step before
libnvvm) and lowering that to bitcode with llvmlite.

_jit.py: op compilation, return-type inference, stateful-op compilation,
and the POD/pointer TypeDescriptor<->numba conversions now use
numba-cuda-mlir (via the _mlir surface). Both v1 (ltoir) and v2
(bitcode) compile the same numba-cuda-mlir-jitted wrapper.

_mlir.py: add compile_to_llvm_ir() encapsulating the MLIR->LLVM IR
extraction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reimplement the gpu_struct type machinery against numba-cuda-mlir's MLIR
extension API, completing the migration: _jit.py no longer imports
classic numba at all (numba-cuda stays a dependency only for cuda.coop).

Typing:
- The struct type subclasses numba_cuda_mlir.types.Type; tuple/struct
  conversions still use can_convert_from + Conversion.safe.
- Data model: register_model + PrimitiveModel building the backend type
  as an MLIR llvm.StructType.new_identified over the fields' MLIR value
  types (replaces numba-cuda's models.StructModel).
- Field access typing uses an AttributeTemplate via typing_registry
  (replaces make_attribute_wrapper, which has no MLIR equivalent).
- Constructor typing uses a ConcreteTemplate registered with
  typing_registry.register_global (replaces the numba.cuda cudadecl
  registry).

Lowering (MLIR instead of llvmlite/cgutils):
- Field getattr: lower_getattr_generic + llvm.extractvalue.
- Constructor: lower() + llvm.UndefOp/insertvalue.
- tuple->struct and struct->struct casts: lower_cast + llvm
  extract/insertvalue (a tuple value is a Python sequence of MLIR values
  pre-concretization).

_mlir.py: export Conversion.

Validated headlessly: struct construct + field access compile to LTO-IR
via the same MLIR pattern. The aggregate casts are the area most in need
of validation against the struct test suite.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- test_void_ptr_wrapper_validation.py: rewrite against the new
  _make_wrapper_name API (the @intrinsic-era _ArgMode / _ArgSpec /
  _create_void_ptr_wrapper internals were removed); keep the
  sanitize_identifier coverage.
- test_merge_sort.py: xfail the unsigned compare_op cases. The test
  comparator np.uint8(lhs < rhs) hits a numba-cuda-mlir bug that
  miscompiles unsigned integer comparison as signed; signed/float
  comparators are unaffected. Remove once numba-cuda-mlir fixes it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@shwina shwina requested a review from a team as a code owner June 12, 2026 12:51
@shwina shwina requested a review from tpn June 12, 2026 12:51
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 12, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 12, 2026
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Overview

This experimental PR begins migrating NVIDIA CCCL's JIT compilation and struct machinery from the classic numba-cuda to numba-cuda-mlir, modernizing the compilation pipeline to use MLIR-based lowering and addressing upstream limitations in the process.

Key Changes

New Central Export Surface

  • python/cuda_cccl/cuda/compute/_mlir.py (152 lines added)
    A new public module serving as the central re-export point for numba-cuda-mlir functionality. Exports key components including cuda, types, errors, numpy_support, signature, and lowering/typing infrastructure (overload, lowering_registry, typing_registry, etc.). Provides wrapper helpers for NumPy dtype conversion (from_numpy_dtype, as_numpy_dtype) and a new compile_to_llvm_ir(pyfunc, sig, abi_name: str) -> str function that compiles device functions through the MLIR pipeline, extracts LLVM IR text, and returns bitcode-ready output. Includes an MLIR attribute helper (struct_field_position).

JIT Compilation Migration

  • python/cuda_cccl/cuda/compute/_jit.py (+276/-209 lines)
    Comprehensive refactoring replacing direct Numba APIs with the _mlir pipeline:
    • _compile_op_to_llvm_bitcode rewritten to obtain LLVM IR via _mlir.compile_to_llvm_ir, parse/verify with llvmlite, emit debug artifacts, and return bitcode bytes
    • CCCL struct typing/casting system reimplemented on _mlir infrastructure; _StructBase now derives from _mlir type base
    • Struct field validation/conversion and can_convert_from logic updated to use _mlir tuple/literal types and conversions
    • Attribute access lowering implemented via _mlir typing registry templates and MLIR lowerings
    • Struct indexing and construction lowering added through _mlir.overload and _mlir.lowering_registry
    • Tuple-to-struct and struct-to-struct casts rewritten to operate on MLIR values using _mlir.extractvalue and convert (replacing Numba cgutils)
    • Return-type inference and op compilation switched to _mlir.cuda.compile with explicit device=True, abi="c", and output="ltoir" parameters
    • Stateful op compilation refactored to pass state as typed CPointer arguments; wrapper creation now uses create_stateful_op_void_ptr_wrapper with state_dtypes

Wrapper Generation Refactoring

  • python/cuda_cccl/cuda/compute/_odr_helpers.py (+143/-299 lines)
    Replaced LLVM IR/intrinsic-based codegen with a Python source construction approach for void* wrappers:
    • New helper functions create unique wrapper symbol names and build wrapper device functions via exec()-ed Python source
    • create_op_void_ptr_wrapper reimplemented to call a cuda.jit(device=True)-compiled operator and store results through result[0]
    • create_stateful_op_void_ptr_wrapper API simplified: now accepts state_dtypes instead of state_array_types/state_info, enforces uniform captured-dtype support
    • Removed wrapper constructors: create_advance_void_ptr_wrapper, create_input_dereference_void_ptr_wrapper, create_output_dereference_void_ptr_wrapper
    • Removed argument-mode enum, state unpacking LLVM IR helpers, and intrinsic-based wrapper codegen

Dependency and Configuration

  • python/cuda_cccl/pyproject.toml (+13/-2 lines)
    Added numba-cuda-mlir[cu12]>=0.3.0 and numba-cuda-mlir[cu13]>=0.3.0 as dependencies across all CUDA 12/13 extras variants. Updated Mypy configuration to ignore numba_cuda_mlir.* module patterns.

Test Infrastructure Updates

  • python/cuda_cccl/tests/compute/conftest.py (+111 lines)
    New centralized test infrastructure to mark compute tests as expected-to-fail for known numba-cuda-mlir upstream issues. Adds _upstream_xfail_reason() helper that matches test names/nodeids against known failure conditions and applies pytest.mark.xfail(strict=False) accordingly.

  • python/cuda_cccl/tests/conftest.py (+45 lines)
    Shared Pytest configuration for the test suite. Introduces _EXAMPLE_XFAILS mapping to mark specific compute example tests as xfail based on numba-cuda-mlir limitations, with pytest_collection_modifyitems hook to attach xfail markers.

  • python/cuda_cccl/tests/compute/test_void_ptr_wrapper_validation.py (+23/-37 lines)
    Test rewritten from validating _create_void_ptr_wrapper to validating _make_wrapper_name post-sanitization. Updated imports and test suite now covers acceptance and rejection cases for unique identifier generation.

Technical Details

Compilation Pipeline

The migration from classic Numba to numba-cuda-mlir updates the operator compilation workflow:

  • Compiles to optimized MLIR with ltoir output
  • Extracts LLVM IR text from the MLIR module
  • Parses and verifies IR with llvmlite
  • Generates bitcode suitable for offline compilation

Struct Type Handling

Struct types now use MLIR's native llvm.StructType with MLIR-based lowering operations (extractvalue, insertvalue, undef) rather than Numba's cgutils struct proxies. The data model registration uses AttributeTemplate, ConcreteTemplate, and register_model from the _mlir infrastructure.

Known Issues

A merge_sort test has been xfailed for specific unsigned-comparison cases due to a known numba-cuda-mlir miscompile.

Breaking Changes

  • create_stateful_op_void_ptr_wrapper signature changed: now accepts state_dtypes (list of MLIR dtypes) instead of state_array_types and state_info
  • Removed wrapper functions: create_advance_void_ptr_wrapper, create_input_dereference_void_ptr_wrapper, create_output_dereference_void_ptr_wrapper
  • All JIT compilation paths now route through _mlir.cuda.compile instead of direct Numba APIs

Notes

The PR introduction marks this as experimental. While the public API surface (e.g., to_jit_op_adapter) remains unchanged, internal compilation machinery is significantly restructured. Test infrastructure updates anticipate ongoing upstream issues with numba-cuda-mlir during stabilization.

Walkthrough

This PR migrates CCCL's JIT compilation infrastructure from direct Numba APIs to numba-cuda-mlir. The changes include a new _mlir.py module providing the backend entry point, wrapper codegen refactored from Numba intrinsics to Python source generation, struct type system ported to MLIR, type conversions updated to use MLIR type infrastructure, and all op compilation paths switched to _mlir.cuda.compile.

Changes

CCCL JIT migration from Numba to numba-cuda-mlir

Layer / File(s) Summary
MLIR backend module and dtype helpers
python/cuda_cccl/cuda/compute/_jit.py, python/cuda_cccl/cuda/compute/_mlir.py
New _mlir.py module re-exports numba-cuda-mlir components (cuda, types, errors, numpy_support, signature); provides dtype converters (from_numpy_dtype, as_numpy_dtype) and compile_to_llvm_ir(pyfunc, sig, abi_name) that compiles via MLIR pipeline to optimized LLVM IR text. Imports updated in _jit.py to use _mlir backend.
Wrapper generation refactor to Python source
python/cuda_cccl/cuda/compute/_odr_helpers.py
Replaces Numba intrinsic/LLVM IR codegen with Python source generation; implements _make_wrapper_name for unique valid identifiers; reimplements create_op_void_ptr_wrapper and create_stateful_op_void_ptr_wrapper to build device wrappers via exec, with new API signature accepting state_dtypes instead of state_array_types and state_info.
Struct type system migration to MLIR
python/cuda_cccl/cuda/compute/_jit.py
_StructBase base class changed from numba.types.Type to _mlir.types.Type; struct field validation and can_convert_from use _mlir.types.UniTuple/_mlir.types.Tuple and _mlir.Conversion; attribute access lowering implemented via _mlir typing registry templates; struct getitem, construction, and tuple/struct casts added via _mlir.overload and _mlir.lower_cast using MLIR extractvalue/convert.
Type descriptor conversions using MLIR APIs
python/cuda_cccl/cuda/compute/_jit.py
type_descriptor_to_numba now passes through _mlir.types.Type and creates pointers via _mlir.types.CPointer; _convert_type_descriptor_to_numba uses _mlir.as_numba_type and _mlir.from_numpy_dtype; _ensure_function_structs_registered and _numba_type_to_type_descriptor updated to use _mlir APIs instead of Numba dtype converters.
LLVM bitcode generation and op compilation
python/cuda_cccl/cuda/compute/_jit.py
_compile_op_to_llvm_bitcode rewritten to obtain LLVM IR via _mlir.compile_to_llvm_ir, parse with llvmlite, emit optional debug artifacts, and return bitcode; _infer_return_type and _compile_op_impl switched to _mlir.cuda.compile with device=True, abi="c", output="ltoir".
Stateful op compilation with MLIR pointer types
python/cuda_cccl/cuda/compute/_jit.py
_compile_stateful_op derives state_dtypes from NumPy array dtypes and builds state_ptr_types as _mlir.types.CPointer; infers output type via _mlir.cuda.compile with pointer-typed state inputs; creates wrapper using updated create_stateful_op_void_ptr_wrapper(op, sig, state_dtypes) API, removing prior Numba Array type and state_info bookkeeping.
Dependency and mypy configuration
python/cuda_cccl/pyproject.toml
Adds numba-cuda-mlir[cu12/cu13]>=0.3.0 to cu12, cu13, sysctk12, sysctk13 optional-dependency extras; adds numba_cuda_mlir.* to mypy override module ignore list.
Test infrastructure for xfail handling
python/cuda_cccl/tests/conftest.py, python/cuda_cccl/tests/compute/conftest.py, python/cuda_cccl/tests/compute/test_void_ptr_wrapper_validation.py
Adds pytest hooks to mark compute example tests xfail based on known numba-cuda-mlir issues via _EXAMPLE_XFAILS mapping and _upstream_xfail_reason helper; updates wrapper validation tests to validate _make_wrapper_name and sanitize_identifier functions.

Possibly related issues

Suggested reviewers

  • Jacobfaib
  • davebayer
  • miscco

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (3)
python/cuda_cccl/tests/conftest.py (1)

35-45: ⚡ Quick win

suggestion: Name extraction logic is duplicated in tests/compute/conftest.py line 238. Consider extracting getattr(item, "originalname", None) or item.name.split("[")[0] into a shared helper function to ensure consistency if the logic changes.

python/cuda_cccl/tests/compute/conftest.py (2)

136-229: ⚖️ Poor tradeoff

suggestion: The _upstream_xfail_reason function spans 94 lines with nested conditionals checking five distinct issues. Consider splitting into issue-specific helper functions (e.g., _check_issue_123, _check_issue_121) to improve maintainability and testability. Each helper could return the reason string or None.


238-238: ⚡ Quick win

suggestion: Name extraction logic getattr(item, "originalname", None) or item.name.split("[")[0] is duplicated from tests/conftest.py line 37. Consider consolidating into a shared helper to avoid drift.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c3c453b3-c3dd-4683-b958-f22ef9e34b75

📥 Commits

Reviewing files that changed from the base of the PR and between 071630f and 5dacf4a.

📒 Files selected for processing (7)
  • python/cuda_cccl/cuda/compute/_jit.py
  • python/cuda_cccl/cuda/compute/_mlir.py
  • python/cuda_cccl/cuda/compute/_odr_helpers.py
  • python/cuda_cccl/pyproject.toml
  • python/cuda_cccl/tests/compute/conftest.py
  • python/cuda_cccl/tests/compute/test_void_ptr_wrapper_validation.py
  • python/cuda_cccl/tests/conftest.py

Comment on lines 151 to 156
def can_convert_from(self, typingctx, other):
if isinstance(other, types.UniTuple):
if isinstance(other, _mlir.types.UniTuple):
tuple_size = other.count
if tuple_size == len(field_types):
return Conversion.safe
return _mlir.Conversion.safe

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

important: Validate UniTuple element convertibility before returning Conversion.safe.

This branch accepts any same-length UniTuple, even when its element type cannot convert to every target field type. That makes heterogeneous struct casts type-check here and fail later in the cast lowering, unlike the _mlir.types.Tuple branch which already does the per-field check.

Comment on lines +940 to 946
# State arrays are passed to the (transformed) op as typed pointers; the op
# body indexes them (``state[i]``), which works on a CPointer. See
# _odr_helpers.create_stateful_op_void_ptr_wrapper for how the packed state
# void* is unpacked into one CPointer per state array.
state_dtypes = [_mlir.from_numpy_dtype(get_dtype(s)) for s in state_arrays]
state_ptr_types = [_mlir.types.CPointer(dt) for dt in state_dtypes]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

important: Typing captured state arrays as CPointer silently drops array semantics.

The transformed operator body is unchanged, but each captured device array is now compiled as a bare pointer. Stateful operators that use .shape, multidimensional indexing, slicing, or any array attribute/method will stop typing/lowering because those operations are not available on CPointer. As per coding guidelines, python/cuda_cccl/**/*: Focus on Python API stability, CUDA array interoperability, memory ownership, JIT/NVRTC/nvJitLink behavior, package boundaries, user-defined operator correctness, tests, and examples.

Source: Coding guidelines

Comment on lines +955 to +963
_, return_type = _mlir.cuda.compile(
op,
all_numba_input_types,
device=True,
abi_info={"abi_name": abi_name},
output="ltoir",
)
# Convert return type to TypeDescriptor
output_type = cccl_types.from_numpy_dtype(
numba.np.numpy_support.as_dtype(return_type)
)
output_type = cccl_types.from_numpy_dtype(_mlir.as_numpy_dtype(return_type))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

important: Stateful return-type inference no longer supports gpu_struct results.

This path converts return_type through _mlir.as_numpy_dtype, which only works for NumPy-backed scalar/POD types. The stateless path already uses _numba_type_to_type_descriptor; stateful operators returning a registered struct will now fail here or be reported as the wrong type.

Comment on lines +174 to +213
unique_state_dtypes = set(state_dtypes)
if len(unique_state_dtypes) > 1:
raise NotImplementedError(
"stateful operators that capture device arrays of differing dtypes "
f"are not supported (got {sorted(map(str, unique_state_dtypes))}); "
"all captured arrays must share a dtype"
)
state_dtype = state_dtypes[0]

op_device = cuda.jit(device=True)(op)

def create_input_dereference_void_ptr_wrapper(deref_fn, state_ptr_type, value_type):
"""Creates a wrapper function for input iterator dereference method.
# sig.args == (state_0, ..., state_{num_states-1}, input_0, ..., input_{K-1})
input_types = list(sig.args)[num_states:]
return_type = sig.return_type

The wrapper takes 2 void* arguments:
- state pointer
- result pointer (function writes result here)
"""
arg_specs = [
_ArgSpec(state_ptr_type, _ArgMode.PTR),
_ArgSpec(types.CPointer(value_type), _ArgMode.PTR),
]
inner_sig = types.void(state_ptr_type, types.CPointer(value_type))
return _create_void_ptr_wrapper(deref_fn, deref_fn.__name__, arg_specs, inner_sig)
wrapper_name = _make_wrapper_name(op.__name__)
input_names = [f"arg_{i}" for i in range(len(input_types))]

# states[j] reinterprets the j-th packed pointer as CPointer(state_dtype).
state_args = ", ".join(f"states[{j}]" for j in range(num_states))
input_args = ", ".join(f"{name}[0]" for name in input_names)
call_args = ", ".join(a for a in (state_args, input_args) if a)
reconstruct = _is_gpu_struct_type(return_type) and _op_returns_tuple(
op_device, sig.args
)
body, extra_namespace = _result_store_body(call_args, return_type, reconstruct)

wrapper_func = _build_wrapper(
wrapper_name,
["states", *input_names, "result"],
body,
op_device,
extra_namespace,
)

def create_output_dereference_void_ptr_wrapper(deref_fn, state_ptr_type, value_type):
"""Creates a wrapper function for output iterator dereference method.

The wrapper takes 2 void* arguments:
- state pointer
- value pointer (value to write)
"""
arg_specs = [
_ArgSpec(state_ptr_type, _ArgMode.PTR),
_ArgSpec(value_type, _ArgMode.LOAD),
]
inner_sig = types.void(state_ptr_type, value_type)
return _create_void_ptr_wrapper(deref_fn, deref_fn.__name__, arg_specs, inner_sig)
wrapper_sig = types.void(
types.CPointer(types.CPointer(state_dtype)),
*(types.CPointer(t) for t in input_types),
types.CPointer(return_type),
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

important: Heterogeneous captured state arrays become unsupported in this wrapper path. _jit._compile_stateful_op still builds one pointer type per captured array, but this code collapses the packed states blob to CPointer(CPointer(state_dtype)) and raises on mixed state_dtypes. That makes previously valid stateful ops fail as soon as they capture arrays with different element types. Preserve per-slot pointer reconstruction here instead of requiring a single shared dtype, and add a mixed-dtype captured-state regression test once this is fixed. As per coding guidelines, "Focus on Python API stability, CUDA array interoperability, memory ownership, JIT/NVRTC/nvJitLink behavior, package boundaries, user-defined operator correctness, tests, and examples."

Source: Coding guidelines

Comment on lines 60 to 89
cu12 = [
"cuda-cccl[minimal-cu12]",
# numba / numba-cuda: used by cuda.coop (Numba-CUDA cooperative primitives).
"numba>=0.60.0",
"numba-cuda[cu12]>=0.23.0,!=0.27.*,!=0.28.*,!=0.29.*,!=0.30.0",
# numba-cuda-mlir: backend that JIT-compiles cuda.compute user operators and
# gpu_struct types (the MLIR-based successor to numba-cuda).
"numba-cuda-mlir[cu12]>=0.3.0",
]
cu13 = [
"cuda-cccl[minimal-cu13]",
# numba / numba-cuda: used by cuda.coop (Numba-CUDA cooperative primitives).
"numba>=0.60.0",
"numba-cuda[cu13]>=0.23.0,!=0.27.*,!=0.28.*,!=0.29.*,!=0.30.0"
"numba-cuda[cu13]>=0.23.0,!=0.27.*,!=0.28.*,!=0.29.*,!=0.30.0",
# numba-cuda-mlir: backend that JIT-compiles cuda.compute user operators and
# gpu_struct types (the MLIR-based successor to numba-cuda).
"numba-cuda-mlir[cu13]>=0.3.0",
]
sysctk12 = [
"cuda-cccl[minimal-sysctk12]",
"numba>=0.60.0",
"numba-cuda[cu12]>=0.23.0,!=0.27.*,!=0.28.*,!=0.29.*,!=0.30.0"
"numba-cuda[cu12]>=0.23.0,!=0.27.*,!=0.28.*,!=0.29.*,!=0.30.0",
"numba-cuda-mlir[cu12]>=0.3.0",
]
sysctk13 = [
"cuda-cccl[minimal-sysctk13]",
"numba>=0.60.0",
"numba-cuda[cu13]>=0.23.0,!=0.27.*,!=0.28.*,!=0.29.*,!=0.30.0",
"numba-cuda-mlir[cu13]>=0.3.0",
]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify numba-cuda-mlir package, version, and extras exist on PyPI

echo "=== Checking numba-cuda-mlir on PyPI ==="
curl -s https://pypi.org/pypi/numba-cuda-mlir/json | jq -r '
  "Latest version: " + .info.version,
  "Available versions: " + ([.releases | keys[] | select(. >= "0.3.0")] | join(", ")),
  ""
'

echo "=== Checking for extras in version 0.3.0 ==="
curl -s https://pypi.org/pypi/numba-cuda-mlir/0.3.0/json | jq -r '
  .releases["0.3.0"][] |
  select(.packagetype == "bdist_wheel") |
  .filename
' | head -5

echo ""
echo "=== Checking for cu12/cu13 in package metadata ==="
curl -s https://pypi.org/pypi/numba-cuda-mlir/0.3.0/json | jq -r '
  .info.requires_dist // [] | 
  map(select(contains("cu12") or contains("cu13"))) |
  .[]
' | head -10

Repository: NVIDIA/cccl

Length of output: 597


Add an upper bound for numba-cuda-mlir in cu12/cu13 extras
python/cuda_cccl/pyproject.toml currently specifies numba-cuda-mlir[cu12/cu13] >= 0.3.0 without an upper cap; add an upper bound (e.g., <0.4.0) or pin to ==0.3.0 to avoid future breaking dependency changes.

@github-actions

Copy link
Copy Markdown
Contributor

😬 CI Workflow Results

🟥 Finished in 46m 07s: Pass: 29%/51 | Total: 3h 48m | Max: 36m 34s

See results here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

1 participant