
Conversation

@cyx-6 (Collaborator) commented Oct 16, 2025

📌 Description

This PR bumps the tvm-ffi version and updates the FlashInfer codebase accordingly.

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Bug Fixes

    • Fixed TGVGemmRunner output orientation to return results in expected shape without unnecessary transposition.
  • Chores

    • Updated Apache TVM FFI dependency version range from exact pre-release pin to flexible minor version constraint for broader compatibility.

cyx-6 added 7 commits October 16, 2025 20:18

This PR fixes the dev container after flashinfer-ai#1880.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @cyx-6, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on adapting the FlashInfer codebase to a newer version of the tvm-ffi library. This involves a significant refactoring of how TensorView objects are accessed in C++ CUDA kernels, moving from direct member access to more robust method-based interactions. Additionally, the Python binding for a GEMM operation was slightly adjusted for correctness.

Highlights

  • TVM FFI Version Bump: The apache-tvm-ffi dependency has been updated from version 0.1.0b15 to 0.1.0b19 across the project's build configurations and requirements.
  • TensorView API Modernization: The FlashInfer codebase has been updated to reflect changes in the tvm-ffi API, transitioning from direct member access like ->shape[idx] and ->data to method calls such as .size(idx) and .data_ptr() for TensorView objects. This change is pervasive across numerous CUDA C++ source files; a sketch of the pattern follows below.
  • Gemm Output Transposition Adjustment: The tgv_gemm function in flashinfer/gemm.py was modified to return the output tensor directly, removing an unnecessary transposition (.t()) operation.
Ignored Files
  • Ignored by pattern: .github/workflows/** (2)
    • .github/workflows/nightly-release.yml
    • .github/workflows/release.yml
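
The pattern described in the highlights above can be illustrated with a minimal sketch. Everything below is invented for illustration (function and variable names, the template parameter standing in for tvm-ffi's TensorView, and the device-struct field); only the accessor names (ndim(), size(i), data_ptr(), device()) come from this PR's summaries.

```cpp
// Minimal sketch of the access-pattern migration; not code from the diff.
// TensorT stands in for tvm-ffi's TensorView, assumed to expose
// ndim(), size(i), data_ptr(), and device() as described in this PR.
#include <cuda_runtime.h>
#include <cstdint>

template <typename TensorT>
void launch_example(TensorT q, TensorT o) {
  // Old style (removed by this PR): pointer-like field access.
  //   int64_t num_qo_heads = q->shape[1];
  //   float*  q_data       = static_cast<float*>(q->data);
  //   cudaSetDevice(q->device.device_id);
  // New style: TensorView passed by value and queried through public accessors.
  int64_t num_qo_heads = q.size(1);                  // was q->shape[1]
  int64_t head_dim     = q.size(q.ndim() - 1);       // was q->shape[q->ndim - 1]
  auto* q_data = static_cast<float*>(q.data_ptr());  // was q->data
  auto* o_data = static_cast<float*>(o.data_ptr());  // was o->data
  // Assumption: device() returns a DLDevice-like struct with a device_id field.
  cudaSetDevice(q.device().device_id);
  // ... configure grid/block sizes and launch the CUDA kernel on this device ...
  (void)num_qo_heads; (void)head_dim; (void)q_data; (void)o_data;
}
```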

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request primarily bumps the tvm-ffi version and updates the codebase to align with the new API. The changes are mostly mechanical, such as replacing -> with . for member access and updating method calls like shape to size and data to data_ptr(). The changes are largely correct and consistent. However, I've identified one critical issue in csrc/xqa/utils.cuh where a change was incorrectly applied to a custom struct, which will likely cause a compilation failure. Please see the specific comment for details.
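
To make the flagged failure mode concrete, here is a purely hypothetical sketch; the struct and function below are invented, and the actual code in csrc/xqa/utils.cuh is not reproduced here. The point is that a blanket shape-to-size() rewrite only works on types that actually provide the new accessors.

```cpp
// Hypothetical illustration of why a mechanical rewrite can break on custom types.
struct MyDims {        // a plain helper struct, not a tvm-ffi TensorView
  long shape[3];
  void* data;
};

long num_rows(const MyDims& d) {
  // A blanket "shape[i] -> size(i)" rewrite would turn this into `d.size(0)`,
  // which fails to compile because MyDims has no size() member.
  return d.shape[0];   // field access must stay as-is for custom structs
}
```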

@yzh119 (Collaborator) commented Oct 18, 2025

Let's defer it till the stable release of tvm-ffi

@yzh119 (Collaborator) commented Oct 21, 2025

@cyx-6 would you mind bumping to v0.1.0 stable release?

@coderabbitai bot (Contributor) commented Oct 21, 2025

Caution: Review failed. Failed to post review comments.

Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

A large-scale refactor updates tensor access patterns across CUDA and Python code from pointer-based field syntax (->data, ->shape) to accessor methods (data_ptr(), size()). Additionally, the apache-tvm-ffi dependency constraint is relaxed from an exact pre-release pin (0.1.0b15) to a flexible version range (>=0.1, <0.2).

Changes

  • Dependency Version Constraints (.github/workflows/nightly-release.yml, .github/workflows/release.yml, pyproject.toml, requirements.txt, flashinfer-cubin/pyproject.toml, flashinfer-jit-cache/pyproject.toml): Updated apache-tvm-ffi from the exact pre-release pin ==0.1.0b15 to the semantic range >=0.1,<0.2, relaxing version constraints.
  • CUDA Attention & Decode Kernels (csrc/batch_attention.cu, csrc/batch_decode.cu, csrc/batch_decode_mla_*.cu, csrc/single_decode.cu): Replaced pointer-based tensor access (->data, ->shape[i], ->device) with accessor methods (data_ptr(), size(i), device()). Updated stream retrieval and device setup.
  • CUDA Prefill & MLA Kernels (csrc/batch_prefill*.cu, csrc/batch_mla*.cu, csrc/single_prefill*.cu): Migrated tensor field access from pointer syntax to value-based accessors (size(), stride(), data_ptr()), updated device/stream handling, and adjusted parameter passing to kernel invocations.
  • CUDA GEMM & Matrix Operations (csrc/gemm_groupwise_sm*.cu, csrc/group_gemm*.cu, csrc/tgv_gemm.cu, csrc/bmm_fp8.cu, csrc/fp4_gemm_cutlass*.cu, csrc/fp8_gemm_cutlass.cu): Replaced shape indexing with size() calls, device field access with device() methods, and raw data pointers with data_ptr() throughout matrix operation kernels.
  • CUDA Utility Kernels (csrc/norm.cu, csrc/rope.cu, csrc/sampling.cu, csrc/renorm.cu, csrc/quantization.cu, csrc/cascade.cu, csrc/page.cu, csrc/pod.cu): Updated tensor accessors across normalization, rotary position embedding, sampling, and utility kernels to use size(), stride(), data_ptr(), and device() instead of raw pointer fields.
  • CUDA Kernel Launchers (FMHA/SDPA) (csrc/blackwell_fmha_plan.cu, csrc/cudnn_sdpa_kernel_launcher.cu, csrc/fmha_cutlass_sm100.cu): Migrated pointer-based tensor access to accessor methods for tensor metadata, device setup, and kernel parameter passing.
  • CUDA MoE & Fusion Operations (csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu, csrc/trtllm_fused_moe_kernel_launcher.cu): Updated MoE kernel launchers to use size(), dtype(), data_ptr(), and device() accessors for input validation and kernel invocation.
  • TensorRT-LLM CUDA Bindings (csrc/trtllm_*.cu, csrc/nv_internal/tensorrt_llm/thop/*.cpp): Replaced pointer-based tensor field access with value-based accessors across allreduce, FMHA, GEMM runners, and quantization operations for the TensorRT-LLM integration.
  • CUDA XQA Kernels (csrc/xqa/xqa_wrapper.cu, csrc/cutlass_mla.cu): Updated tensor data and device access to use data_ptr() and device() instead of pointer fields in XQA attention kernels.
  • TVM FFI Utilities & Headers (csrc/tvm_ffi_utils.h, csrc/batch_mla_config.jinja): Updated accessor macros and template code to use the public tensor API (ndim(), size(i), stride(), dtype(), data_ptr(), device()) instead of direct member access; a rough illustrative helper follows this list.
  • CUDA NVSHMEM & vLLM Bindings (csrc/nvshmem_binding.cu, csrc/vllm_custom_all_reduce.cu): Migrated tensor access in communication and allreduce kernels from pointer members to accessor methods.
  • Python Tensor Access (flashinfer/jit/attention/utils.py, flashinfer/jit/activation.py): Updated generated code templates and Python wrapper code to use data_ptr() and size() accessors for tensor data retrieval.
  • Functional Change (flashinfer/gemm.py): Modified TGVGemmRunner.forward() to return the output tensor c directly instead of c.t(), changing the output orientation from transposed to native.
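
As a rough illustration of what the TVM FFI Utilities cohort implies, a validation helper written against the public accessors might look like the following. This is not the actual macro or helper in csrc/tvm_ffi_utils.h; the name, signature, and error format are invented, and only the accessor names come from the summary above.

```cpp
// Hypothetical shape-check helper using only the public accessors listed above.
#include <cstdint>
#include <sstream>
#include <stdexcept>

template <typename TensorLike>
void check_2d_shape(const TensorLike& t, int64_t rows, int64_t cols, const char* name) {
  // ndim()/size(i) replace the old direct reads of ->ndim / ->shape[i].
  if (t.ndim() != 2 || t.size(0) != rows || t.size(1) != cols) {
    std::ostringstream oss;
    oss << name << ": expected a (" << rows << ", " << cols << ") tensor, got ndim = "
        << t.ndim();
    throw std::runtime_error(oss.str());
  }
}
```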

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Rationale: While the diff touches 70+ files, the changes are highly repetitive and follow consistent patterns: systematic replacement of pointer-based tensor access (->shape[i], ->data, ->device) with accessor methods (size(i), data_ptr(), device()). The uniformity significantly reduces cognitive load: a reviewer can validate the pattern once and apply it across files. However, the CUDA-heavy nature demands careful attention to pointer semantics, correct accessor usage, and proper device/stream management. The presence of optional value handling (value().data_ptr()) and occasional stride calculations (stride(n)) introduces minor complexity. The single functional change in gemm.py and the dependency updates are straightforward.
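
For the optional-value handling mentioned in the rationale, a minimal hedged sketch follows. The wrapper type and function are invented; the only assumption taken from this walkthrough is that optional tensor arguments are unwrapped with value() before calling data_ptr().

```cpp
// Hypothetical handling of an optional tensor argument; not copied from the diff.
// OptTensor stands in for an Optional<TensorView>-like wrapper assumed to expose
// has_value() and value(), with value() returning something that has data_ptr().
template <typename OptTensor>
float* maybe_bias_ptr(const OptTensor& maybe_bias) {
  if (!maybe_bias.has_value()) return nullptr;  // no bias supplied
  return static_cast<float*>(maybe_bias.value().data_ptr());
}
```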

Poem

🐰 Hopping through pointers with glee,
We've migrated to .data_ptr()!
No more ->shape[0], just size(0) please,
These accessors make refactors a breeze.
From 0.1.0b15 to ranges so wide,
Our tensor APIs now have modern pride. 🚀

Pre-merge checks and finishing touches

❌ Failed checks (3 warnings)
  • Title Check (⚠️ Warning): The title "Bump tvm ffi version" is specific and directly references a core aspect of the changeset: the apache-tvm-ffi dependency update from ==0.1.0b15 to >=0.1,<0.2. However, the changeset encompasses substantially more than version bumping. The overwhelming majority of changes involve systematic refactoring of CUDA and Python code to migrate from pointer-based tensor field access (->data, ->shape, ->device) to modern accessor APIs (data_ptr(), size(), device(), stride(), dtype()). The title captures the dependency change but obscures the primary implementation work required for API compatibility.
  • Description Check (⚠️ Warning): The PR description is largely incomplete against the provided template. While the 📌 Description section is present, it contains only a single minimal sentence ("This PR bumps the tvm-ffi version and update the flashinfer code base.") without explaining what specific changes were made, why they are necessary, or their impact. The 🔍 Related Issues section is entirely empty, despite PR comments from yzh119 requesting a bump to the v0.1.0 stable release, which appears to be relevant context. All items in the 🚀 Pull Request Checklist are unchecked, indicating pre-commit hooks may not have been run and tests may not have been verified. Resolution: Expand the Description section to detail the scope of changes (e.g., mention the widespread tensor accessor API migration), link any related issues from the PR discussion (particularly the stable-release request), and verify the pre-commit and test checklist items before merging. A more complete description would help reviewers quickly understand both the dependency update and the extensive codebase refactoring required.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 17.04%, which is insufficient; the required threshold is 80.00%. Resolution: Run @coderabbitai generate docstrings to improve docstring coverage.

@cyx-6 (Collaborator, Author) commented Oct 21, 2025

Closed in favor of #1960.

@cyx-6 closed this Oct 21, 2025
yzh119 added a commit that referenced this pull request Oct 24, 2025

## 📌 Description

This PR bumps tvm-ffi to the stable version 0.1.0 and updates the FlashInfer codebase accordingly.

## 🔍 Related Issues

#1939

## Summary by CodeRabbit

* **Chores**
  * Relaxed build dependency pins for apache-tvm-ffi and setuptools across project configs; removed installation of multiple build packages from the nightly CI step.
* **Refactor**
  * Modernized internal CUDA/tensor access patterns to a consistent accessor API across many modules.
* **Bug Fixes**
  * GEMM runner now returns the output tensor in the correct (non-transposed) orientation.

---------

Co-authored-by: Zihao Ye <[email protected]>
Co-authored-by: yzh119 <[email protected]>