Conversation

@bkryu (Collaborator) commented Nov 18, 2025

📌 Description

WIP Do not merge

tl;dr: This PR adds a logging system for input/output tracking to aid in debugging FlashInfer APIs.

This PR introduces a production-ready API logging infrastructure that tracks function calls, arguments, and return values via a simple one-line decorator. Any function can be decorated to have its inputs and outputs recorded by the API logger.
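
For illustration, a minimal usage sketch (the decorated function here is hypothetical; the decorator is exposed as flashinfer_log in the latest revision reviewed below, and as flashinfer_api_log in earlier ones):

import torch
from flashinfer.api_logging import flashinfer_log

@flashinfer_log
def scaled_add(a: torch.Tensor, b: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    # Inputs and the return value are recorded by the API logger
    # when FLASHINFER_LOGLEVEL_DBG is set to a nonzero level.
    return a + alpha * b

scaled_add(torch.ones(4), torch.ones(4), alpha=2.0)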

Key Features:

  • Logging level controlled by FLASHINFER_LOGLEVEL_DBG
  • Log destination set by FLASHINFER_LOGDEST_DBG; defaults to stdout
  • Zero overhead when disabled (level 0 returns the original function), as demonstrated by benchmarks/bench_logging_overhead.py

Example usage

export FLASHINFER_LOGLEVEL_DBG=1
export FLASHINFER_LOGDEST_DBG="./flashinfer_api.log"

python3 benchmarks/flashinfer_benchmark.py --routine BatchDecodeWithPagedKVCacheWrapper --backends fa2 fa2_tc cudnn trtllm-gen trtllm-gen-native --page_size 16 --batch_size 1 --s_qo 1 --s_kv 1024 --num_qo_heads 64 --num_kv_heads 8 --head_dim_qk 128 --head_dim_vo 128 --random_actual_seq_len -vv --refcheck --q_dtype bfloat16 --kv_dtype bfloat16

produces the following log:

================================================================================
[2025-11-20 17:51:18] FlashInfer API Logging - System Information
================================================================================
FlashInfer version: 0.5.2
CUDA toolkit version: 13.0
cuDNN version: 91600
Number of GPUs: 1
  GPU 0: NVIDIA B200
    Compute capability: 10.0 (SM100)
PyTorch version: 2.9.0+cu130
================================================================================

[2025-11-20 17:51:19] FlashInfer API Call: BatchDecodeWithPagedKVCacheWrapper.__init__
[2025-11-20 17:51:19] FlashInfer API Call: BatchDecodeWithPagedKVCacheWrapper.plan
[2025-11-20 17:51:19] FlashInfer API Call: BatchDecodeWithPagedKVCacheWrapper.__init__
[2025-11-20 17:51:19] FlashInfer API Call: BatchDecodeWithPagedKVCacheWrapper.plan
[2025-11-20 17:51:19] FlashInfer API Call: BatchDecodeWithPagedKVCacheWrapper.__init__
[2025-11-20 17:51:19] FlashInfer API Call: BatchDecodeWithPagedKVCacheWrapper.plan
[2025-11-20 17:51:19] FlashInfer API Call: BatchDecodeWithPagedKVCacheWrapper.run
[2025-11-20 17:51:19] FlashInfer API Call: BatchDecodeWithPagedKVCacheWrapper.run
...

export FLASHINFER_LOGLEVEL_DBG=3 produces:

(System Info same as above)
================================================================================
[2025-11-20 17:51:58] FlashInfer API Call: BatchDecodeWithPagedKVCacheWrapper.__init__
--------------------------------------------------------------------------------
Positional input arguments:
  arg[0]:
    <flashinfer.decode.BatchDecodeWithPagedKVCacheWrapper object at 0x1234399e3410>
  arg[1]:
    Tensor(
      shape=(134217728,)
      stride=(1,)
      dtype=torch.int8
      device=cuda:0
      requires_grad=False
      is_contiguous=True
    )
  arg[2]:
    'HND'
Keyword input arguments:
  use_cuda_graph=
    True
  use_tensor_cores=
    False
  paged_kv_indptr_buffer=
    Tensor(
      shape=(2,)
      stride=(1,)
      dtype=torch.int32
      device=cuda:0
      requires_grad=False
      is_contiguous=True
    )
  paged_kv_indices_buffer=
    Tensor(
      shape=(6,)
      stride=(1,)
      dtype=torch.int32
      device=cuda:0
      requires_grad=False
      is_contiguous=True
    )
  paged_kv_last_page_len_buffer=
    Tensor(
      shape=(1,)
      stride=(1,)
      dtype=torch.int32
      device=cuda:0
      requires_grad=False
      is_contiguous=True
    )
  backend=
    'fa2'
Default parameters (not explicitly provided):
  jit_args= [DEFAULT]
    None
Output value:
  None
================================================================================
...

export FLASHINFER_LOGLEVEL_DBG=5 produces:

(System Info same as above)
================================================================================
[2025-11-20 17:52:23] FlashInfer API Call: BatchDecodeWithPagedKVCacheWrapper.__init__
--------------------------------------------------------------------------------
Positional input arguments:
  arg[0]:
    <flashinfer.decode.BatchDecodeWithPagedKVCacheWrapper object at 0x7a9fd9a88c0>
  arg[1]:
    Tensor(
      shape=(134217728,)
      stride=(1,)
      dtype=torch.int8
      device=cuda:0
      requires_grad=False
      is_contiguous=True
      min=0
      max=0
      mean=0.000000
    )
  arg[2]:
    'HND'
Keyword input arguments:
  use_cuda_graph=
    True
  use_tensor_cores=
    False
  paged_kv_indptr_buffer=
    Tensor(
      shape=(2,)
      stride=(1,)
      dtype=torch.int32
      device=cuda:0
      requires_grad=False
      is_contiguous=True
      min=0
      max=6
      mean=3.000000
    )
  paged_kv_indices_buffer=
    Tensor(
      shape=(6,)
      stride=(1,)
      dtype=torch.int32
      device=cuda:0
      requires_grad=False
      is_contiguous=True
      min=0
      max=5
      mean=2.500000
    )
  paged_kv_last_page_len_buffer=
    Tensor(
      shape=(1,)
      stride=(1,)
      dtype=torch.int32
      device=cuda:0
      requires_grad=False
      is_contiguous=True
      min=4
      max=4
      mean=4.000000
    )
  backend=
    'fa2'
Default parameters (not explicitly provided):
  jit_args= [DEFAULT]
    None
Output value:
  None
================================================================================
...
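
For reference, a standalone sketch of how the Tensor(...) summaries above could be produced (toy code with a hypothetical helper name; the PR's actual formatter lives in flashinfer/api_logging.py and may differ):

import torch

def summarize_tensor(t: torch.Tensor, with_stats: bool = False) -> str:
    # Metadata fields mirror the log output shown above.
    lines = [
        "Tensor(",
        f"  shape={tuple(t.shape)}",
        f"  stride={t.stride()}",
        f"  dtype={t.dtype}",
        f"  device={t.device}",
        f"  requires_grad={t.requires_grad}",
        f"  is_contiguous={t.is_contiguous()}",
    ]
    if with_stats:  # corresponds to FLASHINFER_LOGLEVEL_DBG=5 in the examples above
        f32 = t.float()
        lines += [
            f"  min={f32.min().item():g}",
            f"  max={f32.max().item():g}",
            f"  mean={f32.mean().item():.6f}",
        ]
    lines.append(")")
    return "\n".join(lines)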

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • New Features

    • Adds a configurable, crash-safe API logging facility with level-based verbosity (including optional tensor statistics) and selectable destinations; zero overhead when disabled.
    • Instrumentation added across many public API entry points to enable call tracing.
  • Tools

    • New CUDA-aware benchmark to measure logging overhead, report per-iteration statistics, and offer recommendations.
  • Tests

    • Expanded test suite covering logging levels, data types, crash-safety, CUDA-graph behavior, and file-output handling.
  • Documentation

    • New detailed logging guide and README additions explaining configuration and usage.


@gemini-code-assist (Contributor)

Summary of Changes

Hello @bkryu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust and flexible API logging system designed to significantly enhance debugging capabilities within FlashInfer. The system allows developers to easily instrument functions with a decorator to log calls, arguments, and return values at various levels of detail, from simple function names to comprehensive tensor metadata and statistics. A key design principle is crash-safety, ensuring that input arguments are logged even if a function terminates unexpectedly, which is crucial for diagnosing complex issues like CUDA kernel crashes. Furthermore, the logging system is optimized for performance, offering zero overhead when disabled, and intelligently handles CUDA graph capture to avoid performance bottlenecks. This infrastructure will be invaluable for understanding MoE workloads and other performance-critical operations.

Highlights

  • Introduced API Logging System: A new, production-ready API logging infrastructure has been added to FlashInfer to track function calls, arguments, and return values.
  • Configurable Logging Levels: The system supports multiple logging levels (0-3) controlled by the environment variable FLASHINFER_APILOG_LEVEL, allowing for flexible verbosity from function names only to detailed tensor statistics.
  • Zero Overhead When Disabled: When FLASHINFER_APILOG_LEVEL is set to 0, the decorator returns the original function, ensuring no performance impact (see the sketch after this list).
  • Crash-Safe Input Logging: Function inputs are logged before execution, preserving critical debugging information even if the function crashes (e.g., due to CUDA errors).
  • CUDA Graph Compatibility: Tensor statistics logging (level 3) intelligently skips during CUDA graph capture to prevent synchronization issues, logging a message instead.
  • Extensive Integration: The logging decorator has been applied to numerous functions and methods across cudnn, decode, fused_moe, gemm, mla, and prefill modules.
  • New Benchmarking Tool: A bench_logging_overhead.py script was added to measure and analyze the performance impact of different logging levels.
  • Comprehensive Test Coverage: A new test file test_logging.py ensures the robustness and correctness of the API logging system, covering various data types, default parameters, and crash scenarios.
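
For orientation, a self-contained toy sketch of the zero-overhead pattern described in the highlights (a standalone illustration under assumed names, not the PR's actual implementation):

import functools
import logging
import os

_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
_logger = logging.getLogger("flashinfer.api")

def api_log(func):
    # Level 0: return the original function unchanged, so disabled
    # logging adds no wrapper frame and no per-call overhead.
    if _LEVEL == 0:
        return func

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Inputs are logged before execution so they survive a crash.
        _logger.debug("FlashInfer API Call: %s", func.__qualname__)
        result = func(*args, **kwargs)
        if _LEVEL >= 2:
            _logger.debug("Output value: %r", result)
        return result

    return wrapper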

coderabbitai bot commented Nov 18, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a crash-safe, environment-configurable API logging decorator (flashinfer_log), instruments many public FlashInfer APIs with it, introduces a CUDA matmul benchmark measuring logging overhead, adds tests validating logging behavior and levels, and documents the feature.

Changes

Cohort / File(s) Summary
API logging module
flashinfer/api_logging.py
New module providing flashinfer_log decorator, env-driven log level/destination, crash-safe pre/post-call logging, value formatting, system-info header, CUDA-graph-aware behavior, and logger initialization.
Benchmark script
benchmarks/bench_logging_overhead.py
New CUDA matmul benchmark that compares undecorated vs @flashinfer_log-decorated runs, with warmup, timed iterations, aggregated stats, relative overhead reporting, and optional logfile handling.
Decode & cuDNN instrumentation
flashinfer/decode.py, flashinfer/cudnn/decode.py
Imported flashinfer_log and applied @flashinfer_log to many public decode entry points and wrapper methods (single/batch, TRT/XQA, cuDNN) without changing signatures.
Prefill & cuDNN prefill instrumentation
flashinfer/prefill.py, flashinfer/cudnn/prefill.py
Imported flashinfer_log and applied @flashinfer_log to top-level and wrapper prefill constructors/plan/run functions; one prefill signature extended with is_cuda_graph_compatible and backend.
MOE instrumentation
flashinfer/fused_moe/core.py
Imported flashinfer_log and applied @flashinfer_log to public MOE entry points (cutlass_fused_moe and various TRT/FP8/FP4 MoE variants).
GEMM instrumentation
flashinfer/gemm/gemm_base.py
Imported flashinfer_log, applied @flashinfer_log across many GEMM entry points (mm_fp8, mm_fp4, bmm_fp8, groupwise/NT variants), added tgv_gemm_sm100 wrapper and decorated SegmentGEMMWrapper.run.
MLA instrumentation
flashinfer/mla.py
Imported flashinfer_log and applied @flashinfer_log to BatchMLAPagedAttentionWrapper methods (__init__, plan, run).
Tests
tests/utils/test_logging.py
New comprehensive test suite covering log levels (0,1,3,5), enums/tensors/defaults/class methods/CUDA graph compatibility, crash-safety, and file-based assertions.
Docs / README
LOGGING.md, README.md
New LOGGING.md describing env vars, levels, PID substitution and CUDA-graph notes; README updated to reference API logging and LOGGING.md.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Caller as User code
    participant Decorator as `@flashinfer_log`
    participant Logger as flashinfer.api logger
    participant Impl as API implementation

    Note over Decorator,Logger: Level & destination read from env at import
    alt Level = 0 (disabled)
        Caller->>Decorator: call(args, kwargs)
        Decorator->>Impl: direct call (no logging)
        Impl-->>Decorator: result
        Decorator-->>Caller: result
    else Level >= 1 (enabled)
        Caller->>Decorator: call(args, kwargs)
        Decorator->>Logger: _log_function_inputs (name, args, defaults)
        Logger-->>Logger: format per level (1..5)
        Decorator->>Impl: execute API (protected from crashes/CUDA graph issues)
        Impl-->>Decorator: result
        Decorator->>Logger: _log_function_outputs (result)
        Logger-->>Decorator: persist (file / stdout / stderr)
        Decorator-->>Caller: result
    end
    Note over Logger: Level 5 includes tensor stats (skipped during CUDA-graph capture)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

  • Attention points:
    • flashinfer/api_logging.py: crash-safety, CUDA-graph detection, env var parsing, formatting edge cases.
    • Widespread decorator application: verify no signature/behavior regressions and correct handling of bound methods.
    • Prefill signature change: is_cuda_graph_compatible and backend additions need compatibility checks.
    • Tests/benchmark: ensure CUDA sync/warmup correctness and reliable teardown of temporary log files.

Possibly related PRs

Suggested reviewers

  • aleozlx
  • djmmoss
  • cyx-6
  • wenscarl
  • nvmbreughe
  • yzh119

Poem

🐇 I hopped through patches, ears alert and bright,

I stitched a quiet logger into each API's light.
Benchmarks counted hops and tests kept score,
I logged the calls, then bounded off once more.
A tiny rabbit witness in the dev-night.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 65.75%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title clearly summarizes the main change: adding API logging infrastructure as a POC for debugging purposes, which is reflected throughout the changeset.
  • Description check ✅ Passed: The PR description is comprehensive and covers the logging system's purpose, key features, configuration methods, logging levels, usage examples, and expected outputs. All required sections are present and well-documented.


@gemini-code-assist bot left a comment

Code Review

This PR introduces a comprehensive and well-designed API logging system for FlashInfer. The use of a decorator (@flashinfer_api_log) makes it easy to apply, and the control via environment variables (FLASHINFER_APILOG_LEVEL, FLASHINFER_APILOG_DEST) is flexible. The zero-overhead design when disabled is a key feature and is implemented correctly. The different logging levels provide a good range of verbosity for debugging, and the crash-safe logging of inputs is a great feature for diagnosing kernel crashes. The inclusion of a benchmark script and a thorough test suite is excellent.

My review focuses on improving the robustness and maintainability of the new logging module. I've identified a few areas for improvement, such as making exception handling more specific and simplifying some conditional logic.

Overall, this is a great addition to the library that will significantly improve debuggability.



# Read environment variables once at module load time
_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))

medium

The int() conversion for FLASHINFER_APILOG_LEVEL is not robust against invalid user input. If a user sets this environment variable to a non-integer string (e.g., "foo"), it will raise a ValueError and prevent the module from being imported. It would be more robust to handle this case gracefully, for example by defaulting to level 0 and perhaps logging a warning.

Suggested change
_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
try:
    _API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
except (ValueError, TypeError):
    _API_LOG_LEVEL = 0

Comment on lines +285 to +384
except Exception:
# If we can't inspect the signature, return empty dict
return {}

medium

The except Exception: block is too broad. It will catch any exception, including system-exiting exceptions like KeyboardInterrupt, which can hide bugs and make debugging difficult. The inspect.signature() function is documented to raise ValueError or TypeError on failure. It's better to catch only these specific exceptions.

Suggested change
    except Exception:
        # If we can't inspect the signature, return empty dict
        return {}
    except (ValueError, TypeError):
        # If we can't inspect the signature, return empty dict
        return {}

Comment on lines +426 to +529
if "Wrapper" in class_name or class_name in [
"BatchMLAPagedAttentionWrapper"
]:

medium

The condition class_name in ["BatchMLAPagedAttentionWrapper"] is redundant because the class name "BatchMLAPagedAttentionWrapper" already contains the substring "Wrapper", which is checked in the first part of the or condition. This can be simplified for better readability and maintainability.

                    if "Wrapper" in class_name:

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (8)
tests/utils/test_logging.py (1)

227-247: Consider quieting Ruff’s unused‑argument warnings in tests

Several test helpers intentionally don’t use all parameters (e.g., mode in test_default_parameters, arguments to crashing_function, and the many typed parameters in test_different_data_types). If Ruff is run on tests, you may want to silence these via _-prefixed names or # noqa: ARG001 on those definitions to keep the suite warning‑free; behavior is otherwise fine.

Also applies to: 322-344, 355-383

flashinfer/fused_moe/core.py (1)

23-23: MoE public API logging is wired correctly

Wrapping the high‑level MoE entrypoints (cutlass_fused_moe, trtllm_bf16_moe, all FP8/FP4 variants) with flashinfer_api_log gives good coverage of the performance‑critical surface without touching the custom‑op registration or autotuning internals. Signatures and control flow are preserved. Just be aware that at logging level 3, tensor statistics over large MoE activations/weights will be non‑trivial in cost, so that mode should remain a targeted debugging tool rather than a default.

Also applies to: 688-906, 1862-1941, 1943-2015, 2017-2093, 2095-2223, 2225-2354

flashinfer/api_logging.py (2)

28-68: Env‑driven logger setup matches goals; consider a tiny robustness tweak

Reading FLASHINFER_APILOG_LEVEL / FLASHINFER_APILOG_DEST once at import and configuring a dedicated flashinfer.api logger (with a NullHandler at level 0) cleanly achieves “zero‑overhead when disabled” and isolates logs from the root logger. One minor robustness improvement would be to guard the int() conversion for _API_LOG_LEVEL so a malformed env var falls back to level 0 (or a safe default) instead of raising at import.


345-367: Use the func_name argument in _log_function_outputs for clearer logs

_log_function_outputs accepts func_name but currently ignores it, which also triggers a linter warning. You could make the logs more self‑describing and fix the unused parameter by adding a small header including the function name. For example:

 def _log_function_outputs(func_name: str, result: Any, level: int) -> None:
@@
-    lines = []
-    # Log outputs
-    lines.append("Output value:")
+    lines = []
+    lines.append(f"FlashInfer API Return: {func_name}")
+    lines.append("Output value:")
     lines.append(_format_value(result, level, indent=1))
@@
-    _logger.debug("\n".join(lines))
+    _logger.debug("\n".join(lines))

This preserves existing tests (which only assert on "Output value:") while improving traceability.

benchmarks/bench_logging_overhead.py (1)

35-38: Align benchmark log destination with api_logging’s default to avoid confusion

The script defaults LOG_DEST to /tmp/flashinfer_benchmark_log.txt, but flashinfer.api_logging defaults FLASHINFER_APILOG_DEST to ./flashinfer_log.txt when the env var is unset. If users only set FLASHINFER_APILOG_LEVEL (as shown in the usage block), the benchmark will print and clean up /tmp/... while the actual logs go to ./flashinfer_log.txt, so log‑size reporting and cleanup can silently miss the real file.

To make behavior predictable, consider either:

  • Matching the default with the library:
-LOG_DEST = os.environ.get("FLASHINFER_APILOG_DEST", "/tmp/flashinfer_benchmark_log.txt")
+LOG_DEST = os.environ.get("FLASHINFER_APILOG_DEST", "./flashinfer_log.txt")

and/or

  • Explicitly propagating LOG_DEST into the environment before importing flashinfer.api_logging when the env var is not already set.

Either option will keep the benchmark’s “LOG FILE INFO” and cleanup in sync with where the decorator actually writes.

Also applies to: 265-327

flashinfer/gemm/gemm_base.py (3)

1842-2002: Decorator stacking on mm_fp4 means logging happens after backend/shape checks

mm_fp4 is now decorated as:

@backend_requirement(...)
@flashinfer_api_log
def mm_fp4(...):
    ...

This order implies that backend and problem-size checks (and any heuristic backend selection) run first, and only if they pass does the call enter the logging wrapper and function body. That’s a reasonable choice and doesn’t affect correctness, but it does mean invalid/problematic calls rejected by backend_requirement won’t appear in API logs.

If you’d prefer to log all attempted API calls—including ones failing backend requirements—you might want to reverse the order:

@flashinfer_api_log
@backend_requirement(...)
def mm_fp4(...):
    ...

Please double‑check the implementation of backend_requirement to ensure it doesn’t rely on attributes on the original function that could be hidden by another wrapper (though functools.wraps in the logger should preserve most metadata).


2100-2189: bmm_fp8 logging is consistent; same decorator-order caveat as mm_fp4

bmm_fp8 is also defined with:

@backend_requirement(...)
@flashinfer_api_log
def bmm_fp8(...):
    ...

So backends/requirements are checked before logging, and only successful calls will be logged. The attributes added by backend_requirement (e.g., suitable_auto_backends) still attach to the outer wrapper, so the internal use of bmm_fp8.suitable_auto_backends remains valid.

If you want logs for calls that are rejected due to unsupported compute capability or shapes, consider swapping decorator order as suggested for mm_fp4, and verify that backend_requirement remains compatible with being under the logging wrapper.


2501-2527: Potential double-logging when calling gemm_fp8_nt_blockscaled

gemm_fp8_nt_blockscaled is a thin wrapper around gemm_fp8_nt_groupwise, and both are decorated with @flashinfer_api_log. A call to gemm_fp8_nt_blockscaled will therefore emit two log entries: one for gemm_fp8_nt_blockscaled and one for the inner gemm_fp8_nt_groupwise call.

This might be desirable (showing both the high-level alias and the underlying primitive), but could also add noise to logs.

If you want only a single log entry per API call here, you could either:

  • Drop the decorator from gemm_fp8_nt_blockscaled, relying on the inner gemm_fp8_nt_groupwise logging, or
  • Keep only the alias decorated and un-decorate gemm_fp8_nt_groupwise if it’s considered an internal helper.

Please choose based on how you expect users to call these APIs directly versus via wrappers.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c4b522 and 92c15f7.

📒 Files selected for processing (10)
  • benchmarks/bench_logging_overhead.py (1 hunks)
  • flashinfer/api_logging.py (1 hunks)
  • flashinfer/cudnn/decode.py (2 hunks)
  • flashinfer/cudnn/prefill.py (2 hunks)
  • flashinfer/decode.py (10 hunks)
  • flashinfer/fused_moe/core.py (7 hunks)
  • flashinfer/gemm/gemm_base.py (12 hunks)
  • flashinfer/mla.py (4 hunks)
  • flashinfer/prefill.py (11 hunks)
  • tests/utils/test_logging.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
flashinfer/mla.py (1)
flashinfer/api_logging.py (1)
  • flashinfer_api_log (369-462)
tests/utils/test_logging.py (1)
flashinfer/api_logging.py (3)
  • flashinfer_api_log (369-462)
  • decorator (418-457)
  • wrapper (420-455)
benchmarks/bench_logging_overhead.py (1)
flashinfer/api_logging.py (1)
  • flashinfer_api_log (369-462)
flashinfer/fused_moe/core.py (1)
flashinfer/api_logging.py (1)
  • flashinfer_api_log (369-462)
flashinfer/decode.py (1)
flashinfer/api_logging.py (1)
  • flashinfer_api_log (369-462)
flashinfer/prefill.py (1)
flashinfer/api_logging.py (1)
  • flashinfer_api_log (369-462)
flashinfer/cudnn/decode.py (1)
flashinfer/api_logging.py (1)
  • flashinfer_api_log (369-462)
flashinfer/cudnn/prefill.py (1)
flashinfer/api_logging.py (1)
  • flashinfer_api_log (369-462)
flashinfer/gemm/gemm_base.py (1)
flashinfer/api_logging.py (1)
  • flashinfer_api_log (369-462)
🪛 Ruff (0.14.5)
tests/utils/test_logging.py

227-227: Unused function argument: mode

(ARG001)


322-322: Unused function argument: x

(ARG001)


322-322: Unused function argument: y

(ARG001)


323-323: Avoid specifying long messages outside the exception class

(TRY003)


356-356: Unused function argument: int_val

(ARG001)


357-357: Unused function argument: float_val

(ARG001)


358-358: Unused function argument: bool_val

(ARG001)


359-359: Unused function argument: str_val

(ARG001)


360-360: Unused function argument: list_val

(ARG001)


361-361: Unused function argument: tuple_val

(ARG001)


362-362: Unused function argument: dict_val

(ARG001)


363-363: Unused function argument: none_val

(ARG001)

benchmarks/bench_logging_overhead.py

1-1: Shebang is present but file is not executable

(EXE001)


37-37: Probable insecure usage of temporary file or directory: "/tmp/flashinfer_benchmark_log.txt"

(S108)


346-346: Do not catch blind exception: Exception

(BLE001)

flashinfer/api_logging.py

161-161: Do not catch blind exception: Exception

(BLE001)


225-225: Use explicit conversion flag

Replace with conversion flag

(RUF010)


236-236: Use explicit conversion flag

Replace with conversion flag

(RUF010)


240-240: Use explicit conversion flag

Replace with conversion flag

(RUF010)


241-241: Do not catch blind exception: Exception

(BLE001)


284-284: Consider moving this statement to an else block

(TRY300)


285-285: Do not catch blind exception: Exception

(BLE001)


345-345: Unused function argument: func_name

(ARG001)


369-369: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


430-431: try-except-pass detected, consider logging the exception

(S110)


430-430: Do not catch blind exception: Exception

(BLE001)


441-441: Do not catch blind exception: Exception

(BLE001)


442-442: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


452-452: Do not catch blind exception: Exception

(BLE001)


453-453: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs
🔇 Additional comments (36)
flashinfer/cudnn/prefill.py (1)

6-6: Decorator wiring for cuDNN prefill entrypoint looks correct

Importing flashinfer_api_log and wrapping cudnn_batch_prefill_with_kv_cache is consistent with the rest of the API surface; it doesn’t touch the cudnn-graph/jit internals and preserves function signature and control flow. Logging remains opt‑in via env level.

Also applies to: 387-543

flashinfer/mla.py (1)

22-22: MLA wrapper instrumentation is aligned with the logging design

Decorating BatchMLAPagedAttentionWrapper.__init__, plan, and run with flashinfer_api_log cleanly instruments the main public lifecycle without altering behavior. The class name pattern ensures logs are emitted as BatchMLAPagedAttentionWrapper.<method>, which is useful for traceability, and zero‑overhead semantics hold when logging is disabled at import time.

Also applies to: 133-203, 204-305, 337-451

flashinfer/cudnn/decode.py (1)

6-6: cuDNN decode entrypoint is correctly wrapped for logging

The flashinfer_api_log import and decorator on cudnn_batch_decode_with_kv_cache match the prefill pattern, leave the cuDNN graph construction untouched, and preserve the API contract. This is a safe, minimal instrumentation point.

Also applies to: 256-350

tests/utils/test_logging.py (1)

43-82: Comprehensive logging decorator tests look solid

The autouse fixture plus setup_logging pattern gives each test a clean environment and forces flashinfer.api_logging to re‑read env vars, which matches how the decorator is intended to be configured. The suite covers level semantics, enums, defaults vs explicit kwargs, crash‑safety, CUDA tensors (incl. graph capture), class methods, and multiple invocation patterns; all expectations are consistent with the current implementation in flashinfer/api_logging.py.

Also applies to: 83-585

flashinfer/api_logging.py (2)

70-163: Value formatting and CUDA‑graph‑aware tensor stats look well thought out

_format_value’s handling of enums, tensors (with shape/stride/device metadata), FP4Tensor, and nested containers is consistent and log‑friendly. The level‑3 path correctly avoids statistics during CUDA graph capture via torch.cuda.is_current_stream_capturing and gracefully degrades to a “[statistics error: …]” line on failures. This matches the intended “rich when needed, safe when it fails” design.
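
For illustration, a toy sketch of the capture-aware statistics guard described above (standalone code; the PR's actual formatter in flashinfer/api_logging.py may differ):

import torch

def tensor_stats(t: torch.Tensor) -> str:
    # min/max/mean launch reduction kernels and synchronize via .item(),
    # which is not allowed while a CUDA graph is being captured.
    if t.is_cuda and torch.cuda.is_current_stream_capturing():
        return "[statistics skipped: CUDA graph capture in progress]"
    try:
        f32 = t.float()
        return f"min={f32.min().item():g} max={f32.max().item():g} mean={f32.mean().item():.6f}"
    except Exception as e:  # degrade gracefully, as the reviewed code does
        return f"[statistics error: {e}]"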


369-462: Decorator implementation aligns with zero‑overhead and crash‑safety requirements

flashinfer_api_log correctly returns the original function when _API_LOG_LEVEL == 0 at import/decorator time, avoiding any wrapper call‑overhead in the disabled case. For enabled levels, the wrapper’s pre‑call logging (with class‑qualified names for *Wrapper classes) and post‑call logging, both protected by broad try/except blocks, achieve the “crash‑safe, best‑effort logging” goal without risking user code execution. The decorator also supports both @flashinfer_api_log and @flashinfer_api_log() usage, which matches the tests.
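
The dual invocation styles mentioned above (@flashinfer_api_log and @flashinfer_api_log()) follow a common Python pattern; a minimal standalone sketch, not the PR's code:

import functools

def api_log(func=None):
    # Support both @api_log and @api_log() invocation styles.
    def decorate(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            return f(*args, **kwargs)  # logging elided in this sketch
        return wrapper

    if callable(func):
        return decorate(func)  # used as @api_log
    return decorate            # used as @api_log()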

flashinfer/prefill.py (11)

25-25: API logger import is correctly scoped and consistent with other modules

The relative import of flashinfer_api_log mirrors how other utilities are imported and avoids circular dependencies; no issues here.


877-910: Decorating single_prefill_with_kv_cache_with_jit_module is safe and aligns with the logging design

Applying @flashinfer_api_log here cleanly instruments this top-level helper: it only wraps the JIT module’s run call and does not interfere with caching or custom-op registration elsewhere.

If you expect this function to be called inside torch.compile or CUDA graphs, please sanity‑check that logging levels ≥2 are disabled in those contexts to avoid unnecessary extra kernels for statistics.


912-985: Logging single_prefill_with_kv_cache covers the primary single-request prefill API (including the alias)

Instrumenting single_prefill_with_kv_cache is appropriate: it is the main single-prefill entry point, and the alias single_prefill_with_kv_cache_return_lse (via functools.partial) will also be logged because it refers to the decorated function.

Please confirm that your log level defaults (e.g., env FLASHINFER_APILOG_LEVEL) are set so that heavy workloads don’t accidentally incur level‑3 tensor statistics on this hot path in production.


1331-1498: Wrapper __init__ logging for BatchPrefillWithPagedKVCacheWrapper looks reasonable

Adding @flashinfer_api_log on the constructor is useful for debugging misconfigurations (workspace size, CUDA‑graph buffers, backend selection) and doesn’t affect runtime behavior, since allocations and checks were already here.

It may be worth running a small CUDA‑graph flow that constructs this wrapper in graph-enabled mode to ensure that level‑2/3 logging remains acceptable during initialization (even if execution itself happens inside graphs).


1527-1925: Logging BatchPrefillWithPagedKVCacheWrapper.plan is helpful; watch for overhead at high log levels

Decorating plan aligns with the goal of capturing problem setup (indptrs, masks, backends) and should not affect correctness, since the body is unchanged and the decorator is pure Python around it.

Because plan copies indptrs to CPU and computes derived arrays, logging at level‑3 (with statistics) on very large batches can add noticeable overhead. Ensure that in tight autotuning or repeated planning loops you keep FLASHINFER_APILOG_LEVEL at 0–1 unless you explicitly need detailed introspection.


1984-2220: Instrumenting BatchPrefillWithPagedKVCacheWrapper.run is consistent and appears safe

Applying @flashinfer_api_log to run gives visibility into the main prefill execution path (including PDL, backend selection, and output tensor shapes) while leaving the core kernel launch logic unchanged. The alias run_return_lse also goes through the decorated method, so both result variants are logged.

For very latency‑sensitive use (e.g., high‑QPS decode), you may want to benchmark with FLASHINFER_APILOG_LEVEL=1/2/3 to validate that the added logging—especially level‑3 statistics—meets your overhead budget.


2359-2474: Constructor logging for BatchPrefillWithRaggedKVCacheWrapper matches the paged wrapper behavior

Decorating __init__ here mirrors the paged wrapper: creation of workspace buffers, CUDA‑graph buffers, and backend selection are now log-visible without touching the execution logic.

As with the paged wrapper, it’s worth verifying that constructing this wrapper inside any higher‑level tooling (e.g., model factories) remains acceptable when logging is enabled, since constructor logs can be noisy if wrappers are created per‑request.


2503-2794: Logging BatchPrefillWithRaggedKVCacheWrapper.plan is appropriate and symmetric with paged prefill

The @flashinfer_api_log decorator on plan for ragged KV behaves analogously to the paged variant, exposing shapes, dtypes, and backend choices. The computational path (host transfers, planning, cached module setup) remains unchanged.

Same suggestion as for the paged plan: if you run many small plan calls (e.g., in tuning or dynamic workloads), keep logging levels conservative to avoid overhead from repeated detailed summaries.


2848-2995: BatchPrefillWithRaggedKVCacheWrapper.run logging cleanly wraps the main ragged prefill execution

Instrumenting run makes sense: it’s the hot execution entry point and logs will now include mask mode, backend, and output shapes, which are valuable for debugging MoE / ragged scenarios. No functional changes are introduced.

Consider adding this method to any logging tests you already have (similar to the decode wrappers) to confirm that level‑2/3 input summaries behave as expected with ragged indptr layouts.


3205-3337: trtllm_ragged_attention_deepseek is a good candidate for API logging

The decorator around this Triton/TRT‑LLM‑style ragged attention wrapper will expose sequence length, scaling, and PDL settings without affecting the underlying kernel invocation. Given this is a specialized path, logging is particularly useful here.

Because this function is often used in brittle integration scenarios, you may want to add a small smoke test that calls it with logging level 2–3 to ensure tensor summaries do not assume contiguous layouts beyond what the implementation already guarantees.


3340-3553: Logging trtllm_batch_context_with_kv_cache is consistent with other TRTLLM interfaces

Adding @flashinfer_api_log here gives observability into FP4/FP8 output configuration, kv layout, and PDL usage. Since the decorator is a thin wrapper and all runtime behavior (including FP4 tensor handling) is unchanged, this looks safe.

One thing to double‑check is that when out is an FP4Tensor, the logging layer’s input summarization doesn’t accidentally try to introspect internal fields in a way that could be overly verbose; if so, you might consider teaching the logger a lightweight summary for FP4Tensor.

flashinfer/gemm/gemm_base.py (9)

25-25: Importing flashinfer_api_log here is consistent with other modules

The relative import from ..api_logging is correct for this package layout and keeps GEMM logging centralized alongside other instrumented APIs.


914-1084: SegmentGEMMWrapper.run instrumentation looks correct and non-intrusive

Decorating SegmentGEMMWrapper.run adds valuable logging around a complex segmented GEMM path (seg_lens/seg_indptr, backend choice, shapes) without altering the kernel launch logic. The forward = run alias will also be logged since it resolves to the decorated method.

Given this is a potentially hot path in MoE workloads, you may want to benchmark with logging enabled at levels 2–3 to measure overhead from argument summarization on large x/weights.


1573-1682: mm_fp8 logging is well-placed on this primary FP8 GEMM entry point

The decorator wraps only the public Python API; core work is still delegated to trtllm_low_latency_gemm. No interactions with autotune or backend selection are altered.

Since mm_fp8 can be called in tight loops, please confirm via the new bench_logging_overhead.py benchmark that your target usage still meets latency goals at different logging levels.


2192-2356: Logging gemm_fp8_nt_groupwise is appropriate; be mindful of level‑3 cost on large tiles

This function is a central entry point for FP8 groupwise GEMM on Blackwell; adding @flashinfer_api_log is aligned with the overall API observability goal. The underlying CUTLASS/TRTLLM calls and device checks are unchanged.

Because this path typically operates on large (M,N,K) blocks and may be used in performance‑critical contexts, it’s worth evaluating level‑3 logging separately—tensor statistics over large matrices can add non‑trivial overhead even if the core compute kernels are unchanged.


2531-2690: group_gemm_fp8_nt_groupwise logging is consistent with other group GEMM entry points

Applying @flashinfer_api_log here exposes group sizes, scale layouts, and hardware constraints for grouped FP8 GEMM. The function’s internal device checks and workspace handling are unchanged.

Since this path asserts several alignment constraints (e.g., n % 8 == 0, k % 16 == 0), logging invalid calls at level ≥1 should be useful in diagnosing misconfigurations—ensure your logging level in debug runs is at least 1 to capture these.


2693-2825: Logging group_gemm_mxfp8_mxfp4_nt_groupwise is appropriate for this specialized MXFP4 path

The decorator wraps a fairly specialized Blackwell-only path that mixes FP8 activations with MXFP4 weights. Logging call shapes, tile sizes, and swap_ab will be valuable in debugging, and the underlying SM100 module calls are preserved.

Given the stricter assertions (tile sizes, alignment, dtype constraints), consider adding a small unit test that calls this function with logging enabled to ensure the logger handles packed MXFP4 tensors (uint8) and scale tensors without excessive verbosity.


2861-2989: group_deepgemm_fp8_nt_groupwise logging makes DeepGEMM MoE flows more observable

Instrumenting this DeepGEMM-based grouped FP8 GEMM call will surface group assignments (m_indices), scale granularities, and output sizes without altering the call to m_grouped_fp8_gemm_nt_contiguous. The architecture check remains in place.

Because this function is likely used in MoE expert-routing paths, you may want to validate that logging large m_indices tensors at level ≥2 is summarized compactly enough (e.g., only shape/type) to keep logs readable.


2992-3125: batch_deepgemm_fp8_nt_groupwise logging mirrors the grouped DeepGEMM path

The @flashinfer_api_log decorator here complements group_deepgemm_fp8_nt_groupwise, covering the batched FP8 DeepGEMM case. The masking logic (masked_m, expected_m) and backend call remain unchanged.

As with the grouped variant, confirm that logging of masked_m and expected_m behaves as expected in your tests—these values are key for understanding performance characteristics and correctness in partially‑filled batches.


568-641: Remove SM110 from the docstring—the underlying C++ implementation only supports SM100 and SM103

The runtime check is correct. Analysis of the C++ source reveals:

  • csrc/tgv_gemm.cu includes cutlass/gemm/collective/builders/sm100_common.inl and uses SM100-specific tcgen05.mma instructions
  • gen_tgv_gemm_sm10x_module is documented as "Generate TGV GEMM module for SM100 architecture"
  • No SM110 support path exists in the TGV GEMM implementation
  • The test file correctly restricts to ["100", "103"]

The docstring at lines 573–580 incorrectly claims SM110 support. It should be corrected to:

Requires SM100, SM103 architecture

Adding SM110 to the runtime check as suggested would enable an unsupported architecture.

Likely an incorrect or invalid review comment.

flashinfer/decode.py (10)

24-24: LGTM - API logging import added.

The import is correctly placed and follows the project's import organization.


316-346: API logging decorator correctly applied.

The decorator is properly applied to this public API function. As a POC, this demonstrates the instrumentation approach. Before production use, verify that the logging overhead at levels 1-3 is acceptable for your performance requirements.


393-410: Correct decorator placement on overloaded function.

The decorator is correctly applied to the implementation function rather than the overload signatures. This is the proper pattern for decorated overloaded functions.


652-664: API logging decorator correctly applied to constructor.

The decorator is properly applied to the class constructor. The decorator implementation correctly detects and includes the class name in logs.


816-840: API logging decorator correctly applied to plan method.

The decorator is properly applied. Note that at log level 2+, this method will generate verbose logs due to the many parameters. This is expected behavior for API logging and helps with debugging.


1170-1186: Correct decorator placement on overloaded method.

The decorator is correctly applied to the implementation method rather than the overload signatures. The decorator will properly capture the *args parameter for logging.


2068-2089: API logging decorator correctly applied.

The decorator is properly applied to this public API function. The extensive parameters and complex logic make this a good candidate for API logging to aid debugging.


2347-2364: API logging decorator correctly applied.

The decorator is properly applied to this public API function.


2539-2556: API logging decorator correctly applied.

The decorator is properly applied to this MLA-specific decode function.


2707-2724: API logging decorator correctly applied.

The decorator is properly applied to this function. Overall, the API logging instrumentation in this file is consistent and well-implemented.

POC Assessment: The decorator applications throughout this file demonstrate a clean, non-invasive approach to API logging. The implementation maintains zero overhead when disabled and preserves function signatures and behavior. Before moving beyond POC, consider:

  1. Verify benchmark results show acceptable overhead at log levels 1-3
  2. Consider adding configuration to selectively enable/disable logging for specific functions
  3. Document the logging format and levels for end users

Based on the PR description, benchmarks exist at benchmarks/bench_logging_overhead.py. Please verify the benchmark results demonstrate acceptable performance characteristics at different log levels, particularly for hot-path functions like BatchDecodeWithPagedKVCacheWrapper.run.

@coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
flashinfer/api_logging.py (3)

338-362: Narrow _get_default_params exception handling to the documented cases

inspect.signature is documented to raise TypeError / ValueError on failure; catching every Exception here is broader than necessary and makes debugging real failures harder. The earlier review comment already pointed this out.

You can keep the fail‑safe behavior but tighten the handler:

-    except Exception:
-        # If we can't inspect the signature, return empty dict
-        return {}
+    except (TypeError, ValueError):
+        # If we can't inspect the signature, return empty dict
+        return {}

This still preserves the “logging must not crash user code” guarantee.


493-528: Simplify class-name check and improve logging of internal logging errors

Two small points around the decorator:

  1. The condition if "Wrapper" in class_name or class_name in ["BatchMLAPagedAttentionWrapper"]: is redundant because "BatchMLAPagedAttentionWrapper" already matches the "Wrapper" substring. This was noted in a previous review.

  2. In the pre/post logging try blocks you catch Exception and log via _logger.error(...), which drops the traceback. Given this is a debugging/logger subsystem, having the traceback via logging.exception is usually more helpful and you’re already isolating it from user code.

Possible refactor:

-                    class_name = args[0].__class__.__name__
-                    if "Wrapper" in class_name or class_name in [
-                        "BatchMLAPagedAttentionWrapper"
-                    ]:
-                        func_name = f"{class_name}.{func_name}"
+                    class_name = args[0].__class__.__name__
+                    if "Wrapper" in class_name:
+                        func_name = f"{class_name}.{func_name}"
                 except Exception:
                     pass
@@
-            try:
+            try:
                 if _API_LOG_LEVEL == 1:
@@
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (pre-execution)]: {e}")
+            except Exception:
+                _logger.exception(
+                    "[LOGGING ERROR in %s (pre-execution)]", func_name
+                )
@@
-            try:
+            try:
                 if _API_LOG_LEVEL >= 2:
@@
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (outputs)]: {e}")
+            except Exception:
+                _logger.exception(
+                    "[LOGGING ERROR in %s (outputs)]", func_name
+                )

This keeps user‑visible behavior the same while making logger failures easier to diagnose.


29-31: Make FLASHINFER_APILOG_LEVEL parsing robust to invalid env values

Right now int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0")) will raise ValueError (and break import) if the env var is set to a non‑integer string. It’s safer to treat invalid values as “0” (disabled) so logging config can never prevent FlashInfer from importing.

You can keep the behavior while hardening parsing like this:

-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
+try:
+    _API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
+except (TypeError, ValueError):
+    # Invalid user input; fall back to level 0 (disabled)
+    _API_LOG_LEVEL = 0

Please rerun the existing logging tests (and any import-time tests) after this change.

🧹 Nitpick comments (2)
flashinfer/api_logging.py (2)

145-304: Optional: guard against recursive structures in _format_value

_format_value walks lists/tuples/dicts recursively and will recurse indefinitely on cyclic structures (e.g., a list containing itself), which can appear in complex call graphs. In a debugging logger this is usually rare but when it happens it’ll raise RecursionError from the logging path.

If you want to harden this, consider adding a seen: set[int] parameter (or a max-depth cut‑off) and short‑circuit when id(value) is already in seen. No need to change now if your call sites never pass cyclic objects, but it’s something to keep in mind.
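
A minimal sketch of the suggested seen-set guard (hypothetical helper, independent of the PR's _format_value):

def format_value(value, seen=None):
    # Track container ids to break cycles such as a list containing itself.
    seen = set() if seen is None else seen
    if isinstance(value, (list, tuple, dict)):
        if id(value) in seen:
            return "<recursive reference>"
        seen.add(id(value))
        items = value.values() if isinstance(value, dict) else value  # keys elided
        return "[" + ", ".join(format_value(v, seen) for v in items) + "]"
    return repr(value)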


420-441: Use func_name in _log_function_outputs (or drop it) to fix the unused arg and improve logs

func_name is currently unused in _log_function_outputs, which triggers the linter and also makes the output block a bit less self‑describing when skimming logs.

A small tweak makes the output clearer and fixes ARG001:

-    # Log outputs
-    lines.append("Output value:")
+    # Log outputs
+    lines.append(f"Output value for {func_name}:")

This keeps the existing separators and should be low‑risk, but please re-run the logging tests because they may assert on exact strings.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 92c15f7 and c049bbf.

📒 Files selected for processing (1)
  • flashinfer/api_logging.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flashinfer/api_logging.py (1)
include/flashinfer/logging.h (1)
  • logging (31-41)
🪛 Ruff (0.14.5)
flashinfer/api_logging.py

86-86: Do not catch blind exception: Exception

(BLE001)


106-106: Do not catch blind exception: Exception

(BLE001)


124-124: Do not catch blind exception: Exception

(BLE001)


132-132: Do not catch blind exception: Exception

(BLE001)


236-236: Do not catch blind exception: Exception

(BLE001)


300-300: Use explicit conversion flag

Replace with conversion flag

(RUF010)


311-311: Use explicit conversion flag

Replace with conversion flag

(RUF010)


315-315: Use explicit conversion flag

Replace with conversion flag

(RUF010)


316-316: Do not catch blind exception: Exception

(BLE001)


359-359: Consider moving this statement to an else block

(TRY300)


360-360: Do not catch blind exception: Exception

(BLE001)


420-420: Unused function argument: func_name

(ARG001)


444-444: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


505-506: try-except-pass detected, consider logging the exception

(S110)


505-505: Do not catch blind exception: Exception

(BLE001)


516-516: Do not catch blind exception: Exception

(BLE001)


517-517: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


527-527: Do not catch blind exception: Exception

(BLE001)


528-528: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs

Comment on lines 36 to 79
def _setup_logger():
    """Set up the logger based on environment variables."""
    if _API_LOG_LEVEL == 0:
        # Completely disable logging for zero overhead
        _logger.addHandler(logging.NullHandler())
        _logger.setLevel(logging.CRITICAL + 1)  # Higher than any level
        return

    # All enabled levels use logging.DEBUG; verbosity is controlled by FLASHINFER_APILOG_LEVEL instead
    _logger.setLevel(logging.DEBUG)

    # Remove any existing handlers
    _logger.handlers.clear()

    # Create handler based on destination
    if _API_LOG_DEST == "stdout":
        handler = logging.StreamHandler(sys.stdout)
    elif _API_LOG_DEST == "stderr":
        handler = logging.StreamHandler(sys.stderr)
    else:
        handler = logging.FileHandler(_API_LOG_DEST, mode="a")

    # Use a simple formatter (we'll format the detailed content ourselves)
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)

    _logger.addHandler(handler)
    _logger.propagate = False  # Don't propagate to root logger


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Avoid import-time failures when FLASHINFER_APILOG_DEST is invalid

logging.FileHandler(_API_LOG_DEST, mode="a") will raise (e.g., bad path, permission) at import time and prevent FlashInfer from loading when logging is enabled. For a “debugging only” facility this is a bit too fragile; better to degrade to stderr (or a NullHandler) instead of failing imports.

A minimal way to harden this while keeping semantics:

-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_APILOG_DEST=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )

Afterward, it’d be good to verify behavior by setting FLASHINFER_APILOG_DEST to an unwritable path and importing flashinfer.api_logging in a small repro.
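
A minimal repro of the suggested check (hypothetical path; run in a fresh interpreter so the module-level env read happens after the variables are set):

import os

# Must be set before flashinfer.api_logging is first imported,
# since the env vars are read at module load time.
os.environ["FLASHINFER_APILOG_LEVEL"] = "1"
os.environ["FLASHINFER_APILOG_DEST"] = "/nonexistent-dir/x.log"

import flashinfer.api_logging  # without the fallback, logging.FileHandler raises OSError here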


🏁 Script executed:

#!/bin/bash
# Find and examine the full flashinfer/api_logging.py file
fd -t f "api_logging.py" --exec cat {} \;

Length of output: 19059


🏁 Script executed:

#!/bin/bash
# Check when _setup_logger() is called (module level vs function level)
rg -n "_setup_logger" -A2 -B2

Length of output: 462


🏁 Script executed:

#!/bin/bash
# Search for any existing error handling around logging setup
rg -n "except|try|OSError" flashinfer/api_logging.py

Length of output: 594


Add error handling to prevent import-time failures from invalid logging paths

The current code will break module imports if FLASHINFER_APILOG_DEST points to an invalid or unwritable path. Since _setup_logger() is called at module load time (line 67), any OSError from logging.FileHandler() will prevent FlashInfer from loading entirely. For a debugging facility, this is too fragile.

Add error handling to gracefully degrade to stderr:

-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_APILOG_DEST=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
🤖 Prompt for AI Agents
In flashinfer/api_logging.py around lines 36 to 64, creating a FileHandler from
an invalid or unwritable FLASHINFER_APILOG_DEST can raise OSError at import time
and crash the module; wrap the handler creation in a try/except that catches
OSError (and its subclasses), and on exception fall back to a
logging.StreamHandler(sys.stderr) so the module still loads; also emit a
non-raising warning message via the selected handler or a temporary NullHandler
so the failure is recorded without propagating the exception, and ensure the
rest of the logger setup (formatter, addHandler, propagate) runs regardless of
whether the file handler succeeded.
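
Putting the prompt's requirements together, a hardened version might look like this (a sketch assuming the module-level _logger, _API_LOG_LEVEL, and _API_LOG_DEST defined earlier in the file; not the final merged code):

import logging
import sys


def _setup_logger():
    """Set up the logger based on environment variables."""
    if _API_LOG_LEVEL == 0:
        _logger.addHandler(logging.NullHandler())
        _logger.setLevel(logging.CRITICAL + 1)
        return

    _logger.setLevel(logging.DEBUG)
    _logger.handlers.clear()

    warn_msg = None
    try:
        if _API_LOG_DEST == "stdout":
            handler = logging.StreamHandler(sys.stdout)
        elif _API_LOG_DEST == "stderr":
            handler = logging.StreamHandler(sys.stderr)
        else:
            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
    except OSError as exc:
        # Degrade to stderr so a bad path never breaks `import flashinfer`
        handler = logging.StreamHandler(sys.stderr)
        warn_msg = f"Failed to open {_API_LOG_DEST!r}: {exc}; falling back to stderr"

    handler.setFormatter(logging.Formatter("%(message)s"))
    _logger.addHandler(handler)
    _logger.propagate = False

    # Emit the warning only after a handler is attached, so it never raises
    if warn_msg is not None:
        _logger.warning(warn_msg)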

@bkryu bkryu self-assigned this Nov 19, 2025
Collaborator

@yzh119 yzh119 left a comment

This is a great feature to have, and thanks @bkryu for bringing this up! One general question, do you think we should log kernel behavior on the Python side or the C++ side?

Collaborator Author

bkryu commented Nov 19, 2025

This is a great feature to have, and thanks @bkryu for bringing this up! One general question, do you think we should log kernel behavior on the Python side or the C++ side?

Thanks @yzh119. This is a question I considered, more from a "should we log what happens inside each function call, given that the current design only logs inputs and outputs" point of view. My initial sentiment is that logging kernel behavior will be tricky because we have multiple kernel providers -- some closed source, some open source, and some that already ship their own logging:

  • cuDNN and cuBLAS have their own API logging settings for logging kernel and kernel-selection heuristic behavior. For these backends, it is best to rely on their own API logs when debugging them (see the example at the end of this comment).
  • TRT-LLM appears to have something but I am not sure whether it has the ability to log kernel-level info useful for debugging.
  • FlashInfer-native kernels (CUDA and CuTe DSL) do not have a logging infrastructure and would require us to implement something.

That being said, if we believe that logging kernel behavior is useful, I agree that we can look into it.
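
For the cuDNN/cuBLAS bullet above, enabling those backend-native logs is just a matter of environment variables. A sketch (variable names per NVIDIA's cuDNN/cuBLAS logging docs as I recall them; treat them as an assumption and verify against the versions in use):

# Assumption: CUDNN_LOGLEVEL_DBG/CUDNN_LOGDEST_DBG (cuDNN 8.3+) and
# CUBLAS_LOGINFO_DBG/CUBLAS_LOGDEST_DBG are the documented knobs; verify
# against the cuDNN/cuBLAS versions you ship. They must be set before the
# libraries are loaded, hence before `import torch`.
import os

os.environ["CUDNN_LOGLEVEL_DBG"] = "2"              # 0=off ... 3=most verbose
os.environ["CUDNN_LOGDEST_DBG"] = "cudnn_api.log"
os.environ["CUBLAS_LOGINFO_DBG"] = "1"              # enable cuBLAS API logging
os.environ["CUBLAS_LOGDEST_DBG"] = "cublas_api.log"

import torch  # noqa: E402  -- imported after the env vars so they take effect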

Collaborator

yzh119 commented Nov 19, 2025

@xslingcn @Ubospica @YiyanZhai @zanderjiang this PR should greatly reduce the overhead of extracting workloads for the flashinfer-bench project.

Contributor

Ubospica commented Nov 19, 2025

@yzh119 Thanks for sharing! The low-overhead design looks very useful. Low-overhead dumping and CUDA graph support are definitely on our roadmap, and I will use this PR as a key reference.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
flashinfer/api_logging.py (2)

43-47: Make FLASHINFER_LOGLEVEL_DBG parsing robust to invalid values

As written, int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0")) will raise ValueError at import time if a user sets the env var to a non‑integer (e.g. "foo"), preventing flashinfer.api_logging (and any API importing it) from loading. For a debug‑only facility this is too fragile; it should gracefully fall back to level 0.

You can harden this by catching parse errors and defaulting to 0 (optionally emitting a warning to stderr):

-# Read environment variables once at module load time
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))
-_API_LOG_DEST = _substitute_process_id(
-    os.environ.get("FLASHINFER_LOGDEST_DBG", "stdout")
-)
+# Read environment variables once at module load time
+_raw_log_level = os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0")
+try:
+    _API_LOG_LEVEL = int(_raw_log_level)
+except (TypeError, ValueError):
+    # Fall back to level 0 on invalid input instead of failing import
+    _API_LOG_LEVEL = 0
+    print(
+        f"[flashinfer.api_logging] Invalid FLASHINFER_LOGLEVEL_DBG={_raw_log_level!r}, "
+        "falling back to 0",
+        file=sys.stderr,
+    )
+_API_LOG_DEST = _substitute_process_id(
+    os.environ.get("FLASHINFER_LOGDEST_DBG", "stdout")
+)

This keeps existing semantics for valid values while avoiding import‑time crashes on bad configuration.


85-112: Prevent import-time failures when FLASHINFER_LOGDEST_DBG points to an invalid path

_setup_logger() calls logging.FileHandler(_API_LOG_DEST, mode="a") unguarded whenever _API_LOG_DEST is neither "stdout" nor "stderr". If the env var points to an invalid or unwritable location, this raises OSError at import time and breaks all FlashInfer imports when logging is enabled.

For a debugging feature, it’s safer to degrade to stderr instead of failing import. For example:

-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_LOGDEST_DBG=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )

This preserves current behavior for valid paths but avoids hard failures when the destination is misconfigured.

🧹 Nitpick comments (3)
tests/utils/test_logging.py (1)

227-228: Address Ruff ARG001 unused-argument warnings in test helpers (optional)

Several test helper functions intentionally ignore some parameters (mode in test_default_parameters, x/y in crashing_function, and the various *_val parameters in test_different_data_types), which Ruff flags as ARG001. If your CI runs Ruff on tests, you can silence these by either:

  • Prefixing unused parameters with _ (e.g., _mode, _int_val, …), or
  • Adding # noqa: ARG001 on the relevant function definitions.

This keeps the tests behaviorally identical while satisfying the linter.

Also applies to: 322-323, 355-365
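
For example (helper names taken from the test file; signatures are illustrative), either style satisfies Ruff's ARG001 check without changing behavior:

def crashing_function(_x, _y):  # unused args prefixed with underscore
    raise RuntimeError("boom")


def test_default_parameters(value, mode):  # noqa: ARG001  (mode intentionally unused)
    assert value is not None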

flashinfer/api_logging.py (2)

498-519: _log_function_outputs’s func_name is unused — consider wiring it into the log line

_log_function_outputs accepts func_name but never uses it, which is slightly confusing and flagged by linters as ARG001. You can either drop the parameter, or (more usefully) include it in the header line:

-    lines = []
-    # Log outputs
-    lines.append("Output value:")
+    lines = []
+    # Log outputs
+    lines.append(f"Output value: {func_name}")

This keeps the existing "Output value:" prefix (so current tests continue to pass) while making the per‑call output section self‑describing.


596-607: Improve diagnostics for logging failures with logging.exception

The decorator intentionally guards pre‑ and post‑execution logging with broad except Exception blocks to avoid interfering with the wrapped function. That makes sense, but using logging.error(..., %s) drops the traceback, which can make debugging logging issues tricky.

You can keep the broad guard while emitting richer diagnostics via logging.exception:

-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (pre-execution)]: {e}")
+            except Exception:
+                _logger.exception(
+                    "[LOGGING ERROR in %s (pre-execution)]", func_name
+                )
@@
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (outputs)]: {e}")
+            except Exception:
+                _logger.exception("[LOGGING ERROR in %s (outputs)]", func_name)

This preserves crash‑safety while giving you a stack trace when something goes wrong inside the logging path.

Also applies to: 613-618

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c049bbf and 73c8eb2.

📒 Files selected for processing (3)
  • benchmarks/bench_logging_overhead.py (1 hunks)
  • flashinfer/api_logging.py (1 hunks)
  • tests/utils/test_logging.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
benchmarks/bench_logging_overhead.py (1)
flashinfer/api_logging.py (1)
  • flashinfer_api_log (522-627)
tests/utils/test_logging.py (1)
flashinfer/api_logging.py (3)
  • flashinfer_api_log (522-627)
  • decorator (581-622)
  • wrapper (583-620)
flashinfer/api_logging.py (1)
include/flashinfer/logging.h (1)
  • logging (31-41)
🪛 Ruff (0.14.5)
benchmarks/bench_logging_overhead.py

1-1: Shebang is present but file is not executable

(EXE001)


37-37: Probable insecure usage of temporary file or directory: "/tmp/flashinfer_benchmark_log.txt"

(S108)


346-346: Do not catch blind exception: Exception

(BLE001)

tests/utils/test_logging.py

227-227: Unused function argument: mode

(ARG001)


322-322: Unused function argument: x

(ARG001)


322-322: Unused function argument: y

(ARG001)


323-323: Avoid specifying long messages outside the exception class

(TRY003)


356-356: Unused function argument: int_val

(ARG001)


357-357: Unused function argument: float_val

(ARG001)


358-358: Unused function argument: bool_val

(ARG001)


359-359: Unused function argument: str_val

(ARG001)


360-360: Unused function argument: list_val

(ARG001)


361-361: Unused function argument: tuple_val

(ARG001)


362-362: Unused function argument: dict_val

(ARG001)


363-363: Unused function argument: none_val

(ARG001)

flashinfer/api_logging.py

142-142: Do not catch blind exception: Exception

(BLE001)


162-162: Do not catch blind exception: Exception

(BLE001)


180-180: Do not catch blind exception: Exception

(BLE001)


210-210: Do not catch blind exception: Exception

(BLE001)


314-314: Do not catch blind exception: Exception

(BLE001)


378-378: Use explicit conversion flag

Replace with conversion flag

(RUF010)


389-389: Use explicit conversion flag

Replace with conversion flag

(RUF010)


393-393: Use explicit conversion flag

Replace with conversion flag

(RUF010)


394-394: Do not catch blind exception: Exception

(BLE001)


437-437: Consider moving this statement to an else block

(TRY300)


438-438: Do not catch blind exception: Exception

(BLE001)


498-498: Unused function argument: func_name

(ARG001)


522-522: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


593-594: try-except-pass detected, consider logging the exception

(S110)


593-593: Do not catch blind exception: Exception

(BLE001)


606-606: Do not catch blind exception: Exception

(BLE001)


607-607: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


617-617: Do not catch blind exception: Exception

(BLE001)


618-618: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs
🔇 Additional comments (2)
tests/utils/test_logging.py (1)

27-585: Comprehensive coverage of decorator behavior

This test suite does a good job exercising the decorator across levels 0/1/3/5, enums, defaults vs explicit args, kwargs, crash safety, tensor metadata, nested structures, and CUDA graph scenarios; I don’t see functional issues here.

benchmarks/bench_logging_overhead.py (1)

56-219: Benchmark structure and CUDA timing look solid

The undecorated vs decorated matmul comparison, CUDA synchronization, warmup phase, and summary statistics all look correct for measuring logging overhead.

Also applies to: 221-339
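
For reference, the usual shape of such a measurement (a generic sketch, not the benchmark's actual code):

import torch


def time_gpu_ms(fn, iters=100, warmup=10):
    """Average GPU time per call in milliseconds, using CUDA events."""
    for _ in range(warmup):          # warm up kernels/allocator/JIT paths
        fn()
    torch.cuda.synchronize()         # drain pending work before timing
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()         # wait until `end` has been recorded
    return start.elapsed_time(end) / iters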

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
flashinfer/gemm/gemm_base.py (1)

568-640: Fix SM110 support check and consider basic shape validation in tgv_gemm_sm100

Two minor issues here:

  1. The docstring says “Requires SM100, SM103, or SM110 architecture” (Lines 593–595), but the runtime check only allows "100" and "103":
if not _match_sm_version(a.device, ["100", "103"]):
    raise ValueError("TGV GEMM requires SM100, SM103 architecture")

If SM110 is actually supported by the TGV kernel (as the name gen_tgv_gemm_sm10x_module and note about SM100f suggest), this should include "110"; otherwise the note in the docstring should be adjusted to avoid misleading users.

  2. There’s no explicit validation of tensor ranks/shapes or bias shape. Given the comment “Computes: A @ B + bias” and the expectation (M, K) @ (K, N) + (N,), adding a fast check for a.ndim == 2, b.ndim == 2, bias.ndim == 1, a.shape[1] == b.shape[0], and bias.shape[0] == b.shape[1] would fail early with clear error messages instead of deferring to the underlying kernel.

Both changes would make this wrapper more robust without affecting the autotuning flow.
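
For point 2, a minimal validation sketch (assuming the (M, K) @ (K, N) + (N,) contract above; variable names follow the snippet in point 1, and the `bias is not None` guard is an assumption in case bias is optional):

# Early, cheap shape checks before dispatching to the TGV kernel
if a.ndim != 2 or b.ndim != 2:
    raise ValueError(f"Expected 2-D A and B, got A.ndim={a.ndim}, B.ndim={b.ndim}")
if a.shape[1] != b.shape[0]:
    raise ValueError(
        f"Inner dimensions must match: A is {tuple(a.shape)}, B is {tuple(b.shape)}"
    )
if bias is not None and (bias.ndim != 1 or bias.shape[0] != b.shape[1]):
    raise ValueError(f"Expected bias of shape ({b.shape[1]},), got {tuple(bias.shape)}")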

♻️ Duplicate comments (2)
flashinfer/api_logging.py (1)

28-45: Harden env parsing and handler setup to avoid import-time failures

Right now, two failure modes in this module can prevent flashinfer from importing at all:

  • Line 42: int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0")) raises ValueError/TypeError if the env var is non-numeric (e.g. "foo"), aborting import.
  • Lines 65–71: logging.FileHandler(_API_LOG_DEST, mode="a") will raise OSError on invalid/unwritable paths, also at import time.

For a debug-only feature, this is too fragile; misconfigured env vars should disable logging or fall back safely, not make the library unusable.

A minimal hardening would be:

-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))
+try:
+    _API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))
+except (TypeError, ValueError):
+    # Fall back to disabled logging on invalid level
+    _API_LOG_LEVEL = 0

and in _setup_logger:

-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_LOGDEST_DBG=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )

This keeps existing semantics in the happy path but makes misconfiguration non-fatal.

Also applies to: 51-82

benchmarks/bench_logging_overhead.py (1)

30-36: Align benchmark’s LOG_DEST with the decorator’s actual destination

Here LOG_DEST defaults to /tmp/flashinfer_benchmark_log.txt when FLASHINFER_LOGDEST_DBG is unset, but flashinfer.api_logging still defaults its destination to "stdout". That means:

  • The decorator logs to stdout.
  • The benchmark later looks for a file at LOG_DEST (Lines 248–251, 292–301) and usually finds nothing, so the “LOG FILE INFO” section never prints.

To keep the benchmark’s view of the log destination consistent with the decorator, set the env var before importing flashinfer_log:

LOGGING_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))
LOG_DEST = os.environ.get(
    "FLASHINFER_LOGDEST_DBG", "/tmp/flashinfer_benchmark_log.txt"
)
+# Ensure the decorator sees the same destination default
+os.environ.setdefault("FLASHINFER_LOGDEST_DBG", LOG_DEST)

# Import the decorator
from flashinfer.api_logging import flashinfer_log

This preserves the current default path but guarantees both components use the same destination.

Also applies to: 248-251, 292-301

🧹 Nitpick comments (2)
flashinfer/cudnn/prefill.py (1)

6-6: Decorator integration and new backend flags look safe; consider documenting backend.

  • Importing and applying @flashinfer_log to cudnn_batch_prefill_with_kv_cache is semantically safe because the decorator is a no-op at log level 0 and preserves the function signature.
  • Adding is_cuda_graph_compatible and backend is backward‑compatible; they are defaulted and only change behavior when explicitly used (e.g., backend=="cubin" forces the fmha‑gen path).
  • Minor: the docstring mentions is_cuda_graph_compatible but not backend. It may be worth adding a short description for backend for consistency and to clarify the "cubin" override behavior.

Also applies to: 387-411
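
For the docstring nit, an entry along these lines could be added (wording illustrative; the behavior is taken from the bullet above, and the default should be checked against the signature):

backend : str, optional
    Kernel backend override. Passing ``backend="cubin"`` forces the
    fmha-gen path; otherwise the default cuDNN path is used.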

tests/utils/test_logging.py (1)

43-585: Logging tests are comprehensive; Ruff ARG001 hints are low-priority in this context

This suite does a good job validating flashinfer_log behavior (levels 0/1/3/5, enums, defaults vs explicit args, kwargs, crash safety, CUDA graphs, nested structures, etc.) and the fixture correctly resets env state and reloads the module between tests. The Ruff ARG001 warnings about unused parameters in test functions (e.g., mode, int_val, etc.) are expected for this style of tests; if you care about a clean lint run you can prefix those parameters with _ or reference them trivially, but functionally the current code is fine.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 73c8eb2 and 3d11a48.

📒 Files selected for processing (10)
  • benchmarks/bench_logging_overhead.py (1 hunks)
  • flashinfer/api_logging.py (1 hunks)
  • flashinfer/cudnn/decode.py (2 hunks)
  • flashinfer/cudnn/prefill.py (2 hunks)
  • flashinfer/decode.py (10 hunks)
  • flashinfer/fused_moe/core.py (7 hunks)
  • flashinfer/gemm/gemm_base.py (12 hunks)
  • flashinfer/mla.py (4 hunks)
  • flashinfer/prefill.py (11 hunks)
  • tests/utils/test_logging.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • flashinfer/mla.py
🧰 Additional context used
🧬 Code graph analysis (9)
flashinfer/cudnn/prefill.py (1)
flashinfer/api_logging.py (1)
  • flashinfer_log (466-564)
tests/utils/test_logging.py (1)
flashinfer/api_logging.py (3)
  • flashinfer_log (466-564)
  • decorator (519-560)
  • wrapper (521-558)
flashinfer/cudnn/decode.py (1)
flashinfer/api_logging.py (1)
  • flashinfer_log (466-564)
flashinfer/fused_moe/core.py (1)
flashinfer/api_logging.py (1)
  • flashinfer_log (466-564)
flashinfer/api_logging.py (1)
include/flashinfer/logging.h (1)
  • logging (31-41)
benchmarks/bench_logging_overhead.py (1)
flashinfer/api_logging.py (1)
  • flashinfer_log (466-564)
flashinfer/decode.py (1)
flashinfer/api_logging.py (1)
  • flashinfer_log (466-564)
flashinfer/prefill.py (1)
flashinfer/api_logging.py (1)
  • flashinfer_log (466-564)
flashinfer/gemm/gemm_base.py (1)
flashinfer/api_logging.py (1)
  • flashinfer_log (466-564)
🪛 Ruff (0.14.5)
tests/utils/test_logging.py

227-227: Unused function argument: mode

(ARG001)


322-322: Unused function argument: x

(ARG001)


322-322: Unused function argument: y

(ARG001)


323-323: Avoid specifying long messages outside the exception class

(TRY003)


356-356: Unused function argument: int_val

(ARG001)


357-357: Unused function argument: float_val

(ARG001)


358-358: Unused function argument: bool_val

(ARG001)


359-359: Unused function argument: str_val

(ARG001)


360-360: Unused function argument: list_val

(ARG001)


361-361: Unused function argument: tuple_val

(ARG001)


362-362: Unused function argument: dict_val

(ARG001)


363-363: Unused function argument: none_val

(ARG001)

flashinfer/api_logging.py

108-108: Do not catch blind exception: Exception

(BLE001)


128-128: Do not catch blind exception: Exception

(BLE001)


146-146: Do not catch blind exception: Exception

(BLE001)


154-154: Do not catch blind exception: Exception

(BLE001)


258-258: Do not catch blind exception: Exception

(BLE001)


322-322: Use explicit conversion flag

Replace with conversion flag

(RUF010)


333-333: Use explicit conversion flag

Replace with conversion flag

(RUF010)


337-337: Use explicit conversion flag

Replace with conversion flag

(RUF010)


338-338: Do not catch blind exception: Exception

(BLE001)


381-381: Consider moving this statement to an else block

(TRY300)


382-382: Do not catch blind exception: Exception

(BLE001)


442-442: Unused function argument: func_name

(ARG001)


466-466: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


531-532: try-except-pass detected, consider logging the exception

(S110)


531-531: Do not catch blind exception: Exception

(BLE001)


544-544: Do not catch blind exception: Exception

(BLE001)


545-545: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


555-555: Do not catch blind exception: Exception

(BLE001)


556-556: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

benchmarks/bench_logging_overhead.py

1-1: Shebang is present but file is not executable

(EXE001)


32-32: Probable insecure usage of temporary file or directory: "/tmp/flashinfer_benchmark_log.txt"

(S108)


329-329: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs
🔇 Additional comments (5)
flashinfer/decode.py (1)

24-24: API logging decoration across decode entry points looks correct and non-intrusive.

  • Importing flashinfer_log and applying it to all user‑facing decode APIs (single, batch, wrappers, and TRT‑LLM/XQA helpers) preserves signatures and overload contracts while adding centralized logging.
  • The decorator’s no‑op behavior at log level 0 avoids extra overhead in production, and functools.wraps keeps introspection and partialmethod/aliasing (begin_forward, run_return_lse) working as before.
  • No behavioral changes to argument defaults, CUDA‑graph handling, or backend routing are introduced by these annotations.

Also applies to: 316-345, 393-410, 652-664, 816-840, 1170-1186, 2068-2089, 2342-2359, 2527-2544, 2689-2705

flashinfer/prefill.py (1)

25-25: Prefill API logging wiring is consistent and preserves existing behavior.

  • Importing and applying @flashinfer_log to single‑prefill helpers, both prefill wrappers (paged and ragged), and the TRT‑LLM prefill/context functions cleanly instruments all public prefill entry points without altering their logic.
  • Overloads remain untouched; only concrete implementations are wrapped, and begin_forward / run_return_lse aliases continue to work through the decorated methods.
  • No changes to backend selection, CUDA‑graph handling, or tensor shape/device contracts are introduced in these hunks.

Also applies to: 877-888, 962-985, 1331-1346, 1527-1563, 1984-2000, 2359-2372, 2503-2531, 2848-2859, 3205-3227, 3340-3362

flashinfer/cudnn/decode.py (1)

6-6: cudnn decode logging and CUDA‑graph flag propagation look correct.

  • Importing flashinfer_log and decorating cudnn_batch_decode_with_kv_cache integrates this entry point into the new logging system without touching its core logic.
  • The is_cuda_graph_compatible argument is defaulted and only matters in the fmha‑gen fallback path; CUDNN proper continues to behave as before.
  • Signature and docstring remain aligned, so existing callers are unaffected while gaining optional logging and an explicit CUDA‑graph compatibility hint.

Also applies to: 256-273

flashinfer/fused_moe/core.py (1)

23-23: API-level logging instrumentation for MoE wrappers looks correct

Importing flashinfer_log and decorating the public MoE entrypoints here cleanly extends logging without touching the underlying custom ops. Because flashinfer_log returns the original function when FLASHINFER_LOGLEVEL_DBG=0, these changes are behavior-preserving in the default/off configuration, and the wrappers’ signatures remain unchanged. No issues from the decorator stacking are apparent.

Also applies to: 688-719, 1862-1881, 1943-1965, 2017-2040, 2095-2127, 2225-2257
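
A minimal sketch of the zero-overhead pattern described above (names simplified; not the exact implementation in api_logging.py):

import functools

_API_LOG_LEVEL = 0  # read once from FLASHINFER_LOGLEVEL_DBG in the real module


def flashinfer_log(func):
    """Sketch: return `func` untouched when logging is disabled."""
    if _API_LOG_LEVEL == 0:
        return func  # no wrapper frame at all -> zero call overhead

    @functools.wraps(func)  # preserves __name__/__doc__ for aliases/introspection
    def wrapper(*args, **kwargs):
        # ... log inputs here ...
        result = func(*args, **kwargs)
        # ... log outputs here ...
        return result

    return wrapper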

flashinfer/gemm/gemm_base.py (1)

914-1083: Decorator-based logging on GEMM and segment-GEMM entrypoints looks consistent

Importing flashinfer_log here and applying it to SegmentGEMMWrapper.run, tgv_gemm_sm100, and the various FP8/FP4 GEMM entrypoints (mm_fp8, mm_fp4, bmm_fp8, gemm_fp8_nt_groupwise, blockscaled/group variants, DeepGEMM group/batch wrappers) is consistent with the rest of the PR:

  • Signatures and return types are unchanged.
  • At FLASHINFER_LOGLEVEL_DBG=0 the decorator returns the original function, so hot paths (GEMM kernels) pay no additional cost when logging is disabled.
  • Where combined with @backend_requirement, the backend checks still wrap the logged function cleanly.

From a call-site perspective, this is a non-breaking way to surface API-level logs for GEMM operations.

Also applies to: 1573-1682, 1851-2002, 2101-2190, 2193-2356, 2502-2527, 2531-2690, 2694-2825, 2862-2989, 2993-3125

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d11a48 and a071556.

📒 Files selected for processing (2)
  • LOGGING.md (1 hunks)
  • README.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • README.md
🧰 Additional context used
🪛 GitHub Actions: pre-commit
LOGGING.md

[error] 1-1: pre-commit end-of-file-fixer hook failed. The hook modified LOGGING.md (added a newline at EOF).

🪛 LanguageTool
LOGGING.md

[grammar] ~24-~24: Ensure spelling is correct
Context: ...| Disabled (Default) | No logging (zero overhad) | Production | | 1 | Function Name...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs
🔇 Additional comments (1)
LOGGING.md (1)

5-82: Well-structured documentation with clear guidance.

The logging documentation is clearly organized with practical examples, environment variable reference, and real-world use cases (CUDA graph compatibility, multi-GPU process ID substitution). The Quick Start and logging levels table make it easy for users to get started.

@bkryu bkryu changed the title from "[wip] feat: Enable API Logging for Better Debugging POC" to "feat: Enable API Logging for Better Debugging POC" on Nov 21, 2025
