feat: Enable API Logging for Better Debugging POC #2108
base: main
Conversation
Summary of Changes

Hello @bkryu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a robust and flexible API logging system designed to significantly enhance debugging capabilities within FlashInfer. The system allows developers to instrument functions with a decorator that logs calls, arguments, and return values at various levels of detail, from simple function names to comprehensive tensor metadata and statistics. A key design principle is crash safety: input arguments are logged even if a function terminates unexpectedly, which is crucial for diagnosing complex issues like CUDA kernel crashes. The logging system is also optimized for performance, offering zero overhead when disabled, and it handles CUDA graph capture intelligently to avoid performance bottlenecks. This infrastructure will be invaluable for understanding MoE workloads and other performance-critical operations.
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a crash-safe, environment-configurable API logging decorator (flashinfer_api_log).
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant Caller as User code
participant Decorator as `@flashinfer_log`
participant Logger as flashinfer.api logger
participant Impl as API implementation
Note over Decorator,Logger: Level & destination read from env at import
alt Level = 0 (disabled)
Caller->>Decorator: call(args, kwargs)
Decorator->>Impl: direct call (no logging)
Impl-->>Decorator: result
Decorator-->>Caller: result
else Level >= 1 (enabled)
Caller->>Decorator: call(args, kwargs)
Decorator->>Logger: _log_function_inputs (name, args, defaults)
Logger-->>Logger: format per level (1..5)
Decorator->>Impl: execute API (protected from crashes/CUDA graph issues)
Impl-->>Decorator: result
Decorator->>Logger: _log_function_outputs (result)
Logger-->>Decorator: persist (file / stdout / stderr)
Decorator-->>Caller: result
end
    Note over Logger: Level 5 includes tensor stats (skipped during CUDA-graph capture)
```
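To make the diagram concrete, here is a minimal sketch of the decorator shape it describes (illustrative only, not the actual flashinfer_api_log implementation; the real one also handles class methods, log levels 1-5, and CUDA graph capture):

```python
import functools
import logging
import os

_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
_logger = logging.getLogger("flashinfer.api")

def api_log(func):
    if _LEVEL == 0:
        # Zero overhead when disabled: return the original function untouched.
        return func

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            # Log inputs BEFORE execution so they survive a kernel crash.
            _logger.debug("FlashInfer API Call: %s args=%r kwargs=%r",
                          func.__name__, args, kwargs)
        except Exception:
            pass  # logging must never break the wrapped call
        result = func(*args, **kwargs)
        try:
            _logger.debug("Output value: %r", result)
        except Exception:
            pass
        return result

    return wrapper
```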
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Code Review
This PR introduces a comprehensive and well-designed API logging system for FlashInfer. The use of a decorator (@flashinfer_api_log) makes it easy to apply, and the control via environment variables (FLASHINFER_APILOG_LEVEL, FLASHINFER_APILOG_DEST) is flexible. The zero-overhead design when disabled is a key feature and is implemented correctly. The different logging levels provide a good range of verbosity for debugging, and the crash-safe logging of inputs is a great feature for diagnosing kernel crashes. The inclusion of a benchmark script and a thorough test suite is excellent.
My review focuses on improving the robustness and maintainability of the new logging module. I've identified a few areas for improvement, such as making exception handling more specific and simplifying some conditional logic.
Overall, this is a great addition to the library that will significantly improve debuggability.
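As a usage sketch (using the env var names from this review; exact names and defaults may change as the PR evolves), enabling the logs from Python could look like:

```python
import os

# Configure before importing flashinfer: the module reads these
# variables once at import time.
os.environ["FLASHINFER_APILOG_LEVEL"] = "3"  # 0 disables logging entirely
os.environ["FLASHINFER_APILOG_DEST"] = "./flashinfer_log.txt"  # or "stdout" / "stderr"

import flashinfer  # decorated APIs now log calls, arguments, and return values
```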
flashinfer/api_logging.py
```python
# Read environment variables once at module load time
_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
```
The int() conversion for FLASHINFER_APILOG_LEVEL is not robust against invalid user input. If a user sets this environment variable to a non-integer string (e.g., "foo"), it will raise a ValueError and prevent the module from being imported. It would be more robust to handle this case gracefully, for example by defaulting to level 0 and perhaps logging a warning.
```diff
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
+try:
+    _API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
+except (ValueError, TypeError):
+    _API_LOG_LEVEL = 0
```
```python
except Exception:
    # If we can't inspect the signature, return empty dict
    return {}
```
The except Exception: block is too broad. It will catch any exception, including system-exiting exceptions like KeyboardInterrupt, which can hide bugs and make debugging difficult. The inspect.signature() function is documented to raise ValueError or TypeError on failure. It's better to catch only these specific exceptions.
```diff
-except Exception:
-    # If we can't inspect the signature, return empty dict
-    return {}
+except (ValueError, TypeError):
+    # If we can't inspect the signature, return empty dict
+    return {}
```
| if "Wrapper" in class_name or class_name in [ | ||
| "BatchMLAPagedAttentionWrapper" | ||
| ]: |
The condition class_name in ["BatchMLAPagedAttentionWrapper"] is redundant because the class name "BatchMLAPagedAttentionWrapper" already contains the substring "Wrapper", which is checked in the first part of the or condition. This can be simplified for better readability and maintainability.
if "Wrapper" in class_name:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (8)
tests/utils/test_logging.py (1)
227-247: Consider quieting Ruff's unused-argument warnings in tests

Several test helpers intentionally don't use all parameters (e.g., mode in test_default_parameters, and the many typed parameters in test_different_data_types). If Ruff is run on tests, you may want to silence these via _-prefixed names or # noqa: ARG001 on those definitions to keep the suite warning-free; behavior is otherwise fine.

Also applies to: 322-344, 355-383
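For instance, either of these patterns (applied to hypothetical helpers, not the exact test code) keeps Ruff quiet without changing behavior:

```python
# Option 1: underscore-prefix the intentionally unused parameter
def helper(x, _mode="auto"):
    return x

# Option 2: suppress the rule explicitly on the definition
def crashing_helper(x, y):  # noqa: ARG001
    raise RuntimeError("simulated kernel crash")
```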
flashinfer/fused_moe/core.py (1)
23-23: MoE public API logging is wired correctly

Wrapping the high-level MoE entrypoints (cutlass_fused_moe, trtllm_bf16_moe, and all FP8/FP4 variants) with flashinfer_api_log gives good coverage of the performance-critical surface without touching the custom-op registration or autotuning internals. Signatures and control flow are preserved. Just be aware that at logging level 3, tensor statistics over large MoE activations/weights will be non-trivial in cost, so that mode should remain a targeted debugging tool rather than a default.

Also applies to: 688-906, 1862-1941, 1943-2015, 2017-2093, 2095-2223, 2225-2354
flashinfer/api_logging.py (2)
28-68: Env-driven logger setup matches goals; consider a tiny robustness tweak

Reading FLASHINFER_APILOG_LEVEL / FLASHINFER_APILOG_DEST once at import and configuring a dedicated flashinfer.api logger (with a NullHandler at level 0) cleanly achieves "zero overhead when disabled" and isolates logs from the root logger. One minor robustness improvement would be to guard the int() conversion for _API_LOG_LEVEL so a malformed env var falls back to level 0 (or a safe default) instead of raising at import.
345-367: Use the func_name argument in _log_function_outputs for clearer logs

_log_function_outputs accepts func_name but currently ignores it, which also triggers a linter warning. You could make the logs more self-describing and fix the unused parameter by adding a small header including the function name. For example:

```diff
 def _log_function_outputs(func_name: str, result: Any, level: int) -> None:
@@
-    lines = []
-    # Log outputs
-    lines.append("Output value:")
+    lines = []
+    lines.append(f"FlashInfer API Return: {func_name}")
+    lines.append("Output value:")
@@
     _logger.debug("\n".join(lines))
```

This preserves existing tests (which only assert on "Output value:") while improving traceability.

benchmarks/bench_logging_overhead.py (1)
35-38: Align benchmark log destination with api_logging's default to avoid confusion

The script defaults LOG_DEST to /tmp/flashinfer_benchmark_log.txt, but flashinfer.api_logging defaults FLASHINFER_APILOG_DEST to ./flashinfer_log.txt when the env var is unset. If users only set FLASHINFER_APILOG_LEVEL (as shown in the usage block), the benchmark will print and clean up /tmp/... while the actual logs go to ./flashinfer_log.txt, so log-size reporting and cleanup can silently miss the real file.

To make behavior predictable, consider either:

- Matching the default with the library:

```diff
-LOG_DEST = os.environ.get("FLASHINFER_APILOG_DEST", "/tmp/flashinfer_benchmark_log.txt")
+LOG_DEST = os.environ.get("FLASHINFER_APILOG_DEST", "./flashinfer_log.txt")
```

- Explicitly propagating LOG_DEST into the environment before importing flashinfer.api_logging when the env var is not already set.

Either option will keep the benchmark's "LOG FILE INFO" and cleanup in sync with where the decorator actually writes.
Also applies to: 265-327
flashinfer/gemm/gemm_base.py (3)
1842-2002: Decorator stacking on mm_fp4 means logging happens after backend/shape checks

mm_fp4 is now decorated as:

```python
@backend_requirement(...)
@flashinfer_api_log
def mm_fp4(...): ...
```

This order implies that backend and problem-size checks (and any heuristic backend selection) run first, and only if they pass does the call enter the logging wrapper and function body. That's a reasonable choice and doesn't affect correctness, but it does mean invalid/problematic calls rejected by backend_requirement won't appear in API logs.

If you'd prefer to log all attempted API calls, including ones failing backend requirements, you might want to reverse the order:

```python
@flashinfer_api_log
@backend_requirement(...)
def mm_fp4(...): ...
```

Please double-check the implementation of backend_requirement to ensure it doesn't rely on attributes on the original function that could be hidden by another wrapper (though functools.wraps in the logger should preserve most metadata).
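For intuition, a self-contained sketch (with hypothetical check_args and log_call decorators standing in for backend_requirement and flashinfer_api_log) shows how stacking order decides whether rejected calls get logged:

```python
import functools

def check_args(func):
    """Reject calls with a negative argument."""
    @functools.wraps(func)
    def wrapper(x):
        if x < 0:
            raise ValueError("rejected by check_args")
        return func(x)
    return wrapper

def log_call(func):
    """Print the call before delegating."""
    @functools.wraps(func)
    def wrapper(x):
        print(f"calling {func.__name__}({x})")
        return func(x)
    return wrapper

@check_args   # outermost: runs first, so rejected calls are never logged
@log_call
def f(x):
    return x * 2

@log_call     # outermost: logs every attempt, even ones check_args rejects
@check_args
def g(x):
    return x * 2

f(3)                      # prints "calling f(3)"
try:
    f(-1)                 # raises without printing anything
except ValueError:
    pass
try:
    g(-1)                 # prints "calling g(-1)", then raises
except ValueError:
    pass
```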
2100-2189: bmm_fp8 logging is consistent; same decorator-order caveat as mm_fp4

bmm_fp8 is also defined with:

```python
@backend_requirement(...)
@flashinfer_api_log
def bmm_fp8(...): ...
```

So backends/requirements are checked before logging, and only successful calls will be logged. The attributes added by backend_requirement (e.g., suitable_auto_backends) still attach to the outer wrapper, so the internal use of bmm_fp8.suitable_auto_backends remains valid.

If you want logs for calls that are rejected due to unsupported compute capability or shapes, consider swapping decorator order as suggested for mm_fp4, and verify that backend_requirement remains compatible with being under the logging wrapper.
2501-2527: Potential double-logging when calling gemm_fp8_nt_blockscaled

gemm_fp8_nt_blockscaled is a thin wrapper around gemm_fp8_nt_groupwise, and both are decorated with @flashinfer_api_log. A call to gemm_fp8_nt_blockscaled will therefore emit two log entries: one for gemm_fp8_nt_blockscaled and one for the inner gemm_fp8_nt_groupwise call. This might be desirable (showing both the high-level alias and the underlying primitive), but could also add noise to logs.

If you want only a single log entry per API call here, you could either:

- Drop the decorator from gemm_fp8_nt_blockscaled, relying on the inner gemm_fp8_nt_groupwise logging (see the sketch below), or
- Keep only the alias decorated and undecorate gemm_fp8_nt_groupwise if it's considered an internal helper.

Please choose based on how you expect users to call these APIs directly versus via wrappers.
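As a minimal illustration of the effect (hypothetical log_call decorator standing in for flashinfer_api_log), a decorated thin wrapper calling a decorated inner function emits two entries per outer call:

```python
import functools

def log_call(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"API call: {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@log_call
def groupwise(a, b):
    return a + b  # stand-in for the real GEMM

@log_call
def blockscaled(a, b):
    # Thin alias that forwards to the already-decorated primitive.
    return groupwise(a, b)

blockscaled(1, 2)
# prints:
#   API call: blockscaled
#   API call: groupwise
```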
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- benchmarks/bench_logging_overhead.py (1 hunks)
- flashinfer/api_logging.py (1 hunks)
- flashinfer/cudnn/decode.py (2 hunks)
- flashinfer/cudnn/prefill.py (2 hunks)
- flashinfer/decode.py (10 hunks)
- flashinfer/fused_moe/core.py (7 hunks)
- flashinfer/gemm/gemm_base.py (12 hunks)
- flashinfer/mla.py (4 hunks)
- flashinfer/prefill.py (11 hunks)
- tests/utils/test_logging.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
flashinfer/mla.py (1)
  flashinfer/api_logging.py (1): flashinfer_api_log (369-462)
tests/utils/test_logging.py (1)
  flashinfer/api_logging.py (3): flashinfer_api_log (369-462), decorator (418-457), wrapper (420-455)
benchmarks/bench_logging_overhead.py (1)
  flashinfer/api_logging.py (1): flashinfer_api_log (369-462)
flashinfer/fused_moe/core.py (1)
  flashinfer/api_logging.py (1): flashinfer_api_log (369-462)
flashinfer/decode.py (1)
  flashinfer/api_logging.py (1): flashinfer_api_log (369-462)
flashinfer/prefill.py (1)
  flashinfer/api_logging.py (1): flashinfer_api_log (369-462)
flashinfer/cudnn/decode.py (1)
  flashinfer/api_logging.py (1): flashinfer_api_log (369-462)
flashinfer/cudnn/prefill.py (1)
  flashinfer/api_logging.py (1): flashinfer_api_log (369-462)
flashinfer/gemm/gemm_base.py (1)
  flashinfer/api_logging.py (1): flashinfer_api_log (369-462)
🪛 Ruff (0.14.5)
tests/utils/test_logging.py
227-227: Unused function argument: mode
(ARG001)
322-322: Unused function argument: x
(ARG001)
322-322: Unused function argument: y
(ARG001)
323-323: Avoid specifying long messages outside the exception class
(TRY003)
356-356: Unused function argument: int_val
(ARG001)
357-357: Unused function argument: float_val
(ARG001)
358-358: Unused function argument: bool_val
(ARG001)
359-359: Unused function argument: str_val
(ARG001)
360-360: Unused function argument: list_val
(ARG001)
361-361: Unused function argument: tuple_val
(ARG001)
362-362: Unused function argument: dict_val
(ARG001)
363-363: Unused function argument: none_val
(ARG001)
benchmarks/bench_logging_overhead.py
1-1: Shebang is present but file is not executable
(EXE001)
37-37: Probable insecure usage of temporary file or directory: "/tmp/flashinfer_benchmark_log.txt"
(S108)
346-346: Do not catch blind exception: Exception
(BLE001)
flashinfer/api_logging.py
161-161: Do not catch blind exception: Exception
(BLE001)
225-225: Use explicit conversion flag
Replace with conversion flag
(RUF010)
236-236: Use explicit conversion flag
Replace with conversion flag
(RUF010)
240-240: Use explicit conversion flag
Replace with conversion flag
(RUF010)
241-241: Do not catch blind exception: Exception
(BLE001)
284-284: Consider moving this statement to an else block
(TRY300)
285-285: Do not catch blind exception: Exception
(BLE001)
345-345: Unused function argument: func_name
(ARG001)
369-369: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
430-431: try-except-pass detected, consider logging the exception
(S110)
430-430: Do not catch blind exception: Exception
(BLE001)
441-441: Do not catch blind exception: Exception
(BLE001)
442-442: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
452-452: Do not catch blind exception: Exception
(BLE001)
453-453: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (36)
flashinfer/cudnn/prefill.py (1)
6-6: Decorator wiring for cuDNN prefill entrypoint looks correct

Importing flashinfer_api_log and wrapping cudnn_batch_prefill_with_kv_cache is consistent with the rest of the API surface; it doesn't touch the cudnn-graph/jit internals and preserves function signature and control flow. Logging remains opt-in via env level.

Also applies to: 387-543
flashinfer/mla.py (1)
22-22: MLA wrapper instrumentation is aligned with the logging design

Decorating BatchMLAPagedAttentionWrapper.__init__, plan, and run with flashinfer_api_log cleanly instruments the main public lifecycle without altering behavior. The class name pattern ensures logs are emitted as BatchMLAPagedAttentionWrapper.<method>, which is useful for traceability, and zero-overhead semantics hold when logging is disabled at import time.

Also applies to: 133-203, 204-305, 337-451
flashinfer/cudnn/decode.py (1)
6-6: cuDNN decode entrypoint is correctly wrapped for logging

The flashinfer_api_log import and decorator on cudnn_batch_decode_with_kv_cache match the prefill pattern, leave the cuDNN graph construction untouched, and preserve the API contract. This is a safe, minimal instrumentation point.

Also applies to: 256-350
tests/utils/test_logging.py (1)
43-82: Comprehensive logging decorator tests look solid

The autouse fixture plus setup_logging pattern gives each test a clean environment and forces flashinfer.api_logging to re-read env vars, which matches how the decorator is intended to be configured. The suite covers level semantics, enums, defaults vs explicit kwargs, crash safety, CUDA tensors (incl. graph capture), class methods, and multiple invocation patterns; all expectations are consistent with the current implementation in flashinfer/api_logging.py.

Also applies to: 83-585
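A minimal sketch of that fixture pattern (hypothetical names; the actual fixture in tests/utils/test_logging.py may differ in detail):

```python
import importlib

import pytest

@pytest.fixture(autouse=True)
def setup_logging(monkeypatch, tmp_path):
    # Point the logger at a per-test file and pick a verbosity level.
    log_file = tmp_path / "api_log.txt"
    monkeypatch.setenv("FLASHINFER_APILOG_LEVEL", "1")
    monkeypatch.setenv("FLASHINFER_APILOG_DEST", str(log_file))

    # The module reads env vars at import time, so reload to apply them.
    import flashinfer.api_logging as api_logging
    importlib.reload(api_logging)

    yield log_file

    # monkeypatch restores the env automatically; reload once more so
    # subsequent tests start from the restored environment.
    importlib.reload(api_logging)
```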
flashinfer/api_logging.py (2)
70-163: Value formatting and CUDA-graph-aware tensor stats look well thought out

_format_value's handling of enums, tensors (with shape/stride/device metadata), FP4Tensor, and nested containers is consistent and log-friendly. The level-3 path correctly avoids statistics during CUDA graph capture via torch.cuda.is_current_stream_capturing and gracefully degrades to a "[statistics error: …]" line on failures. This matches the intended "rich when needed, safe when it fails" design.
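A condensed sketch of that guard (illustrative only; the real _format_value handles many more types and levels):

```python
import torch

def format_tensor(t: torch.Tensor, level: int) -> str:
    """Format tensor metadata, adding statistics only when it is safe."""
    desc = f"Tensor(shape={tuple(t.shape)}, dtype={t.dtype}, device={t.device})"
    if level >= 3:
        # min/max/mean launch kernels, which is not allowed while a CUDA
        # graph is being captured on the current stream.
        if t.is_cuda and torch.cuda.is_current_stream_capturing():
            desc += " [stats skipped: CUDA graph capture in progress]"
        else:
            try:
                f = t.detach().float()
                desc += (f" min={f.min().item():.4g}"
                         f" max={f.max().item():.4g}"
                         f" mean={f.mean().item():.4g}")
            except Exception as exc:  # degrade gracefully, never crash the caller
                desc += f" [statistics error: {exc}]"
    return desc
```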
369-462: Decorator implementation aligns with zero‑overhead and crash‑safety requirements
flashinfer_api_log correctly returns the original function when _API_LOG_LEVEL == 0 at import/decorator time, avoiding any wrapper call-overhead in the disabled case. For enabled levels, the wrapper's pre-call logging (with class-qualified names for *Wrapper classes) and post-call logging, both protected by broad try/except blocks, achieve the "crash-safe, best-effort logging" goal without risking user code execution. The decorator also supports both @flashinfer_api_log and @flashinfer_api_log() usage, which matches the tests.

flashinfer/prefill.py (11)
25-25: API logger import is correctly scoped and consistent with other modules

The relative import of flashinfer_api_log mirrors how other utilities are imported and avoids circular dependencies; no issues here.
877-910: Decorating single_prefill_with_kv_cache_with_jit_module is safe and aligns with the logging design

Applying @flashinfer_api_log here cleanly instruments this top-level helper: it only wraps the JIT module's run call and does not interfere with caching or custom-op registration elsewhere.

If you expect this function to be called inside torch.compile or CUDA graphs, please sanity-check that logging levels ≥2 are disabled in those contexts to avoid unnecessary extra kernels for statistics.
912-985: Logging single_prefill_with_kv_cache covers the primary single-request prefill API (including the alias)

Instrumenting single_prefill_with_kv_cache is appropriate: it is the main single-prefill entry point, and the alias single_prefill_with_kv_cache_return_lse (via functools.partial) will also be logged because it refers to the decorated function.

Please confirm that your log level defaults (e.g., env FLASHINFER_APILOG_LEVEL) are set so that heavy workloads don't accidentally incur level-3 tensor statistics on this hot path in production.
1331-1498: Wrapper __init__ logging for BatchPrefillWithPagedKVCacheWrapper looks reasonable

Adding @flashinfer_api_log on the constructor is useful for debugging misconfigurations (workspace size, CUDA-graph buffers, backend selection) and doesn't affect runtime behavior, since allocations and checks were already here.

It may be worth running a small CUDA-graph flow that constructs this wrapper in graph-enabled mode to ensure that level-2/3 logging remains acceptable during initialization (even if execution itself happens inside graphs).
1527-1925: Logging BatchPrefillWithPagedKVCacheWrapper.plan is helpful; watch for overhead at high log levels

Decorating plan aligns with the goal of capturing problem setup (indptrs, masks, backends) and should not affect correctness, since the body is unchanged and the decorator is pure Python around it.

Because plan copies indptrs to CPU and computes derived arrays, logging at level 3 (with statistics) on very large batches can add noticeable overhead. Ensure that in tight autotuning or repeated planning loops you keep FLASHINFER_APILOG_LEVEL at 0-1 unless you explicitly need detailed introspection.
1984-2220: Instrumenting BatchPrefillWithPagedKVCacheWrapper.run is consistent and appears safe

Applying @flashinfer_api_log to run gives visibility into the main prefill execution path (including PDL, backend selection, and output tensor shapes) while leaving the core kernel launch logic unchanged. The alias run_return_lse also goes through the decorated method, so both result variants are logged.

For very latency-sensitive use (e.g., high-QPS decode), you may want to benchmark with FLASHINFER_APILOG_LEVEL=1/2/3 to validate that the added logging, especially level-3 statistics, meets your overhead budget.
2359-2474: Constructor logging for BatchPrefillWithRaggedKVCacheWrapper matches the paged wrapper behavior

Decorating __init__ here mirrors the paged wrapper: creation of workspace buffers, CUDA-graph buffers, and backend selection are now log-visible without touching the execution logic.

As with the paged wrapper, it's worth verifying that constructing this wrapper inside any higher-level tooling (e.g., model factories) remains acceptable when logging is enabled, since constructor logs can be noisy if wrappers are created per-request.
2503-2794: Logging BatchPrefillWithRaggedKVCacheWrapper.plan is appropriate and symmetric with paged prefill

The @flashinfer_api_log decorator on plan for ragged KV behaves analogously to the paged variant, exposing shapes, dtypes, and backend choices. The computational path (host transfers, planning, cached module setup) remains unchanged.

Same suggestion as for the paged plan: if you run many small plan calls (e.g., in tuning or dynamic workloads), keep logging levels conservative to avoid overhead from repeated detailed summaries.
2848-2995: BatchPrefillWithRaggedKVCacheWrapper.run logging cleanly wraps the main ragged prefill execution

Instrumenting run makes sense: it's the hot execution entry point, and logs will now include mask mode, backend, and output shapes, which are valuable for debugging MoE / ragged scenarios. No functional changes are introduced.

Consider adding this method to any logging tests you already have (similar to the decode wrappers) to confirm that level-2/3 input summaries behave as expected with ragged indptr layouts.
3205-3337: trtllm_ragged_attention_deepseek is a good candidate for API logging

The decorator around this Triton/TRT-LLM-style ragged attention wrapper will expose sequence length, scaling, and PDL settings without affecting the underlying kernel invocation. Given this is a specialized path, logging is particularly useful here.

Because this function is often used in brittle integration scenarios, you may want to add a small smoke test that calls it with logging level 2-3 to ensure tensor summaries do not assume contiguous layouts beyond what the implementation already guarantees.
3340-3553: Logging trtllm_batch_context_with_kv_cache is consistent with other TRTLLM interfaces

Adding @flashinfer_api_log here gives observability into FP4/FP8 output configuration, kv layout, and PDL usage. Since the decorator is a thin wrapper and all runtime behavior (including FP4 tensor handling) is unchanged, this looks safe.

One thing to double-check is that when out is an FP4Tensor, the logging layer's input summarization doesn't accidentally try to introspect internal fields in a way that could be overly verbose; if so, you might consider teaching the logger a lightweight summary for FP4Tensor.

flashinfer/gemm/gemm_base.py (9)
25-25: Importing flashinfer_api_log here is consistent with other modules

The relative import from ..api_logging is correct for this package layout and keeps GEMM logging centralized alongside other instrumented APIs.
914-1084: SegmentGEMMWrapper.run instrumentation looks correct and non-intrusive

Decorating SegmentGEMMWrapper.run adds valuable logging around a complex segmented GEMM path (seg_lens/seg_indptr, backend choice, shapes) without altering the kernel launch logic. The forward = run alias will also be logged since it resolves to the decorated method.

Given this is a potentially hot path in MoE workloads, you may want to benchmark with logging enabled at levels 2-3 to measure overhead from argument summarization on large x/weights.
1573-1682: mm_fp8 logging is well-placed on this primary FP8 GEMM entry point

The decorator wraps only the public Python API; core work is still delegated to trtllm_low_latency_gemm. No interactions with autotune or backend selection are altered.

Since mm_fp8 can be called in tight loops, please confirm via the new bench_logging_overhead.py benchmark that your target usage still meets latency goals at different logging levels.
2192-2356: Logging gemm_fp8_nt_groupwise is appropriate; be mindful of level-3 cost on large tiles

This function is a central entry point for FP8 groupwise GEMM on Blackwell; adding @flashinfer_api_log is aligned with the overall API observability goal. The underlying CUTLASS/TRTLLM calls and device checks are unchanged.

Because this path typically operates on large (M,N,K) blocks and may be used in performance-critical contexts, it's worth evaluating level-3 logging separately: tensor statistics over large matrices can add non-trivial overhead even if the core compute kernels are unchanged.
2531-2690: group_gemm_fp8_nt_groupwise logging is consistent with other group GEMM entry points

Applying @flashinfer_api_log here exposes group sizes, scale layouts, and hardware constraints for grouped FP8 GEMM. The function's internal device checks and workspace handling are unchanged.

Since this path asserts several alignment constraints (e.g., n % 8 == 0, k % 16 == 0), logging invalid calls at level ≥1 should be useful in diagnosing misconfigurations; ensure your logging level in debug runs is at least 1 to capture these.
2693-2825: Logging group_gemm_mxfp8_mxfp4_nt_groupwise is appropriate for this specialized MXFP4 path

The decorator wraps a fairly specialized Blackwell-only path that mixes FP8 activations with MXFP4 weights. Logging call shapes, tile sizes, and swap_ab will be valuable in debugging, and the underlying SM100 module calls are preserved.

Given the stricter assertions (tile sizes, alignment, dtype constraints), consider adding a small unit test that calls this function with logging enabled to ensure the logger handles packed MXFP4 tensors (uint8) and scale tensors without excessive verbosity.
2861-2989: group_deepgemm_fp8_nt_groupwise logging makes DeepGEMM MoE flows more observable

Instrumenting this DeepGEMM-based grouped FP8 GEMM call will surface group assignments (m_indices), scale granularities, and output sizes without altering the call to m_grouped_fp8_gemm_nt_contiguous. The architecture check remains in place.

Because this function is likely used in MoE expert-routing paths, you may want to validate that logging large m_indices tensors at level ≥2 is summarized compactly enough (e.g., only shape/type) to keep logs readable.
2992-3125: batch_deepgemm_fp8_nt_groupwise logging mirrors the grouped DeepGEMM path

The @flashinfer_api_log decorator here complements group_deepgemm_fp8_nt_groupwise, covering the batched FP8 DeepGEMM case. The masking logic (masked_m, expected_m) and backend call remain unchanged.

As with the grouped variant, confirm that logging of masked_m and expected_m behaves as expected in your tests; these values are key for understanding performance characteristics and correctness in partially-filled batches.
568-641: Remove SM110 from the docstring: the underlying C++ implementation only supports SM100 and SM103

The runtime check is correct. Analysis of the C++ source reveals:

- csrc/tgv_gemm.cu includes cutlass/gemm/collective/builders/sm100_common.inl and uses SM100-specific tcgen05.mma instructions
- gen_tgv_gemm_sm10x_module is documented as "Generate TGV GEMM module for SM100 architecture"
- No SM110 support path exists in the TGV GEMM implementation
- The test file correctly restricts to ["100", "103"]

The docstring at lines 573-580 incorrectly claims SM110 support. It should be corrected to "Requires SM100, SM103 architecture". Adding SM110 to the runtime check as suggested would enable an unsupported architecture.
Likely an incorrect or invalid review comment.
flashinfer/decode.py (10)
24-24: LGTM: API logging import added.

The import is correctly placed and follows the project's import organization.

316-346: API logging decorator correctly applied.

The decorator is properly applied to this public API function. As a POC, this demonstrates the instrumentation approach. Before production use, verify that the logging overhead at levels 1-3 is acceptable for your performance requirements.

393-410: Correct decorator placement on overloaded function.

The decorator is correctly applied to the implementation function rather than the overload signatures. This is the proper pattern for decorated overloaded functions.

652-664: API logging decorator correctly applied to constructor.

The decorator is properly applied to the class constructor. The decorator implementation correctly detects and includes the class name in logs.

816-840: API logging decorator correctly applied to plan method.

The decorator is properly applied. Note that at log level 2+, this method will generate verbose logs due to the many parameters. This is expected behavior for API logging and helps with debugging.

1170-1186: Correct decorator placement on overloaded method.

The decorator is correctly applied to the implementation method rather than the overload signatures. The decorator will properly capture the *args parameter for logging.

2068-2089: API logging decorator correctly applied.

The decorator is properly applied to this public API function. The extensive parameters and complex logic make this a good candidate for API logging to aid debugging.

2347-2364: API logging decorator correctly applied.

The decorator is properly applied to this public API function.

2539-2556: API logging decorator correctly applied.

The decorator is properly applied to this MLA-specific decode function.

2707-2724: API logging decorator correctly applied.

The decorator is properly applied to this function. Overall, the API logging instrumentation in this file is consistent and well-implemented.
POC Assessment: The decorator applications throughout this file demonstrate a clean, non-invasive approach to API logging. The implementation maintains zero overhead when disabled and preserves function signatures and behavior. Before moving beyond POC, consider:
- Verify benchmark results show acceptable overhead at log levels 1-3
- Consider adding configuration to selectively enable/disable logging for specific functions
- Document the logging format and levels for end users
Based on the PR description, benchmarks exist at benchmarks/bench_logging_overhead.py. Please verify the benchmark results demonstrate acceptable performance characteristics at different log levels, particularly for hot-path functions like BatchDecodeWithPagedKVCacheWrapper.run.
Actionable comments posted: 1
♻️ Duplicate comments (3)
flashinfer/api_logging.py (3)
338-362: Narrow _get_default_params exception handling to the documented cases

inspect.signature is documented to raise TypeError/ValueError on failure; catching every Exception here is broader than necessary and makes debugging real failures harder. The earlier review comment already pointed this out.

You can keep the fail-safe behavior but tighten the handler:

```diff
-    except Exception:
-        # If we can't inspect the signature, return empty dict
-        return {}
+    except (TypeError, ValueError):
+        # If we can't inspect the signature, return empty dict
+        return {}
```

This still preserves the "logging must not crash user code" guarantee.
493-528: Simplify class-name check and improve logging of internal logging errors

Two small points around the decorator:

1. The condition if "Wrapper" in class_name or class_name in ["BatchMLAPagedAttentionWrapper"]: is redundant because "BatchMLAPagedAttentionWrapper" already matches the "Wrapper" substring. This was noted in a previous review.
2. In the pre/post logging try blocks you catch Exception and log via _logger.error(...), which drops the traceback. Given this is a debugging/logger subsystem, having the traceback via logging.exception is usually more helpful, and you're already isolating it from user code.

Possible refactor:

```diff
-                class_name = args[0].__class__.__name__
-                if "Wrapper" in class_name or class_name in [
-                    "BatchMLAPagedAttentionWrapper"
-                ]:
-                    func_name = f"{class_name}.{func_name}"
+                class_name = args[0].__class__.__name__
+                if "Wrapper" in class_name:
+                    func_name = f"{class_name}.{func_name}"
             except Exception:
                 pass
@@
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (pre-execution)]: {e}")
+            except Exception:
+                _logger.exception(
+                    "[LOGGING ERROR in %s (pre-execution)]", func_name
+                )
@@
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (outputs)]: {e}")
+            except Exception:
+                _logger.exception("[LOGGING ERROR in %s (outputs)]", func_name)
```

This keeps user-visible behavior the same while making logger failures easier to diagnose.
29-31: Make FLASHINFER_APILOG_LEVEL parsing robust to invalid env values

Right now int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0")) will raise ValueError (and break import) if the env var is set to a non-integer string. It's safer to treat invalid values as "0" (disabled) so logging config can never prevent FlashInfer from importing.

You can keep the behavior while hardening parsing like this:

```diff
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
+try:
+    _API_LOG_LEVEL = int(os.environ.get("FLASHINFER_APILOG_LEVEL", "0"))
+except (TypeError, ValueError):
+    # Invalid user input; fall back to level 0 (disabled)
+    _API_LOG_LEVEL = 0
```

Please rerun the existing logging tests (and any import-time tests) after this change.
🧹 Nitpick comments (2)
flashinfer/api_logging.py (2)
145-304: Optional: guard against recursive structures in _format_value

_format_value walks lists/tuples/dicts recursively and will recurse indefinitely on cyclic structures (e.g., a list containing itself), which can appear in complex call graphs. In a debugging logger this is usually rare, but when it happens it'll raise RecursionError from the logging path.

If you want to harden this, consider adding a seen: set[int] parameter (or a max-depth cut-off) and short-circuiting when id(value) is already in seen; a sketch follows. No need to change now if your call sites never pass cyclic objects, but it's something to keep in mind.
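A minimal sketch of such a guard (simplified stand-in, not the real _format_value):

```python
from typing import Any, Optional, Set

def format_value(value: Any, seen: Optional[Set[int]] = None) -> str:
    """Recursively format containers, short-circuiting on cycles."""
    if seen is None:
        seen = set()
    if isinstance(value, (list, tuple, dict)):
        if id(value) in seen:
            return "[cyclic reference]"
        seen.add(id(value))
        if isinstance(value, dict):
            body = ", ".join(f"{k!r}: {format_value(v, seen)}" for k, v in value.items())
            return "{" + body + "}"
        body = ", ".join(format_value(v, seen) for v in value)
        return "[" + body + "]" if isinstance(value, list) else "(" + body + ")"
    return repr(value)

# A list that contains itself no longer recurses forever:
cyclic = [1, 2]
cyclic.append(cyclic)
print(format_value(cyclic))  # [1, 2, [cyclic reference]]
```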
420-441: Use func_name in _log_function_outputs (or drop it) to fix the unused arg and improve logs

func_name is currently unused in _log_function_outputs, which triggers the linter and also makes the output block a bit less self-describing when skimming logs.

A small tweak makes the output clearer and fixes ARG001:

```diff
-    # Log outputs
-    lines.append("Output value:")
+    # Log outputs
+    lines.append(f"Output value for {func_name}:")
```

This keeps the existing separators and should be low-risk, but please re-run the logging tests because they may assert on exact strings.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- flashinfer/api_logging.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flashinfer/api_logging.py (1)
  include/flashinfer/logging.h (1): logging (31-41)
🪛 Ruff (0.14.5)
flashinfer/api_logging.py
86-86: Do not catch blind exception: Exception
(BLE001)
106-106: Do not catch blind exception: Exception
(BLE001)
124-124: Do not catch blind exception: Exception
(BLE001)
132-132: Do not catch blind exception: Exception
(BLE001)
236-236: Do not catch blind exception: Exception
(BLE001)
300-300: Use explicit conversion flag
Replace with conversion flag
(RUF010)
311-311: Use explicit conversion flag
Replace with conversion flag
(RUF010)
315-315: Use explicit conversion flag
Replace with conversion flag
(RUF010)
316-316: Do not catch blind exception: Exception
(BLE001)
359-359: Consider moving this statement to an else block
(TRY300)
360-360: Do not catch blind exception: Exception
(BLE001)
420-420: Unused function argument: func_name
(ARG001)
444-444: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
505-506: try-except-pass detected, consider logging the exception
(S110)
505-505: Do not catch blind exception: Exception
(BLE001)
516-516: Do not catch blind exception: Exception
(BLE001)
517-517: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
527-527: Do not catch blind exception: Exception
(BLE001)
528-528: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
```python
def _setup_logger():
    """Set up the logger based on environment variables."""
    if _API_LOG_LEVEL == 0:
        # Completely disable logging for zero overhead
        _logger.addHandler(logging.NullHandler())
        _logger.setLevel(logging.CRITICAL + 1)  # Higher than any level
        return

    # All enabled levels use logging.DEBUG; verbosity is controlled by
    # FLASHINFER_APILOG_LEVEL instead
    _logger.setLevel(logging.DEBUG)

    # Remove any existing handlers
    _logger.handlers.clear()

    # Create handler based on destination
    if _API_LOG_DEST == "stdout":
        handler = logging.StreamHandler(sys.stdout)
    elif _API_LOG_DEST == "stderr":
        handler = logging.StreamHandler(sys.stderr)
    else:
        handler = logging.FileHandler(_API_LOG_DEST, mode="a")

    # Use a simple formatter (we'll format the detailed content ourselves)
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)

    _logger.addHandler(handler)
    _logger.propagate = False  # Don't propagate to root logger
```
🧩 Analysis chain
Avoid import-time failures when FLASHINFER_APILOG_DEST is invalid
logging.FileHandler(_API_LOG_DEST, mode="a") will raise (e.g., bad path, permission) at import time and prevent FlashInfer from loading when logging is enabled. For a “debugging only” facility this is a bit too fragile; better to degrade to stderr (or a NullHandler) instead of failing imports.
A minimal way to harden this while keeping semantics:
```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_APILOG_DEST=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```

Afterward, it'd be good to verify behavior by setting FLASHINFER_APILOG_DEST to an unwritable path and importing flashinfer.api_logging in a small repro.
🏁 Script executed:

```bash
#!/bin/bash
# Find and examine the full flashinfer/api_logging.py file
fd -t f "api_logging.py" --exec cat {} \;
```

Length of output: 19059

🏁 Script executed:

```bash
#!/bin/bash
# Check when _setup_logger() is called (module level vs function level)
rg -n "_setup_logger" -A2 -B2
```

Length of output: 462

🏁 Script executed:

```bash
#!/bin/bash
# Search for any existing error handling around logging setup
rg -n "except|try|OSError" flashinfer/api_logging.py
```

Length of output: 594
Add error handling to prevent import-time failures from invalid logging paths
The current code will break module imports if FLASHINFER_APILOG_DEST points to an invalid or unwritable path. Since _setup_logger() is called at module load time (line 67), any OSError from logging.FileHandler() will prevent FlashInfer from loading entirely. For a debugging facility, this is too fragile.
Add error handling to gracefully degrade to stderr:

```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_APILOG_DEST=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```

🤖 Prompt for AI Agents
In flashinfer/api_logging.py around lines 36 to 64, creating a FileHandler from
an invalid or unwritable FLASHINFER_APILOG_DEST can raise OSError at import time
and crash the module; wrap the handler creation in a try/except that catches
OSError (and its subclasses), and on exception fall back to a
logging.StreamHandler(sys.stderr) so the module still loads; also emit a
non-raising warning message via the selected handler or a temporary NullHandler
so the failure is recorded without propagating the exception, and ensure the
rest of the logger setup (formatter, addHandler, propagate) runs regardless of
whether the file handler succeeded.
yzh119 left a comment
This is a great feature to have, and thanks @bkryu for bringing this up! One general question: do you think we should log kernel behavior on the Python side or the C++ side?
Thanks @yzh119. This is a question I considered, more from a "should we log what happens inside each function call because the current design only logs inputs and outputs" point of view. My initial sentiment is that logging kernel behavior will be tricky because we have multiple kernel providers -- some of which are closed source and others are OSS, and some that already come with logging:
@xslingcn @Ubospica @YiyanZhai @zanderjiang this PR should greatly reduce the overhead of extracting workloads for the flashinfer-bench project.
@yzh119 Thanks for sharing! The low-overhead design looks very useful. We definitely have low-overhead dumping and CUDA graph support planned on our roadmap. I will leverage this PR as a key reference.
Actionable comments posted: 1
♻️ Duplicate comments (2)
flashinfer/api_logging.py (2)
43-47: Make FLASHINFER_LOGLEVEL_DBG parsing robust to invalid values

As written, int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0")) will raise ValueError at import time if a user sets the env var to a non-integer (e.g. "foo"), preventing flashinfer.api_logging (and any API importing it) from loading. For a debug-only facility this is too fragile; it should gracefully fall back to level 0.

You can harden this by catching parse errors and defaulting to 0 (optionally emitting a warning to stderr):

```diff
-# Read environment variables once at module load time
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))
-_API_LOG_DEST = _substitute_process_id(
-    os.environ.get("FLASHINFER_LOGDEST_DBG", "stdout")
-)
+# Read environment variables once at module load time
+_raw_log_level = os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0")
+try:
+    _API_LOG_LEVEL = int(_raw_log_level)
+except (TypeError, ValueError):
+    # Fall back to level 0 on invalid input instead of failing import
+    _API_LOG_LEVEL = 0
+    print(
+        f"[flashinfer.api_logging] Invalid FLASHINFER_LOGLEVEL_DBG={_raw_log_level!r}, "
+        "falling back to 0",
+        file=sys.stderr,
+    )
+_API_LOG_DEST = _substitute_process_id(
+    os.environ.get("FLASHINFER_LOGDEST_DBG", "stdout")
+)
```

This keeps existing semantics for valid values while avoiding import-time crashes on bad configuration.
85-112: Prevent import-time failures when FLASHINFER_LOGDEST_DBG points to an invalid path

_setup_logger() calls logging.FileHandler(_API_LOG_DEST, mode="a") unguarded whenever _API_LOG_DEST is neither "stdout" nor "stderr". If the env var points to an invalid or unwritable location, this raises OSError at import time and breaks all FlashInfer imports when logging is enabled.

For a debugging feature, it's safer to degrade to stderr instead of failing import. For example:

```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_LOGDEST_DBG=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```

This preserves current behavior for valid paths but avoids hard failures when the destination is misconfigured.
🧹 Nitpick comments (3)
tests/utils/test_logging.py (1)
227-228: Address RuffARG001unused-argument warnings in test helpers (optional)Several test helper functions intentionally ignore some parameters (
modeintest_default_parameters,x/yincrashing_function, and the various*_valparameters intest_different_data_types), which Ruff flags asARG001. If your CI runs Ruff on tests, you can silence these by either:
- Prefixing unused parameters with
_(e.g.,_mode,_int_val, …), or- Adding
# noqa: ARG001on the relevant function definitions.This keeps the tests behaviorally identical while satisfying the linter.
Also applies to: 322-323, 355-365
flashinfer/api_logging.py (2)
498-519: _log_function_outputs's func_name is unused; consider wiring it into the log line

_log_function_outputs accepts func_name but never uses it, which is slightly confusing and flagged by linters as ARG001. You can either drop the parameter, or (more usefully) include it in the header line:

```diff
-    lines = []
-    # Log outputs
-    lines.append("Output value:")
+    lines = []
+    # Log outputs
+    lines.append(f"Output value: {func_name}")
```

This keeps the existing "Output value:" prefix (so current tests continue to pass) while making the per-call output section self-describing.
596-607: Improve diagnostics for logging failures with logging.exception

The decorator intentionally guards pre- and post-execution logging with broad except Exception blocks to avoid interfering with the wrapped function. That makes sense, but using logging.error(...) drops the traceback, which can make debugging logging issues tricky.

You can keep the broad guard while emitting richer diagnostics via logging.exception:

```diff
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (pre-execution)]: {e}")
+            except Exception:
+                _logger.exception(
+                    "[LOGGING ERROR in %s (pre-execution)]", func_name
+                )
@@
-            except Exception as e:
-                _logger.error(f"[LOGGING ERROR in {func_name} (outputs)]: {e}")
+            except Exception:
+                _logger.exception("[LOGGING ERROR in %s (outputs)]", func_name)
```

This preserves crash-safety while giving you a stack trace when something goes wrong inside the logging path.
Also applies to: 613-618
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- benchmarks/bench_logging_overhead.py (1 hunks)
- flashinfer/api_logging.py (1 hunks)
- tests/utils/test_logging.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
benchmarks/bench_logging_overhead.py (1)
  flashinfer/api_logging.py (1): flashinfer_api_log (522-627)
tests/utils/test_logging.py (1)
  flashinfer/api_logging.py (3): flashinfer_api_log (522-627), decorator (581-622), wrapper (583-620)
flashinfer/api_logging.py (1)
  include/flashinfer/logging.h (1): logging (31-41)
🪛 Ruff (0.14.5)
benchmarks/bench_logging_overhead.py
1-1: Shebang is present but file is not executable
(EXE001)
37-37: Probable insecure usage of temporary file or directory: "/tmp/flashinfer_benchmark_log.txt"
(S108)
346-346: Do not catch blind exception: Exception
(BLE001)
tests/utils/test_logging.py
227-227: Unused function argument: mode
(ARG001)
322-322: Unused function argument: x
(ARG001)
322-322: Unused function argument: y
(ARG001)
323-323: Avoid specifying long messages outside the exception class
(TRY003)
356-356: Unused function argument: int_val
(ARG001)
357-357: Unused function argument: float_val
(ARG001)
358-358: Unused function argument: bool_val
(ARG001)
359-359: Unused function argument: str_val
(ARG001)
360-360: Unused function argument: list_val
(ARG001)
361-361: Unused function argument: tuple_val
(ARG001)
362-362: Unused function argument: dict_val
(ARG001)
363-363: Unused function argument: none_val
(ARG001)
flashinfer/api_logging.py
142-142: Do not catch blind exception: Exception
(BLE001)
162-162: Do not catch blind exception: Exception
(BLE001)
180-180: Do not catch blind exception: Exception
(BLE001)
210-210: Do not catch blind exception: Exception
(BLE001)
314-314: Do not catch blind exception: Exception
(BLE001)
378-378: Use explicit conversion flag
Replace with conversion flag
(RUF010)
389-389: Use explicit conversion flag
Replace with conversion flag
(RUF010)
393-393: Use explicit conversion flag
Replace with conversion flag
(RUF010)
394-394: Do not catch blind exception: Exception
(BLE001)
437-437: Consider moving this statement to an else block
(TRY300)
438-438: Do not catch blind exception: Exception
(BLE001)
498-498: Unused function argument: func_name
(ARG001)
522-522: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
593-594: try-except-pass detected, consider logging the exception
(S110)
593-593: Do not catch blind exception: Exception
(BLE001)
606-606: Do not catch blind exception: Exception
(BLE001)
607-607: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
617-617: Do not catch blind exception: Exception
(BLE001)
618-618: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (2)
tests/utils/test_logging.py (1)
27-585: Comprehensive coverage of decorator behavior

This test suite does a good job exercising the decorator across levels 0/1/3/5, enums, defaults vs explicit args, kwargs, crash safety, tensor metadata, nested structures, and CUDA graph scenarios; I don't see functional issues here.
benchmarks/bench_logging_overhead.py (1)
56-219: Benchmark structure and CUDA timing look solid

The undecorated vs decorated matmul comparison, CUDA synchronization, warmup phase, and summary statistics all look correct for measuring logging overhead.
Also applies to: 221-339
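For reference, a minimal sketch of that measurement pattern (hypothetical harness, not the actual benchmark script), timing a matmul with CUDA events:

```python
import torch

def time_fn(fn, a, b, iters=100, warmup=10):
    """Return average milliseconds per call, timed with CUDA events."""
    for _ in range(warmup):
        fn(a, b)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(a, b)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")

base_ms = time_fn(torch.matmul, a, b)
print(f"undecorated matmul: {base_ms:.4f} ms/iter")
# Comparing against a decorated variant (with the log env vars set) gives
# the per-call logging overhead: time_fn(decorated_matmul, a, b) - base_ms.
```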
…ing level above 3 now automatically include cudnn and cublas API logging
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flashinfer/gemm/gemm_base.py (1)
568-640: Fix SM110 support check and consider basic shape validation in tgv_gemm_sm100

Two minor issues here:

- The docstring says "Requires SM100, SM103, or SM110 architecture" (Lines 593-595), but the runtime check only allows "100" and "103":

```python
if not _match_sm_version(a.device, ["100", "103"]):
    raise ValueError("TGV GEMM requires SM100, SM103 architecture")
```

If SM110 is actually supported by the TGV kernel (as the name gen_tgv_gemm_sm10x_module and the note about SM100f suggest), this should include "110"; otherwise the note in the docstring should be adjusted to avoid misleading users.

- There's no explicit validation of tensor ranks/shapes or bias shape. Given the comment "Computes: A @ B + bias" and the expectation (M, K) @ (K, N) + (N,), adding a fast check for a.ndim == 2, b.ndim == 2, bias.ndim == 1, a.shape[1] == b.shape[0], and bias.shape[0] == b.shape[1] would fail early with clear error messages instead of deferring to the underlying kernel.

Both changes would make this wrapper more robust without affecting the autotuning flow.
♻️ Duplicate comments (2)
flashinfer/api_logging.py (1)
28-45: Harden env parsing and handler setup to avoid import-time failures

Right now, two failure modes in this module can prevent flashinfer from importing at all:

- Line 42: int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0")) raises ValueError/TypeError if the env var is non-numeric (e.g. "foo"), aborting import.
- Lines 65-71: logging.FileHandler(_API_LOG_DEST, mode="a") will raise OSError on invalid/unwritable paths, also at import time.

For a debug-only feature, this is too fragile; misconfigured env vars should disable logging or fall back safely, not make the library unusable.

A minimal hardening would be:

```diff
-_API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))
+try:
+    _API_LOG_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))
+except (TypeError, ValueError):
+    # Fall back to disabled logging on invalid level
+    _API_LOG_LEVEL = 0
```

and in _setup_logger:

```diff
-    # Create handler based on destination
-    if _API_LOG_DEST == "stdout":
-        handler = logging.StreamHandler(sys.stdout)
-    elif _API_LOG_DEST == "stderr":
-        handler = logging.StreamHandler(sys.stderr)
-    else:
-        handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    # Create handler based on destination; never let a bad path break import
+    try:
+        if _API_LOG_DEST == "stdout":
+            handler = logging.StreamHandler(sys.stdout)
+        elif _API_LOG_DEST == "stderr":
+            handler = logging.StreamHandler(sys.stderr)
+        else:
+            handler = logging.FileHandler(_API_LOG_DEST, mode="a")
+    except OSError as exc:
+        # Fall back to stderr so the rest of the library remains usable
+        handler = logging.StreamHandler(sys.stderr)
+        _logger.warning(
+            "Failed to open FLASHINFER_LOGDEST_DBG=%r: %s; falling back to stderr",
+            _API_LOG_DEST,
+            exc,
+        )
```

This keeps existing semantics in the happy path but makes misconfiguration non-fatal.
Also applies to: 51-82
benchmarks/bench_logging_overhead.py (1)
30-36: Align the benchmark's `LOG_DEST` with the decorator's actual destination

Here `LOG_DEST` defaults to `/tmp/flashinfer_benchmark_log.txt` when `FLASHINFER_LOGDEST_DBG` is unset, but `flashinfer.api_logging` still defaults its destination to `"stdout"`. That means:

- The decorator logs to stdout.
- The benchmark later looks for a file at `LOG_DEST` (Lines 248–251, 292–301) and usually finds nothing, so the "LOG FILE INFO" section never prints.

To keep the benchmark's view of the log destination consistent with the decorator, set the env var before importing `flashinfer_log`:

```diff
 LOGGING_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))
 LOG_DEST = os.environ.get(
     "FLASHINFER_LOGDEST_DBG", "/tmp/flashinfer_benchmark_log.txt"
 )
+# Ensure the decorator sees the same destination default
+os.environ.setdefault("FLASHINFER_LOGDEST_DBG", LOG_DEST)

 # Import the decorator
 from flashinfer.api_logging import flashinfer_log
```

This preserves the current default path but guarantees both components use the same destination.
Also applies to: 248-251, 292-301
🧹 Nitpick comments (2)
flashinfer/cudnn/prefill.py (1)
6-6: Decorator integration and new backend flags look safe; consider documenting `backend`

- Importing and applying `@flashinfer_log` to `cudnn_batch_prefill_with_kv_cache` is semantically safe because the decorator is a no-op at log level 0 and preserves the function signature.
- Adding `is_cuda_graph_compatible` and `backend` is backward-compatible; they are defaulted and only change behavior when explicitly used (e.g., `backend == "cubin"` forces the fmha-gen path).
- Minor: the docstring mentions `is_cuda_graph_compatible` but not `backend`. It may be worth adding a short description for `backend` for consistency and to clarify the `"cubin"` override behavior; a possible shape is sketched below.

Also applies to: 387-411
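A possible shape for the missing entry, assuming the `"cubin"` override described above (wording is a suggestion, not the maintainers' text):

```python
# Hypothetical docstring fragment for cudnn_batch_prefill_with_kv_cache;
# wording is illustrative.
"""
backend : str, optional
    Execution backend selector. With the default value the regular cuDNN
    path is used; passing "cubin" forces the fmha-gen path regardless of
    other heuristics.
"""
```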
tests/utils/test_logging.py (1)
43-585: Logging tests are comprehensive; Ruff ARG001 hints are low-priority in this context

This suite does a good job validating `flashinfer_log` behavior (levels 0/1/3/5, enums, defaults vs explicit args, kwargs, crash safety, CUDA graphs, nested structures, etc.), and the fixture correctly resets env state and reloads the module between tests. The Ruff ARG001 warnings about unused parameters in test functions (e.g., `mode`, `int_val`, etc.) are expected for this style of test; if you care about a clean lint run you can prefix those parameters with `_` or reference them trivially (see the sketch below), but functionally the current code is fine.
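If a clean lint run is wanted later, either pattern below silences ARG001 without changing what the tests exercise (function and parameter names are illustrative; Ruff's default dummy-variable regex treats a leading underscore as intentionally unused):

```python
# Option 1: underscore prefix; note that keyword call sites must be renamed too.
def logged_fn(x, _mode="standard"):
    return x


# Option 2: reference the argument trivially so the linter counts it as used.
def logged_fn2(x, mode="standard"):
    del mode  # only the logged signature matters in these tests
    return x
```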
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- benchmarks/bench_logging_overhead.py (1 hunks)
- flashinfer/api_logging.py (1 hunks)
- flashinfer/cudnn/decode.py (2 hunks)
- flashinfer/cudnn/prefill.py (2 hunks)
- flashinfer/decode.py (10 hunks)
- flashinfer/fused_moe/core.py (7 hunks)
- flashinfer/gemm/gemm_base.py (12 hunks)
- flashinfer/mla.py (4 hunks)
- flashinfer/prefill.py (11 hunks)
- tests/utils/test_logging.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- flashinfer/mla.py
🧰 Additional context used
🧬 Code graph analysis (9)
flashinfer/cudnn/prefill.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
tests/utils/test_logging.py (1)
flashinfer/api_logging.py (3)
flashinfer_log (466-564), decorator (519-560), wrapper (521-558)
flashinfer/cudnn/decode.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
flashinfer/fused_moe/core.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
flashinfer/api_logging.py (1)
include/flashinfer/logging.h (1)
logging(31-41)
benchmarks/bench_logging_overhead.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
flashinfer/decode.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
flashinfer/prefill.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
flashinfer/gemm/gemm_base.py (1)
flashinfer/api_logging.py (1)
flashinfer_log(466-564)
🪛 Ruff (0.14.5)
tests/utils/test_logging.py
227-227: Unused function argument: mode
(ARG001)
322-322: Unused function argument: x
(ARG001)
322-322: Unused function argument: y
(ARG001)
323-323: Avoid specifying long messages outside the exception class
(TRY003)
356-356: Unused function argument: int_val
(ARG001)
357-357: Unused function argument: float_val
(ARG001)
358-358: Unused function argument: bool_val
(ARG001)
359-359: Unused function argument: str_val
(ARG001)
360-360: Unused function argument: list_val
(ARG001)
361-361: Unused function argument: tuple_val
(ARG001)
362-362: Unused function argument: dict_val
(ARG001)
363-363: Unused function argument: none_val
(ARG001)
flashinfer/api_logging.py
108-108: Do not catch blind exception: Exception
(BLE001)
128-128: Do not catch blind exception: Exception
(BLE001)
146-146: Do not catch blind exception: Exception
(BLE001)
154-154: Do not catch blind exception: Exception
(BLE001)
258-258: Do not catch blind exception: Exception
(BLE001)
322-322: Use explicit conversion flag
Replace with conversion flag
(RUF010)
333-333: Use explicit conversion flag
Replace with conversion flag
(RUF010)
337-337: Use explicit conversion flag
Replace with conversion flag
(RUF010)
338-338: Do not catch blind exception: Exception
(BLE001)
381-381: Consider moving this statement to an else block
(TRY300)
382-382: Do not catch blind exception: Exception
(BLE001)
442-442: Unused function argument: func_name
(ARG001)
466-466: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
531-532: try-except-pass detected, consider logging the exception
(S110)
531-531: Do not catch blind exception: Exception
(BLE001)
544-544: Do not catch blind exception: Exception
(BLE001)
545-545: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
555-555: Do not catch blind exception: Exception
(BLE001)
556-556: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
benchmarks/bench_logging_overhead.py
1-1: Shebang is present but file is not executable
(EXE001)
32-32: Probable insecure usage of temporary file or directory: "/tmp/flashinfer_benchmark_log.txt"
(S108)
329-329: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (5)
flashinfer/decode.py (1)
24-24: API logging decoration across decode entry points looks correct and non-intrusive.
- Importing `flashinfer_log` and applying it to all user-facing decode APIs (single, batch, wrappers, and TRT-LLM/XQA helpers) preserves signatures and overload contracts while adding centralized logging.
- The decorator's no-op behavior at log level 0 avoids extra overhead in production, and `functools.wraps` keeps introspection and `partialmethod`/aliasing (`begin_forward`, `run_return_lse`) working as before (see the sketch after this list).
- No behavioral changes to argument defaults, CUDA-graph handling, or backend routing are introduced by these annotations.
Also applies to: 316-345, 393-410, 652-664, 816-840, 1170-1186, 2068-2089, 2342-2359, 2527-2544, 2689-2705
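For intuition, the zero-overhead-when-disabled behavior referenced in these comments is typically achieved by returning the undecorated function at level 0, along the lines of this simplified sketch (not the actual `api_logging` implementation):

```python
# Simplified sketch of a level-0 fast path; not the real flashinfer code.
import functools
import os

_LEVEL = int(os.environ.get("FLASHINFER_LOGLEVEL_DBG", "0"))


def flashinfer_log_sketch(func):
    if _LEVEL == 0:
        # No wrapper at all: decorated call sites pay zero extra cost.
        return func

    @functools.wraps(func)  # keeps __name__/__doc__ intact for aliases
    def wrapper(*args, **kwargs):
        # Inputs are logged before execution, so they survive a hard crash.
        print(f"[flashinfer.api] {func.__name__} called")
        return func(*args, **kwargs)

    return wrapper
```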
flashinfer/prefill.py (1)
25-25: Prefill API logging wiring is consistent and preserves existing behavior.
- Importing and applying `@flashinfer_log` to single-prefill helpers, both prefill wrappers (paged and ragged), and the TRT-LLM prefill/context functions cleanly instruments all public prefill entry points without altering their logic.
- Overloads remain untouched; only concrete implementations are wrapped, and the `begin_forward`/`run_return_lse` aliases continue to work through the decorated methods.
- No changes to backend selection, CUDA-graph handling, or tensor shape/device contracts are introduced in these hunks.
Also applies to: 877-888, 962-985, 1331-1346, 1527-1563, 1984-2000, 2359-2372, 2503-2531, 2848-2859, 3205-3227, 3340-3362
flashinfer/cudnn/decode.py (1)
6-6: cudnn decode logging and CUDA‑graph flag propagation look correct.
- Importing `flashinfer_log` and decorating `cudnn_batch_decode_with_kv_cache` integrates this entry point into the new logging system without touching its core logic.
- The `is_cuda_graph_compatible` argument is defaulted and only matters in the fmha-gen fallback path; cuDNN proper continues to behave as before.
- Signature and docstring remain aligned, so existing callers are unaffected while gaining optional logging and an explicit CUDA-graph compatibility hint.
Also applies to: 256-273
flashinfer/fused_moe/core.py (1)
23-23: API-level logging instrumentation for MoE wrappers looks correct

Importing `flashinfer_log` and decorating the public MoE entry points here cleanly extends logging without touching the underlying custom ops. Because `flashinfer_log` returns the original function when `FLASHINFER_LOGLEVEL_DBG=0`, these changes are behavior-preserving in the default/off configuration, and the wrappers' signatures remain unchanged. No issues from the decorator stacking are apparent.

Also applies to: 688-719, 1862-1881, 1943-1965, 2017-2040, 2095-2127, 2225-2257
flashinfer/gemm/gemm_base.py (1)
914-1083: Decorator-based logging on GEMM and segment-GEMM entry points looks consistent

Importing `flashinfer_log` here and applying it to `SegmentGEMMWrapper.run`, `tgv_gemm_sm100`, and the various FP8/FP4 GEMM entry points (`mm_fp8`, `mm_fp4`, `bmm_fp8`, `gemm_fp8_nt_groupwise`, blockscaled/group variants, DeepGEMM group/batch wrappers) is consistent with the rest of the PR:

- Signatures and return types are unchanged.
- At `FLASHINFER_LOGLEVEL_DBG=0` the decorator returns the original function, so hot paths (GEMM kernels) pay no additional cost when logging is disabled.
- Where combined with `@backend_requirement`, the backend checks still wrap the logged function cleanly.

From a call-site perspective, this is a non-breaking way to surface API-level logs for GEMM operations.
Also applies to: 1573-1682, 1851-2002, 2101-2190, 2193-2356, 2502-2527, 2531-2690, 2694-2825, 2862-2989, 2993-3125
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
LOGGING.md(1 hunks)README.md(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- README.md
🧰 Additional context used
🪛 GitHub Actions: pre-commit
LOGGING.md
[error] 1-1: pre-commit end-of-file-fixer hook failed. The hook modified LOGGING.md (added a newline at EOF).
🪛 LanguageTool
LOGGING.md
[grammar] ~24-~24: Ensure spelling is correct
Context: ...| Disabled (Default) | No logging (zero overhad) | Production | | 1 | Function Name...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (1)
LOGGING.md (1)
5-82: Well-structured documentation with clear guidance.

The logging documentation is clearly organized with practical examples, an environment variable reference, and real-world use cases (CUDA graph compatibility, multi-GPU process ID substitution). The Quick Start and logging levels table make it easy for users to get started.
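To make the Quick Start concrete, the documented environment-variable flow can be exercised along these lines (the log path is illustrative; setting the variables before import reflects how the level is read at import time, per the review comments above):

```python
# Sketch of the documented configuration flow; the log path is illustrative.
import os

# Set before importing flashinfer, since the level/destination are read
# when the module is imported.
os.environ["FLASHINFER_LOGLEVEL_DBG"] = "3"  # tensor metadata level
os.environ["FLASHINFER_LOGDEST_DBG"] = "/tmp/flashinfer_api.log"

import flashinfer  # noqa: E402  # decorated APIs now log to the file above
```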
📌 Description
WIP Do not merge
tl;dr: This PR adds a logging system for input/output tracking to aid in debugging FlashInfer APIs.
This PR introduces a production-ready API logging infrastructure that tracks function calls, arguments, and return values via a simple one-line decorator; any function can be decorated to have its inputs and outputs captured by the API logger.
Key Features:
- `FLASHINFER_LOGLEVEL_DBG` controls the logging level (0, the default, disables logging)
- `FLASHINFER_LOGDEST_DBG` controls the log destination; defaults to `stdout`
- Overhead benchmark: `benchmarks/bench_logging_overhead.py`

Example usage (a minimal sketch follows) produces the log output summarized after it.
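A minimal sketch of the kind of usage meant here (`scaled_add` is a hypothetical function; the decorator import path matches the PR):

```python
# Sketch: one-line instrumentation of a function; assumes a CUDA device.
import torch

from flashinfer.api_logging import flashinfer_log


@flashinfer_log
def scaled_add(a: torch.Tensor, b: torch.Tensor, alpha: float = 1.0):
    return a + alpha * b


x = torch.randn(4, device="cuda")
y = torch.randn(4, device="cuda")
out = scaled_add(x, y, alpha=0.5)  # call is logged per FLASHINFER_LOGLEVEL_DBG
```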
- `export FLASHINFER_LOGLEVEL_DBG=3` produces: (log output elided)
- `export FLASHINFER_LOGLEVEL_DBG=5` produces: (log output elided)

🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- [ ] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [ ] I have installed the hooks with `pre-commit install`.
- [ ] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.
- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).

Reviewer Notes
Summary by CodeRabbit
New Features
Tools
Tests
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.