Feat/model inputs dump #1118
Open
JackTan25 wants to merge 2 commits into
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR adds an opt-in debug feature to dump model inputs into rotating log files to aid troubleshooting and profiling.
Changes:
- Introduces the `enable_model_inputs_log` config/CLI/env flag and propagates it through executors and gatherers.
- Extends `GptModelInputs` with extra tensors for logging snapshots and wires a new `ModelInputsLogger` into `PyWrappedModel::forward`.
- Implements `ModelInputsLogger` with file rotation, periodic flush, and optional metrics reporting.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| rtp_llm/server/server_args/profile_debug_logging_group_args.py | Adds CLI/env switch to enable model input logging. |
| rtp_llm/cpp/config/ConfigModules.h | Adds enable_model_inputs_log to profiling debug config. |
| rtp_llm/cpp/config/ConfigModules.cc | Prints the new config field in to_string(). |
| rtp_llm/cpp/pybind/ConfigInit.cc | Exposes the new config field to Python + extends pickle state. |
| rtp_llm/models_py/bindings/core/OpData.h | Adds host snapshot tensors to GptModelInputs for logging. |
| rtp_llm/cpp/normal_engine/NormalModelInputGatherer.h | Adds gatherer flag to control host snapshots for logging. |
| rtp_llm/cpp/normal_engine/NormalModelInputGatherer.cc | Populates logging snapshot tensors when enabled. |
| rtp_llm/cpp/normal_engine/NormalBatchStreamProcessor.cc | Passes the new flag into the input gatherer config. |
| rtp_llm/cpp/normal_engine/NormalExecutor.h | Stores a shared ModelInputsLogger in the executor. |
| rtp_llm/cpp/normal_engine/NormalExecutor.cc | Conditionally constructs/injects ModelInputsLogger into model. |
| rtp_llm/cpp/normal_engine/speculative/MtpExecutor.h | Stores a shared ModelInputsLogger for speculative executor. |
| rtp_llm/cpp/normal_engine/speculative/MtpExecutor.cc | Conditionally constructs/injects ModelInputsLogger into models. |
| rtp_llm/cpp/models/PyWrappedModel.h | Adds logger dependency to constructor and stores it. |
| rtp_llm/cpp/models/PyWrappedModel.cc | Emits model inputs logs at start of forward(). |
| rtp_llm/cpp/models/ModelInputsLogger.h | Adds new logger class interface. |
| rtp_llm/cpp/models/ModelInputsLogger.cc | Implements JSONL logging + rotation + metrics. |
| rtp_llm/cpp/models/BUILD | Adds build deps needed by ModelInputsLogger. |
Comment on lines +229 to +240

            output_.open(file_path_, std::ios::out | std::ios::trunc);
            bytes_ = 0;
        }

        std::mutex            mutex_;
        std::filesystem::path file_path_;
        std::ofstream         output_;
        int                   backup_count_  = 0;
        size_t                bytes_         = 0;
        size_t                pending_lines_ = 0;
        int64_t               last_flush_us_ = 0;
        bool                  valid_         = false;
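
The members quoted above (`file_path_`, `output_`, `backup_count_`, `bytes_`) suggest size-based rotation with numbered backups. The following is only a minimal sketch of that general technique, not the PR's implementation: `rotateBackupsSketch`, `RotatingWriterSketch`, the `max_bytes` threshold, and the `.1`, `.2` backup naming are all hypothetical, and the mutex and periodic flush are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical helper: shifts path.(N-1) -> path.N, ..., path -> path.1.
// The oldest backup is overwritten. Errors are ignored in this sketch.
void rotateBackupsSketch(const fs::path& file_path, int backup_count) {
    std::error_code ec;
    if (backup_count <= 0) {
        fs::remove(file_path, ec);  // no backups kept: just drop the file
        return;
    }
    for (int i = backup_count - 1; i >= 1; --i) {
        const fs::path from(file_path.string() + "." + std::to_string(i));
        const fs::path to(file_path.string() + "." + std::to_string(i + 1));
        if (fs::exists(from, ec)) {
            fs::remove(to, ec);
            fs::rename(from, to, ec);
        }
    }
    const fs::path first(file_path.string() + ".1");
    if (fs::exists(file_path, ec)) {
        fs::remove(first, ec);
        fs::rename(file_path, first, ec);
    }
}

// Hypothetical single-threaded writer that rotates once max_bytes would be exceeded.
class RotatingWriterSketch {
public:
    RotatingWriterSketch(fs::path file_path, std::size_t max_bytes, int backup_count)
        : file_path_(std::move(file_path)), max_bytes_(max_bytes), backup_count_(backup_count) {
        output_.open(file_path_, std::ios::out | std::ios::trunc);
    }

    void writeLine(const std::string& line) {
        const std::size_t needed = line.size() + 1;  // +1 for the newline
        if (bytes_ > 0 && bytes_ + needed > max_bytes_) {
            output_.close();
            rotateBackupsSketch(file_path_, backup_count_);
            output_.open(file_path_, std::ios::out | std::ios::trunc);
            bytes_ = 0;
        }
        output_ << line << '\n';
        output_.flush();
        bytes_ += needed;
    }

private:
    fs::path      file_path_;
    std::ofstream output_;
    std::size_t   max_bytes_;
    int           backup_count_;
    std::size_t   bytes_ = 0;
};
```

A production logger would typically guard `writeLine` with a mutex and batch flushes rather than flushing on every line; the sketch flushes eagerly only to keep the behaviour easy to follow.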
Comment on lines +31 to +56

    std::string jsonEscape(const std::string& input) {
        std::ostringstream os;
        for (unsigned char c : input) {
            switch (c) {
                case '\\':
                    os << "\\\\";
                    break;
                case '"':
                    os << "\\\"";
                    break;
                case '\n':
                    os << "\\n";
                    break;
                case '\r':
                    os << "\\r";
                    break;
                case '\t':
                    os << "\\t";
                    break;
                default:
                    os << static_cast<char>(c);
                    break;
            }
        }
        return os.str();
    }
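
The review comment attached to this range is not reproduced here. One property a reader can check independently: JSON (RFC 8259) requires every control character from U+0000 through U+001F to be escaped inside a string, and the quoted `jsonEscape` handles only `\n`, `\r`, and `\t` from that range. A minimal sketch of a variant that covers the rest, using the hypothetical name `jsonEscapeSketch`, might look like this:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical variant of the quoted jsonEscape; not taken from the PR.
// Escapes the two mandatory characters (backslash, double quote) and every
// control character below 0x20, using short forms where JSON defines them.
std::string jsonEscapeSketch(const std::string& input) {
    std::string out;
    out.reserve(input.size());
    for (unsigned char c : input) {
        switch (c) {
            case '\\': out += "\\\\"; break;
            case '"':  out += "\\\""; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            case '\b': out += "\\b";  break;
            case '\f': out += "\\f";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters become \u00XX.
                    char buf[7];
                    std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    // Bytes >= 0x20 (including UTF-8 continuation bytes) pass through.
                    out += static_cast<char>(c);
                }
                break;
        }
    }
    return out;
}
```

The sketch does not validate that the input is well-formed UTF-8; whether that matters depends on what the logger is fed.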
Comment on lines +58 to +65

    std::string combineStringsForLog(const std::vector<std::string>& vec) {
        std::string result = "\" ";
        for (const auto& s : vec) {
            result += s + ", ";
        }
        result += "\"";
        return result;
    }
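
As quoted, `combineStringsForLog` returns `" a, b, "` for the input `{"a", "b"}`: a leading space after the opening quote, a trailing separator before the closing quote, and no escaping of the elements. If the intent is a plain comma-separated list (an assumption; the review comment text is not shown here), a separator-aware join is one way to avoid the trailing `, `. The name `joinForLogSketch` is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not from the PR: joins elements with a separator
// placed only between items, so there is no leading or trailing separator.
std::string joinForLogSketch(const std::vector<std::string>& vec, const std::string& sep = ", ") {
    std::string result;
    for (std::size_t i = 0; i < vec.size(); ++i) {
        if (i > 0) {
            result += sep;
        }
        result += vec[i];
    }
    return result;
}
```

Quoting and escaping are deliberately left to the caller in this sketch, so the result can be passed through whatever JSON escaping routine the logger uses before it is wrapped in quotes.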
Comment on lines +99 to +106

    profile_debug_logging_group.add_argument(
        "--enable_model_inputs_log",
        env_name="ENABLE_MODEL_INPUTS_LOG",
        bind_to=(profiling_debug_config, "enable_model_inputs_log"),
        type=str2bool,
        default=False,
        # English: "Controls whether model input logs are printed.
        # Options: True (enabled), False (disabled). Defaults to False."
        help="控制是否打印模型输入日志。可选值: True (启用), False (禁用)。默认为 False",
    )
Comment on lines +42 to +45

    torch::Tensor combo_tokens_host_for_log;
    torch::Tensor input_lengths_host_for_log;
    torch::Tensor sequence_lengths_host_for_log;
    torch::Tensor prefix_lengths_host_for_log;
6eab357 to c8deafc
Collaborator
AI Code Review - PR #1118
Status: LGTM
Summary: P0/0 · P1/0 · P2/3 · P3/0
lgtm ready to ci
Non-blocking Suggestions
P2
Checklist Violations (6 fail / 78 total)
General Principles Checklist
RTP-LLM Checklist
Strengths
LLLLKKKK requested changes on Jun 22, 2026
        return pos == std::string::npos ? tensor_with_data : tensor_with_data.substr(pos + 2);
    }

    std::string tensorLogString(const torch::Tensor& tensor, const torch::Tensor& host_snapshot = {}) {

    @@ -0,0 +1,271 @@
    #include "rtp_llm/cpp/models/ModelInputsLogger.h"
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
No description provided.