Add training-inference mismatch correction (add feature train-infer-mismatch): a more complete and comprehensive version #288
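I merged the changes and revised them. Specifically, for the part of the new version that obtains infer-log-prob, I followed the official implementation. The training-inference mismatch correction itself uses my own revision.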
What does this PR do?
✨ What's Changed
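1. Core component refactor
- `InferCorrectionHandler` class (roll/utils/infer_correction.py): handles IS correction and sample rejection in one place, replacing the logic that was previously mixed into `loss_func`.

2. Three-level rejection strategy
- Token level: `infer_token_mask_threshold_{min,max}`
- Sequence level: `enable_seq_reject`, `infer_seq_mask_threshold_{min,max}`
- Catastrophic level: `infer_catastrophic_threshold`

3. Selectable importance sampling modes
- `token`: standard token-level IS (default)
- `sequence`: weight from the sequence's total log-ratio (stabilizes training on long sequences)
- `geometric`: geometric mean of the token ratios (dampens extreme values)
- `none`: IS disabled (for baseline comparisons)

4. Production-grade diagnostics
`StatsCollector` manages all metrics centrally, in three groups:
- Ratio statistics: `token_ratio_mean/std/min/max`
- Rejection fractions: `token_reject_frac`, `seq_reject_frac`, `catastrophic_seq_frac`
- KL: `inferkl` (raw KL), `inferkl_reject` (KL after rejection)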
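To make the IS modes listed above concrete, here is a minimal sketch of how each could turn per-token log-probs into importance weights for one response. `is_weights` is a hypothetical helper, not the API of `InferCorrectionHandler`. It assumes the log-ratio is the training-engine log-prob minus the inference-engine log-prob, and that `sequence` and `geometric` broadcast a single weight to every valid token. It uses plain Python lists for readability; the real code presumably operates on batched tensors.

```python
import math
from typing import List


def is_weights(
    old_logp: List[float],    # log-probs recomputed by the training engine
    infer_logp: List[float],  # log-probs reported by the inference (rollout) engine
    mask: List[int],          # 1 = valid response token, 0 = padding
    mode: str = "token",
) -> List[float]:
    """Hypothetical helper: per-token IS weights for one response."""
    log_ratio = [o - i for o, i in zip(old_logp, infer_logp)]
    valid_log_ratio = [r for r, m in zip(log_ratio, mask) if m]
    n_valid = max(len(valid_log_ratio), 1)

    if mode == "token":
        # Each valid token keeps its own ratio exp(log p_train - log p_infer).
        return [math.exp(r) if m else 0.0 for r, m in zip(log_ratio, mask)]
    if mode == "sequence":
        # One weight per sequence: exp of the summed log-ratio (product of token ratios).
        w = math.exp(sum(valid_log_ratio))
    elif mode == "geometric":
        # Geometric mean of token ratios: exp of the mean log-ratio.
        w = math.exp(sum(valid_log_ratio) / n_valid)
    elif mode == "none":
        # IS disabled: every valid token gets weight 1.
        w = 1.0
    else:
        raise ValueError(f"unknown IS mode: {mode}")
    # Broadcast the sequence-level weight to valid tokens; padding gets 0.
    return [w if m else 0.0 for m in mask]
```

The `geometric` mode divides the summed log-ratio by the number of valid tokens, so the scale of its weight does not depend on response length.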
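The three rejection levels above could be combined roughly as follows. This sketch rests on assumptions the description does not spell out. Tokens whose ratio falls outside `[token_min, token_max]` are masked. When `enable_seq_reject` is set, a whole sequence is rejected if its geometric-mean ratio leaves `[seq_min, seq_max]`. A sequence is also rejected when any single token ratio drops below `catastrophic_threshold`. The function name, argument names, and default values are hypothetical and only loosely mirror the config keys.

```python
import math
from typing import Dict, List, Tuple


def reject_masks(
    old_logp: List[List[float]],
    infer_logp: List[List[float]],
    mask: List[List[int]],
    token_min: float = 0.5,               # hypothetical default
    token_max: float = 2.0,               # hypothetical default
    enable_seq_reject: bool = True,
    seq_min: float = 0.5,                 # hypothetical default
    seq_max: float = 2.0,                 # hypothetical default
    catastrophic_threshold: float = 1e-4, # hypothetical default
) -> Tuple[List[List[int]], Dict[str, float]]:
    # Compare in log space so extreme ratios cannot overflow math.exp.
    lo_tok, hi_tok = math.log(token_min), math.log(token_max)
    lo_seq, hi_seq = math.log(seq_min), math.log(seq_max)
    lo_cat = math.log(catastrophic_threshold)

    new_mask: List[List[int]] = []
    n_valid = n_tok_rej = n_seq_rej = n_cat = 0
    for old, inf, m in zip(old_logp, infer_logp, mask):
        log_r = [o - i for o, i in zip(old, inf)]
        valid = [k for k, mk in enumerate(m) if mk]

        # Level 1 (token): drop individual tokens whose ratio is out of range.
        row = [1 if mk and lo_tok <= log_r[k] <= hi_tok else 0 for k, mk in enumerate(m)]
        n_valid += len(valid)
        n_tok_rej += len(valid) - sum(row)

        # Level 2 (sequence): geometric-mean ratio of the whole response.
        seq_bad = False
        if enable_seq_reject and valid:
            mean_log_r = sum(log_r[k] for k in valid) / len(valid)
            seq_bad = not (lo_seq <= mean_log_r <= hi_seq)

        # Level 3 (catastrophic): any single token with a vanishingly small ratio.
        cat_bad = any(log_r[k] < lo_cat for k in valid)

        n_seq_rej += int(seq_bad)
        n_cat += int(cat_bad)
        if seq_bad or cat_bad:
            row = [0] * len(m)  # reject the entire sequence
        new_mask.append(row)

    n_seq = max(len(mask), 1)
    stats = {
        # Counts only token-level rejections, not tokens removed with their sequence.
        "token_reject_frac": n_tok_rej / max(n_valid, 1),
        "seq_reject_frac": n_seq_rej / n_seq,
        "catastrophic_seq_frac": n_cat / n_seq,
    }
    return new_mask, stats
```

The returned fractions are named after the metrics listed in item 4. In this sketch a sequence can be counted under both `seq_reject_frac` and `catastrophic_seq_frac`.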
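For the diagnostics in item 4, the hypothetical `collect_infer_stats` below shows one way to compute the ratio statistics and the two KL metrics. It is not the PR's `StatsCollector`. The sketch assumes that the KL between the inference and training policies is estimated on inference-sampled tokens with the k3 estimator, mean of (r - 1 - log r) with r = p_train / p_infer. It also assumes that `inferkl` uses every valid token while `inferkl_reject` uses only the tokens still kept after rejection, and that `token_ratio_std` is the population standard deviation.

```python
import math
import statistics
from typing import Dict, List


def _k3_kl(log_ratios: List[float]) -> float:
    # k3 estimator of KL(infer || train) for tokens sampled by the inference engine:
    # mean of (r - 1 - log r) with r = p_train / p_infer; every term is >= 0.
    if not log_ratios:
        return 0.0
    return sum(math.exp(x) - 1.0 - x for x in log_ratios) / len(log_ratios)


def collect_infer_stats(
    old_logp: List[List[float]],
    infer_logp: List[List[float]],
    mask: List[List[int]],       # 1 = valid response token
    kept_mask: List[List[int]],  # 1 = token survived rejection
) -> Dict[str, float]:
    """Hypothetical helper: ratio statistics plus raw and post-rejection KL."""
    log_r: List[float] = []
    log_r_kept: List[float] = []
    for old, inf, m, km in zip(old_logp, infer_logp, mask, kept_mask):
        for o, i, mk, kk in zip(old, inf, m, km):
            if mk:
                log_r.append(o - i)
                if kk:
                    log_r_kept.append(o - i)
    if not log_r:
        raise ValueError("no valid tokens in batch")

    ratios = [math.exp(x) for x in log_r]
    return {
        "token_ratio_mean": statistics.fmean(ratios),
        "token_ratio_std": statistics.pstdev(ratios),
        "token_ratio_min": min(ratios),
        "token_ratio_max": max(ratios),
        "inferkl": _k3_kl(log_r),             # over all valid tokens
        "inferkl_reject": _k3_kl(log_r_kept), # over tokens kept after rejection
    }
```

Comparing `inferkl` with `inferkl_reject` indicates how much of the measured mismatch the rejection step removes.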