feat(flexlb): batch scheduling infrastructure, strategy refactor, and error model hardening #1138
wzy-99 wants to merge 12 commits into
AI Code Review - PR #1138
Status: BLOCKING
Summary: P0/1 · P1/0 · P2/1 · P3/0
Blocking Issues
P0
Non-blocking Suggestions
P2
Checklist Violations (3 fail / 56 total)
General Principles Checklist
Python Static-First Checklist
Strengths
5dff119 to e5fa216
AI Code Review - PR #1138
Status: BLOCKING
Summary: P0/1 · P1/0 · P2/1 · P3/1
Blocking Issues
P0
Non-blocking Suggestions
P2
P3
Checklist Violations (7 fail / 69 total)
General Principles Checklist
Python Static-First Checklist
Strengths
AI Code Review - PR #1138
Status: BLOCKING
Summary: P0/1 · P1/0 · P2/2 · P3/0
Blocking Issues
P0
Non-blocking Suggestions
P2
Checklist Violations (9 fail / 88 total)
General Principles Checklist
RTP-LLM Checklist
Strengths
AI Code Review - PR #1138
Status: LGTM
Summary: P0/0 · P1/0 · P2/1 · P3/1
lgtm ready to ci
Non-blocking Suggestions
P2
P3
Checklist Violations (7 fail / 103 total)
General Principles Checklist
RTP-LLM Checklist
Strengths
CI dispatcher could not find a native […]. This can happen if the PR was opened before the CI architecture change, or if the original run was deleted. To fix: push any commit (even empty: […]
AI Code Review - PR #1138
Status: BLOCKING
Summary: P0/0 · P1/3 · P2/6 · P3/2
Blocking Issues
P1
Non-blocking Suggestions
P2
P3
Checklist Violations (1 fail / 56 total)
General Principles Checklist
Strengths
…ng, and PrefillRpcServer metrics
Combined from:
- cc55836 feat: sync flexlb batch async fetch
- b422c6c feat(flexlb): batch scheduling, strategy refactor, error model hardening, and test coverage
- 89adee8 feat(PrefillRpcServer): enhance thread pool management with worker lambda pool and metrics
- 909cfe5 feat(util): add requests library import for HTTP functionality
AI Code Review - PR #1138
Status: BLOCKING
Summary: P0/0 · P1/11 · P2/16 · P3/2
Blocking Issues
P1
Non-blocking Suggestions
P2
P3
Checklist Violations (7 fail / 56 total)
General Principles Checklist
RTP-LLM Checklist
Strengths
…e + 3 P1 fixes
- P0: move consecutiveFailures from GrpcWorkerStatusRunner instance field to WorkerStatus object, so the counter persists across sync cycles instead of being reset every time a new Runner is created
- P1: fix isNonBatchPath to use != BATCH instead of == DIRECT, so AUTO-mode requests that fail batch qualification also get inflight tracking
- P1: add null guard for newWorkerStatus.getRole().name() to prevent NPE crashing the sync thread
- P1: add str branch in generate_config.py validate_role for JSON round-trip
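The P0 fix above is easiest to see with a toy model. The sketch below is Python rather than the PR's Java, and the class shapes are hypothetical stand-ins for `WorkerStatus` and `GrpcWorkerStatusRunner`. The reset-on-success rule is also an assumption. The only point it illustrates is that a counter stored on the long-lived status object survives the creation of a fresh runner for each sync cycle.

```python
class WorkerStatus:
    """Long-lived per-worker state (hypothetical minimal shape)."""

    def __init__(self):
        self.consecutive_failures = 0


class StatusRunner:
    """Short-lived: a new runner is created for every sync cycle."""

    def __init__(self, status):
        # The runner holds no counter of its own; it only references the status.
        self.status = status

    def run(self, rpc_ok):
        if rpc_ok:
            # Assumed behaviour: a successful sync clears the streak.
            self.status.consecutive_failures = 0
        else:
            self.status.consecutive_failures += 1
        return self.status.consecutive_failures


status = WorkerStatus()
for _ in range(3):
    # Three sync cycles, three different runner instances, one shared status.
    StatusRunner(status).run(rpc_ok=False)
```

Had the counter been a field of `StatusRunner`, each cycle would have started again from zero and no failure threshold could ever be reached.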
Add per-request INFO-level event logs across engine/scheduler/executor/dispatcher layers for complete request traceability (prefill → decode → finish). All events carry trace_id via streamLogTag() formatted as 'trace_id=XXX req_id=YYY'.
Changes:
- GenerateStream.h: implement streamLogTag() inline with trace_id+req_id
- FIFOScheduler.cc: add request_activated (waiting→running + loading→running) and request_finished (cleanup) logs
- NormalExecutor.cc: add trace_id to prefill_batch_begin; add new decode_step_begin event for decode batch visibility
- NormalOutputDispatcher.cc: add first_token + decode_finished per-request events (compatible with dispatchOutputAsync path)
- frontend_server.py: add request_arrival + request_completion logs
Design: all events use RTP_LLM_ACCESS_LOG_INFO (no [RANK][file:line] prefix); INFO only at batch boundaries/state transitions; trace_id coverage on all per-request events.
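For readers who want to grep these logs, the tag format quoted above can be mirrored in a few lines. This is a Python illustration only: the real `streamLogTag()` lives in `GenerateStream.h`, and the event-line layout produced by `format_event` is a hypothetical example, not the engine's actual output.

```python
def stream_log_tag(trace_id, req_id):
    """Build the tag in the 'trace_id=XXX req_id=YYY' shape quoted in the commit."""
    return f"trace_id={trace_id} req_id={req_id}"


def format_event(event, trace_id, req_id):
    # Hypothetical layout: event name first, then the tag.
    return f"{event} {stream_log_tag(trace_id, req_id)}"


line = format_event("request_activated", "abc123", 42)
```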
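P1 #10: ResourceMeasure refactor broke the RandomStrategy resource check
- DecodeResourceMeasure/PrefillResourceMeasure add an @Override isResourceAvailable(WorkerEndpoint) bridge method
- Delegate to the typed method via instanceof, fixing the earlier bug where only the interface default method (which checks only isAlive()) was used
P1 #8: activelyNotifyParticipants notification condition was inverted
- Change localIp.equals to !localIp.equals so that remote nodes receive the master switch notification
P1 #9: waitForLeadershipTransfer had no exit bound
- Add MAX_WAIT_COUNT=30 as a maximum-wait guard; on timeout, warn and force exit
P2 #3: applyTrafficPolicyOverride lacked try-catch
- Wrap JsonUtils.toObject in try-catch so that malformed JSON cannot crash startup
P2 #12: LBStatusConsistencyService static thread pool was never shut down
- Add @PreDestroy shutdown() to shut down SCHEDULED_EXECUTOR_SERVICE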
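The P1 #9 fix is a standard bounded-polling pattern. A minimal Python sketch follows, assuming the transfer state is exposed as a zero-argument callable. `transfer_done` and `poll_interval_s` are hypothetical names, and only `MAX_WAIT_COUNT=30` comes from the commit message.

```python
import logging
import time

MAX_WAIT_COUNT = 30  # upper bound on polls, as stated in the commit message


def wait_for_leadership_transfer(transfer_done, poll_interval_s=0.0,
                                 max_wait_count=MAX_WAIT_COUNT):
    """Poll until the transfer completes or the poll budget is exhausted.

    Returns True if the transfer was observed, False if we gave up.
    """
    for _ in range(max_wait_count):
        if transfer_done():
            return True
        time.sleep(poll_interval_s)
    # Budget exhausted: warn and return instead of looping forever.
    logging.warning("leadership transfer not observed after %d polls; giving up",
                    max_wait_count)
    return False
```

Returning a boolean rather than raising lets the caller decide whether a timed-out transfer is fatal; the PR's Java code may handle this differently.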
…ia PDSepConfig
- FlexlbConfig: flexlbBatchQueueMaxSize 64 → 1024
- PDSepConfig: add prefill_enqueue/worker_lambda/slot_pool_size fields (default 0 = formula)
- PrefillRpcServer: initThreadPools() reads from pd_sep_config instead of hardcoded/env var
- ConfigInit.cc: pybind registration + pickle for 3 new PDSepConfig fields
- pd_separation_group_args.py: CLI args for 3 thread pool sizes
…n in .h)
Commit f1df601 added an inline definition in GenerateStream.h but forgot to remove the existing out-of-line implementation in GenerateStream.cc, causing a redefinition error on cuda13_x86 builds.
- Add flexlbBatchFixedMaxInflightBatches config (default 0, disabled) to limit in-flight batches per prefill worker in fixed_window mode
- Rename SLO batcher config flexlbBatchMaxInflightBatchesPerWorker -> flexlbBatchSloMaxInflightBatches for naming consistency
- FixedWindowBatcherAlgorithm parks instead of dispatching when engine inflight batch count >= limit, preventing engine overload
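The park-versus-dispatch rule described above reduces to a single predicate. The Python sketch below is an illustration under stated assumptions: `should_park` is a hypothetical name, and treating any non-positive limit as "disabled" generalises the commit's "default 0, disabled".

```python
def should_park(inflight_batches, max_inflight_batches):
    """Return True when a ready batch should be parked instead of dispatched.

    max_inflight_batches <= 0 means the limit is disabled (assumed to cover
    the documented default of 0).
    """
    if max_inflight_batches <= 0:
        return False
    return inflight_batches >= max_inflight_batches
```

With a limit of 2, a worker that already has 2 batches in flight parks the next one, and dispatch resumes once the count drops to 1.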
AI Code Review - PR #1138
Status: BLOCKING
Summary: P0/0 · P1/11 · P2/44 · P3/0
Blocking Issues
P1
Non-blocking Suggestions
P2
Checklist ✅ (56 items passed)
Strengths
…n multiple modules
AI Code Review - PR #1138
Status: BLOCKING
Summary: P0/0 · P1/12 · P2/22 · P3/3
Blocking Issues
P1
Non-blocking Suggestions
P2
P3
Checklist ✅ (56 items passed)
Strengths
…ster routing
Add MasterConfig.disable_domain_fallback field with env var MASTER_DISABLE_DOMAIN_FALLBACK to prevent fallback to VipServer domain routing when master is unavailable or not configured. When enabled, requests will fail with ROUTE_ERROR instead of silently degrading to domain-based service discovery.
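The routing decision this flag controls can be sketched as follows. This is a Python illustration, not the project's code: `RouteError`, `choose_route`, and the tuple return shape are hypothetical, and how `MASTER_DISABLE_DOMAIN_FALLBACK` is parsed into a boolean is deliberately left out because the commit message does not say.

```python
class RouteError(Exception):
    """Stands in for the ROUTE_ERROR failure named in the commit message."""


def choose_route(master_address, disable_domain_fallback):
    """Pick a routing path.

    master_address is None when the master is unavailable or not configured.
    """
    if master_address is not None:
        return ("master", master_address)
    if disable_domain_fallback:
        # Fail loudly rather than silently degrading to domain routing.
        raise RouteError("ROUTE_ERROR: master unavailable and domain fallback disabled")
    return ("domain", None)
```

The flag trades availability for predictability: with the fallback disabled, an outage of the master surfaces immediately as request failures instead of as an unnoticed change in routing behaviour.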
AI Code Review - PR #1138
Status: BLOCKING
Summary: P0/1 · P1/15 · P2/31 · P3/2
Blocking Issues
P0
P1
Non-blocking Suggestions
P2
P3
Checklist Violations (6 fail / 56 total)
General Principles Checklist
Strengths
… configurable default schedule mode
1. CostBasedPrefillStrategy.isNonBatchPath(): remove scheduleMode check
- When flexlbBatchEnabled=true, only check the global flag; schedule mode
is no longer a condition for creating request-ID placeholder entries.
- This eliminates ~93% redundant inflightBatches entries that were mixing
batch-ID and request-ID keys in the same ConcurrentHashMap.
2. PrefillEndpoint.calibrate(): log inflightBatches.size() instead of keySet()
- Avoid dumping 250+ IDs per calibrate log line when placeholder entries exist.
3. FlexlbConfig: add DEFAULT_SCHEDULE_MODE env var support
- New field defaultScheduleMode (String, default 'AUTO'), overridable via
DEFAULT_SCHEDULE_MODE env var (AUTO/BATCH/DIRECT).
4. FlexlbServiceImpl: use config-driven default schedule mode
- Replace hardcoded toScheduleMode() with resolveScheduleMode() that falls
back to config.getDefaultScheduleModeEnum() when proto sends default value.
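Items 3 and 4 together describe a two-step resolution: read a default from the environment, then use it only when the request carries no explicit mode. A minimal Python sketch of that flow follows; the real code is Java. Using `None` to represent "proto sent its default value" is an assumption, as are the function names and the choice to let an unrecognised value raise `KeyError`.

```python
import os
from enum import Enum


class ScheduleMode(Enum):
    AUTO = "AUTO"
    BATCH = "BATCH"
    DIRECT = "DIRECT"


def default_schedule_mode(env=None):
    """Read DEFAULT_SCHEDULE_MODE, falling back to AUTO when it is unset."""
    env = os.environ if env is None else env
    raw = env.get("DEFAULT_SCHEDULE_MODE", "AUTO")
    # Assumption: an unrecognised value is a configuration error (KeyError).
    return ScheduleMode[raw.strip().upper()]


def resolve_schedule_mode(request_mode, config_default):
    """Prefer the request's explicit mode; otherwise use the configured default."""
    if request_mode is None:  # stands in for the proto default value
        return config_default
    return request_mode
```

For example, with `DEFAULT_SCHEDULE_MODE=BATCH` a request that leaves the mode unset resolves to `BATCH`, while a request that explicitly asks for `DIRECT` keeps `DIRECT`.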
Summary
This PR introduces FlexLB batch scheduling infrastructure with SLO-based admission control, strategy refactoring, and error model hardening.
Key Changes
Batch Scheduling Infrastructure
Strategy Refactor
Error Model Hardening (C++ engine)
Review Fixes (Round 3)
Test Coverage
Previous Reviews