fix(c_api): include start_iteration in SingleRowPredictor cache key#7225
Open
JKDasondee wants to merge 1 commit intolightgbm-org:masterfrom
Open
fix(c_api): include start_iteration in SingleRowPredictor cache key#7225JKDasondee wants to merge 1 commit intolightgbm-org:masterfrom
JKDasondee wants to merge 1 commit intolightgbm-org:masterfrom
Conversation
SingleRowPredictorInner stored only num_iteration, not start_iteration, so consecutive calls to LGBM_BoosterPredictForMatSingleRow (and the sibling Fast/CSR/CSC single-row entry points that share the same cached predictor) with the same num_iteration but different start_iteration returned the stale predictions from the first call. The predictor cache in SetSingleRowPredictorInner never re-initialized because IsPredictorEqual did not see start_iteration as part of the key. Add start_iter_ to SingleRowPredictorInner, store it in the constructor, and require it to match in IsPredictorEqual. The call site in SetSingleRowPredictorInner already has start_iteration in scope, so the fix is a two-line change in c_api.cpp. Adds SingleRow.StartIterationChangesPrediction to tests/cpp_tests/ test_single_row.cpp. The test trains 30 iterations and calls LGBM_BoosterPredictForMatSingleRow twice on the same booster handle with num_iteration=10 and start_iteration=0 then start_iteration=20 (disjoint windows). Without the fix the second call hits the stale predictor cache and returns the score from the [0, 10) window. With the fix the second call rebuilds the predictor and returns the [20, 30) window score, which also matches the non-single-row LGBM_BoosterPredictForMat ground truth. Verified by stashing the fix, rebuilding testlightgbm, and confirming the new test fails with "0.65339 vs 0.65339" (stale cache) and "0.65339 vs 0.56967" (disagrees with LGBM_BoosterPredictForMat), then reapplying the fix and seeing all three SingleRow tests pass. Fixes lightgbm-org#7220
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #7220.
LGBM_BoosterPredictForMatSingleRow(and its Fast/CSR/CSC siblings that share the same cached predictor) returned stale predictions when called repeatedly on the same booster with differentstart_iterationvalues but the samenum_iterationandpredict_type.Root cause:
SingleRowPredictorInner::IsPredictorEqualatsrc/c_api.cpp:96omittedstart_iterationfrom the cache key, soSetSingleRowPredictorInner(line 434) never rebuilt the cached predictor when onlystart_iterationchanged.The fix
SingleRowPredictorInner(int predict_type, Boosting* boosting, const Config& config, int start_iter, int num_iter) { ... + start_iter_ = start_iter; iter_ = num_iter; ... } - bool IsPredictorEqual(const Config& config, int iter, Boosting* boosting) { + bool IsPredictorEqual(const Config& config, int start_iter, int iter, Boosting* boosting) { return early_stop_ == config.pred_early_stop && early_stop_freq_ == config.pred_early_stop_freq && early_stop_margin_ == config.pred_early_stop_margin && + start_iter_ == start_iter && iter_ == iter && num_total_model_ == boosting->NumberOfTotalModel(); } private: ... + int start_iter_; int iter_; int num_total_model_; };Caller update in
SetSingleRowPredictorInner(the only caller ofIsPredictorEqual):Test
Adds
SingleRow.StartIterationChangesPredictiontotests/cpp_tests/test_single_row.cpp. The test:binary.train.LGBM_BoosterPredictForMatSingleRowtwice on the same booster withnum_iteration=10,start_iteration=0thenstart_iteration=20(disjoint iteration windows).LGBM_BoosterPredictForMat(which does not use the buggy cache) to confirm the single-row path agrees with the non-single-row path for both iteration windows.TDD verification
I stashed the
c_api.cppfix, rebuilttestlightgbm, and re-ran just the new test. It failed exactly as the issue describes:I then reapplied the fix, rebuilt, and ran the full
SingleRow.*suite:Build:
cmake -S . -B build-test -DBUILD_CPP_TEST=ON -DUSE_OPENMP=ON -G "MinGW Makefiles" && cmake --build build-test --target testlightgbmon Windows + MinGW gcc 13.2.0.Impact
Any consumer that re-uses a Booster handle and calls any of the single-row predict entry points with varying
start_iterationhit this bug. This affects online-serving / low-latency inference patterns where a single booster is loaded once and reused across many per-request prediction calls with different slicing. Every language binding that calls through the C API (including SWIG-based ones) is affected.No API or ABI surface change —
IsPredictorEqualis a private class method; its only caller is updated in the same patch.