
fix(c_api): include start_iteration in SingleRowPredictor cache key #7225

Open
JKDasondee wants to merge 1 commit into lightgbm-org:master from JKDasondee:fix/7220-single-row-start-iteration


Summary

Fixes #7220. LGBM_BoosterPredictForMatSingleRow (and its Fast/CSR/CSC siblings that share the same cached predictor) returned stale predictions when called repeatedly on the same booster with different start_iteration values but the same num_iteration and predict_type.

Root cause: SingleRowPredictorInner::IsPredictorEqual at src/c_api.cpp:96 omitted start_iteration from the cache key, so SetSingleRowPredictorInner (line 434) never rebuilt the cached predictor when only start_iteration changed.
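The failure mode can be reproduced in miniature. The sketch below is a toy model, not LightGBM code. ToyBooster, ToyPredictor, and PredictSingleRowBuggy are hypothetical names. Each iteration is assumed to contribute one raw score (as with one tree per iteration in binary classification), and the predictor applies a sigmoid to the summed window. Its cache check compares only num_iter, mirroring the pre-fix IsPredictorEqual, so a second call with a different start_iter reuses the predictor built for the first window.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <vector>

// Hypothetical stand-in for a trained binary booster: one raw-score
// contribution per boosting iteration (one tree per iteration).
struct ToyBooster {
  std::vector<double> per_iter_raw;
};

// Hypothetical stand-in for SingleRowPredictorInner. Like the real class, it
// fixes its iteration window [start_iter, start_iter + num_iter) when built.
class ToyPredictor {
 public:
  ToyPredictor(const ToyBooster* booster, int start_iter, int num_iter)
      : booster_(booster), start_iter_(start_iter), num_iter_(num_iter) {}

  // Mirrors the pre-fix IsPredictorEqual: start_iter is accepted but never
  // compared, so the cache key is incomplete.
  bool IsPredictorEqualBuggy(int /*start_iter*/, int num_iter) const {
    return num_iter_ == num_iter;
  }

  // Sums the raw scores over the stored window and applies a sigmoid.
  double Predict() const {
    assert(start_iter_ >= 0 &&
           start_iter_ + num_iter_ <= static_cast<int>(booster_->per_iter_raw.size()));
    double raw = 0.0;
    for (int i = start_iter_; i < start_iter_ + num_iter_; ++i) {
      raw += booster_->per_iter_raw[i];
    }
    return 1.0 / (1.0 + std::exp(-raw));
  }

 private:
  const ToyBooster* booster_;
  int start_iter_;
  int num_iter_;
};

// Mirrors the check-and-rebuild step: the cached predictor is replaced only
// when the (incomplete) key reports a mismatch.
double PredictSingleRowBuggy(const ToyBooster& booster,
                             std::unique_ptr<ToyPredictor>& cache,
                             int start_iter, int num_iter) {
  if (!cache || !cache->IsPredictorEqualBuggy(start_iter, num_iter)) {
    cache = std::make_unique<ToyPredictor>(&booster, start_iter, num_iter);
  }
  return cache->Predict();
}
```

If the per-iteration scores make [0, 10) and [20, 30) sum to different raw values, calling PredictSingleRowBuggy(booster, cache, 0, 10) and then PredictSingleRowBuggy(booster, cache, 20, 10) returns the same value twice, while a freshly built ToyPredictor for [20, 30) gives a different one.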

The fix

   SingleRowPredictorInner(int predict_type, Boosting* boosting, const Config& config, int start_iter, int num_iter) {
     ...
+    start_iter_ = start_iter;
     iter_ = num_iter;
     ...
   }

-  bool IsPredictorEqual(const Config& config, int iter, Boosting* boosting) {
+  bool IsPredictorEqual(const Config& config, int start_iter, int iter, Boosting* boosting) {
     return early_stop_ == config.pred_early_stop &&
       early_stop_freq_ == config.pred_early_stop_freq &&
       early_stop_margin_ == config.pred_early_stop_margin &&
+      start_iter_ == start_iter &&
       iter_ == iter &&
       num_total_model_ == boosting->NumberOfTotalModel();
   }

  private:
   ...
+  int start_iter_;
   int iter_;
   int num_total_model_;
 };

Caller update in SetSingleRowPredictorInner (the only caller of IsPredictorEqual):

-          !single_row_predictor_[predict_type]->IsPredictorEqual(config, num_iteration, boosting_.get())) {
+          !single_row_predictor_[predict_type]->IsPredictorEqual(config, start_iteration, num_iteration, boosting_.get())) {
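One way to read the patched comparison is as a single cache key whose fields must all match before the cached predictor is reused. The following is a hedged sketch of that pattern, not the patch itself. PredictorKey, CachedPredictor, and SetSingleRowPredictor are hypothetical names, and the fields simply mirror the ones the patched IsPredictorEqual compares: the early-stopping settings, start_iter, iter, and the total model count.

```cpp
#include <cassert>
#include <memory>
#include <tuple>

// Hypothetical cache key holding the same fields that the patched
// IsPredictorEqual compares (names are illustrative, not LightGBM's).
struct PredictorKey {
  bool early_stop;
  int early_stop_freq;
  double early_stop_margin;
  int start_iter;  // the field this fix adds to the key
  int num_iter;
  int num_total_model;

  bool operator==(const PredictorKey& o) const {
    return std::tie(early_stop, early_stop_freq, early_stop_margin,
                    start_iter, num_iter, num_total_model) ==
           std::tie(o.early_stop, o.early_stop_freq, o.early_stop_margin,
                    o.start_iter, o.num_iter, o.num_total_model);
  }
};

// Hypothetical cached predictor that remembers the key it was built for.
struct CachedPredictor {
  explicit CachedPredictor(const PredictorKey& k) : key(k) {}
  PredictorKey key;
};

// Mirrors the check-and-rebuild in SetSingleRowPredictorInner: reuse the
// cached predictor only if every key field matches. Returns true when a new
// predictor was built.
bool SetSingleRowPredictor(std::unique_ptr<CachedPredictor>& cache,
                           const PredictorKey& key) {
  if (cache && cache->key == key) {
    return false;  // cache hit: all fields, including start_iter, match
  }
  cache = std::make_unique<CachedPredictor>(key);
  assert(cache->key == key);  // postcondition: cache now reflects the request
  return true;
}
```

This PR keeps the field-by-field comparison inside IsPredictorEqual. Grouping the fields into one struct with one operator== is shown only as an alternative, because it makes it harder to add a constructor parameter and forget to compare it, which is the kind of omission fixed here.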

Test

Adds SingleRow.StartIterationChangesPrediction to tests/cpp_tests/test_single_row.cpp. The test:

  1. Trains 30 iterations on binary.train.
  2. Calls LGBM_BoosterPredictForMatSingleRow twice on the same booster with num_iteration=10, start_iteration=0 then start_iteration=20 (disjoint iteration windows).
  3. Asserts the two predictions differ.
  4. Cross-checks both against LGBM_BoosterPredictForMat (which does not use the buggy cache) to confirm the single-row path agrees with the non-single-row path for both iteration windows.
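The two checks in the test can be expressed as a small harness that accepts any single-row prediction path. This is a self-contained sketch, not the actual test. PerIterRaw, ReferencePredict, SingleRowFn, and PassesStartIterationChecks are hypothetical. ReferencePredict stands in for the uncached LGBM_BoosterPredictForMat path, and the agreement check assumes an absolute tolerance of 1e-12, which may differ from the comparison the real test uses.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Hypothetical toy model: one raw-score contribution per boosting iteration.
using PerIterRaw = std::vector<double>;

// Uncached reference path, standing in for LGBM_BoosterPredictForMat: sums
// the raw scores in [start, start + num) and applies a sigmoid.
double ReferencePredict(const PerIterRaw& model, int start, int num) {
  assert(start >= 0 && start + num <= static_cast<int>(model.size()));
  double raw = 0.0;
  for (int i = start; i < start + num; ++i) {
    raw += model[i];
  }
  return 1.0 / (1.0 + std::exp(-raw));
}

// A single-row prediction path under test:
// (start_iteration, num_iteration) -> predicted score.
using SingleRowFn = std::function<double(int, int)>;

// Mirrors the two checks in SingleRow.StartIterationChangesPrediction:
// (1) the disjoint windows [0, 10) and [20, 30) give different predictions;
// (2) the single-row path agrees with the reference for both windows.
// The model must have at least 30 iterations.
bool PassesStartIterationChecks(const PerIterRaw& model,
                                const SingleRowFn& single_row) {
  const int num_iteration = 10;
  const double early = single_row(0, num_iteration);
  const double late = single_row(20, num_iteration);
  const double tol = 1e-12;
  const bool windows_differ = early != late;
  const bool early_matches =
      std::fabs(early - ReferencePredict(model, 0, num_iteration)) < tol;
  const bool late_matches =
      std::fabs(late - ReferencePredict(model, 20, num_iteration)) < tol;
  return windows_differ && early_matches && late_matches;
}
```

A path that recomputes every window passes both checks, while a path that caches on num_iteration alone fails them, matching the split seen in the TDD verification that follows.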

TDD verification

I stashed the c_api.cpp fix, rebuilt testlightgbm, and re-ran just the new test. It failed exactly as the issue describes:

[ RUN      ] SingleRow.StartIterationChangesPrediction
test_single_row.cpp:282: Failure
Expected: (pred_early[0]) != (pred_late[0]),
   actual: 0.65339054153163234 vs 0.65339054153163234
LGBM_BoosterPredictForMatSingleRow returned identical predictions for
disjoint iteration windows [0,10) and [20,30); start_iteration is being
ignored by the predictor cache key.

test_single_row.cpp:323: Failure
Expected equality of these values:
  pred_late[0]   Which is: 0.65339054153163234
  ref_late[0]    Which is: 0.56966613224435347
LGBM_BoosterPredictForMatSingleRow (start=20) disagrees with LGBM_BoosterPredictForMat
[  FAILED  ] SingleRow.StartIterationChangesPrediction

I then reapplied the fix, rebuilt, and ran the full SingleRow.* suite:

[ RUN      ] SingleRow.Normal
[       OK ] SingleRow.Normal (708 ms)
[ RUN      ] SingleRow.Contrib
[       OK ] SingleRow.Contrib (954 ms)
[ RUN      ] SingleRow.StartIterationChangesPrediction
[       OK ] SingleRow.StartIterationChangesPrediction (416 ms)
[  PASSED  ] 3 tests.

Build (Windows, MinGW gcc 13.2.0): cmake -S . -B build-test -DBUILD_CPP_TEST=ON -DUSE_OPENMP=ON -G "MinGW Makefiles" && cmake --build build-test --target testlightgbm

Impact

Any consumer that reuses a Booster handle and calls one of the single-row predict entry points with varying start_iteration, while num_iteration and predict_type stay the same, is affected by this bug. This matters most for online-serving and low-latency inference, where a single booster is loaded once and reused across many per-request prediction calls that slice different iteration windows. Every language binding that reaches these single-row entry points through the C API (including SWIG-based ones) inherits the bug.

No API or ABI change: IsPredictorEqual is a member of SingleRowPredictorInner, an internal class in src/c_api.cpp that is not part of the public C API, and its only caller is updated in the same patch.

SingleRowPredictorInner stored only num_iteration, not start_iteration,
so consecutive calls to LGBM_BoosterPredictForMatSingleRow (and the
sibling Fast/CSR/CSC single-row entry points that share the same cached
predictor) with the same num_iteration but a different start_iteration
returned stale predictions computed for the first call's iteration
window. The predictor cached by SetSingleRowPredictorInner was never
re-initialized because IsPredictorEqual did not treat start_iteration
as part of the key.

Add start_iter_ to SingleRowPredictorInner, set it in the constructor,
and require it to match in IsPredictorEqual. The call site in
SetSingleRowPredictorInner already has start_iteration in scope, so the
fix is confined to c_api.cpp.

Add SingleRow.StartIterationChangesPrediction to
tests/cpp_tests/test_single_row.cpp. The test trains 30 iterations and
calls LGBM_BoosterPredictForMatSingleRow twice on the same booster handle
with num_iteration=10, first with start_iteration=0 and then with
start_iteration=20 (disjoint windows). Without the fix, the second call
hits the stale predictor cache and returns the score for the [0, 10)
window. With the fix, the second call rebuilds the predictor and returns
the score for the [20, 30) window, which also matches the ground truth
from the non-single-row LGBM_BoosterPredictForMat.

Verified by stashing the fix, rebuilding testlightgbm, and confirming
the new test fails with "0.65339 vs 0.65339" (stale cache) and
"0.65339 vs 0.56967" (disagrees with LGBM_BoosterPredictForMat), then
reapplying the fix and seeing all three SingleRow tests pass.

Fixes lightgbm-org#7220

Successfully merging this pull request may close these issues.

[C++] LGBM_BoosterPredictForMatSingleRow ignores start_iteration due to incomplete predictor cache key
