
fix(test): regenerate kpm_regression baseline and close CI gap (#155) #158

Merged
kalwalt merged 3 commits into feat/freak-visual-database from fix/kpm-regression-baseline-and-ci-gap
May 23, 2026

@kalwalt kalwalt (Member) commented May 22, 2026

Summary

  • Regenerate the stale EXPECTED_FULL_POSE / EXPECTED_FULL_ERROR baseline in crates/core/tests/kpm_regression.rs::test_full_pipeline_pose against the current C++-backed pipeline output on pinball-demo.jpg.
  • Add a new Ubuntu-only step to the kpm-build CI job that runs the three ffi-backend integration tests (kpm_regression, nft_pipeline, ar2_pinball_io) so a stale baseline cannot silently slip through CI again.
  • Add design doc docs/design/m9-kpm-regression-baseline-fix.md capturing Understanding Summary, Decision Log (D1–D6 + Q1/Q2), Assumptions, Risks, and the regeneration recipe.

Why

test_full_pipeline_pose was silently failing on dev: pose[0][2] diverged from the baseline by 6.134e-2, against a tolerance of 1.0e-2. The failure reproduces on a clean post-#153 base with the M9-2 work stashed, so it is pre-existing: drift accumulated across the M9 series.
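For readers unfamiliar with this kind of check, here is a minimal sketch of an element-wise tolerance comparison on a 3x4 pose. The function name, the `f64` element type, and the return shape are assumptions made for illustration; the actual assertion in kpm_regression.rs may be written differently.

```rust
/// Hypothetical helper (not taken from kpm_regression.rs): returns the first
/// `(row, col, abs_diff)` whose absolute difference from the baseline exceeds
/// `tol`, or `None` when every element is within tolerance.
/// Assumption: the pose is a 3x4 matrix of `f64`.
pub fn first_out_of_tolerance(
    actual: &[[f64; 4]; 3],
    expected: &[[f64; 4]; 3],
    tol: f64,
) -> Option<(usize, usize, f64)> {
    for (r, (row_a, row_e)) in actual.iter().zip(expected.iter()).enumerate() {
        for (c, (a, e)) in row_a.iter().zip(row_e.iter()).enumerate() {
            let diff = (a - e).abs();
            // Treat NaN as a failure too, so a broken pipeline cannot pass.
            if diff.is_nan() || diff > tol {
                return Some((r, c, diff));
            }
        }
    }
    None
}
```

With a tolerance of 1.0e-2, a deviation of 6.134e-2 in pose[0][2] is reported as `Some((0, 2, _))`, which is the shape of failure described above.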

CI was the root cause: every M9 PR merged green because no job ran the integration tests with --features ffi-backend. kpm-build only runs --lib --features dual-mode, and build-and-test runs the workspace without ffi-backend.

Regeneration recipe

Documented in the EXPECTED_FULL_POSE doc comment in kpm_regression.rs. Capture was done via a temporary arlog_e! block (per CLAUDE.md §2 logging convention), then removed.
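As an illustration of the capture step, the sketch below formats a measured pose as a Rust constant that can be pasted back into the test file. It deliberately uses plain string formatting rather than the project's logging macros, whose signatures are not shown in this PR; the helper name and the `f64` 3x4 layout are assumptions.

```rust
/// Hypothetical helper: render a 3x4 pose as Rust source for a baseline
/// constant. Assumption: the baseline is stored as `[[f64; 4]; 3]`.
pub fn format_pose_const(name: &str, pose: &[[f64; 4]; 3]) -> String {
    let mut out = format!("const {name}: [[f64; 4]; 3] = [\n");
    for row in pose {
        // `{:?}` prints the shortest representation that round-trips for
        // finite values, and keeps a trailing `.0` on whole numbers.
        let cells: Vec<String> = row.iter().map(|v| format!("{v:?}")).collect();
        out.push_str(&format!("    [{}],\n", cells.join(", ")));
    }
    out.push_str("];\n");
    out
}
```

The returned string would be emitted once from a temporary block inside the test, copied into the constant, and the temporary block removed afterwards.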

Test plan

  • cargo test -p webarkitlib-rs --test kpm_regression --features ffi-backend — green (5/5)
  • cargo test -p webarkitlib-rs --test nft_pipeline --test ar2_pinball_io --features ffi-backend — green (no drift to fix in these)
  • cargo fmt --all -- --check — clean
  • cargo clippy --workspace -- -D warnings — clean (mirrors CI)
  • cargo test --all-features — 463 passed, 8 ignored
  • New CI step (Run ffi-backend integration tests) passes on Ubuntu in this PR

Closes #155.

🤖 Generated with Claude Code

The `test_full_pipeline_pose` test has been silently failing on `dev`
because no CI job ran the integration tests under `tests/` with
`--features ffi-backend`. The `kpm-build` job only runs `--lib` tests,
and `build-and-test` runs the workspace without `ffi-backend`, so the
C++-backed full-pipeline test was never executed in CI.

This let the `EXPECTED_FULL_POSE` / `EXPECTED_FULL_ERROR` baseline
constants in `crates/core/tests/kpm_regression.rs` drift out of sync
with the actual pipeline output across the M9 series.

Changes:

- Regenerate `EXPECTED_FULL_POSE` and `EXPECTED_FULL_ERROR` against
  the current C++-backed pipeline on `pinball-demo.jpg`. Capture was
  done via a temporary `arlog_e!` block inside the test (per
  CLAUDE.md §2 logging convention), then removed.
- Document the regeneration procedure in the `EXPECTED_FULL_POSE`
  doc comment so future maintainers have a one-glance recipe.
- Add a new Ubuntu-only step to the `kpm-build` job that runs the
  three `ffi-backend` integration tests (`kpm_regression`,
  `nft_pipeline`, `ar2_pinball_io`). This closes the gate so a stale
  baseline can never silently slip through CI again.
- Add design doc `docs/design/m9-kpm-regression-baseline-fix.md`
  capturing Understanding Summary, Decision Log, Assumptions, Risks,
  and Verification workflow (matches the M9 series doc pattern).

Closes #155.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@kalwalt kalwalt self-assigned this May 22, 2026
@kalwalt kalwalt moved this from Backlog to In progress in Plan to port KPM to rust May 22, 2026
@kalwalt kalwalt added bug Something isn't working enhancement New feature or request rust code tests kpm regression ci/cd labels May 22, 2026
@kalwalt kalwalt changed the base branch from dev to feat/freak-visual-database May 22, 2026 19:18
…inux

CI on PR #158 surfaced R2 from the design doc: the baseline I
regenerated on Windows fails on the Ubuntu runner by ~6e-2 in
pose[0][2] — far above the 1e-2 tolerance. The original Linux baseline
was actually correct all along; the local Windows failure that
motivated this PR was cross-platform rounding variance accumulating
through the C++ FREAK + RANSAC + ICP chain, not staleness.

Changes:
- Restore the original EXPECTED_FULL_POSE / EXPECTED_FULL_ERROR
  values (Linux baseline).
- Gate test_full_pipeline_pose to target_os = "linux" so
  Windows/macOS local runs of `cargo test` skip rather than misreport
  the cross-platform variance.
- Update EXPECTED_FULL_POSE doc with explicit platform-sensitivity
  note and Linux-only regeneration procedure.
- Update design doc with R2 materialization and resolution.

The CI gate is unchanged (Ubuntu-only step in kpm-build job) and
still catches genuine drift on the platform that owns the baseline.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
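A minimal sketch of what gating a test to Linux can look like in Rust. Whether the PR uses `#[cfg(...)]` exactly as below is an assumption, the helper `baseline_platform` is hypothetical, and the test body is a placeholder rather than the real pipeline check.

```rust
/// Hypothetical helper: true when this build targets the platform that owns
/// the baseline (Linux, per the commit message above).
pub fn baseline_platform() -> bool {
    cfg!(target_os = "linux")
}

// Assumption: the gate is a plain `#[cfg]`. On Windows and macOS the function
// is not compiled at all, so `cargo test` neither runs nor reports it.
#[test]
#[cfg(target_os = "linux")]
fn test_full_pipeline_pose() {
    // Placeholder body: the real test runs the full pipeline and compares the
    // pose against EXPECTED_FULL_POSE.
    assert!(baseline_platform());
}
```

An alternative is `#[cfg_attr(not(target_os = "linux"), ignore)]`, which keeps the test compiled everywhere and lists it as ignored on other platforms instead of hiding it.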
@kalwalt kalwalt moved this from In progress to In review in Plan to port KPM to rust May 23, 2026
On reflection, the regen capture is a one-shot informational dump,
which per CLAUDE.md §2 maps to arlog_i!, not arlog_e! (which is for
misconfiguration / wiring errors). Update the recipe in the
EXPECTED_FULL_POSE doc comment and in design doc D2 accordingly, and
document RUST_LOG=info in the run command.

Refs #155.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@kalwalt kalwalt merged commit 468858a into feat/freak-visual-database May 23, 2026
16 checks passed
kalwalt added a commit that referenced this pull request May 23, 2026
@github-project-automation github-project-automation Bot moved this from In review to Done in Plan to port KPM to rust May 23, 2026

Labels

bug Something isn't working ci/cd enhancement New feature or request kpm regression rust code tests

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

fix(ci): test_full_pipeline_pose fails on feat/freak-visual-database with --features ffi-backend (CI gap)

1 participant