
fix: stabilize recording hash normalization to reduce flaky integration tests#5252

Closed
leseb wants to merge 2 commits into llamastack:main from leseb:fix-recording-hash-normalization

Conversation

@leseb
Collaborator

@leseb leseb commented Mar 23, 2026

Summary

  • Enhance _normalize_body_for_hash() in api_recorder.py to produce stable hashes across semantically equivalent but structurally different request bodies
  • Delete 22 stale recordings for test_mcp_invocation that accumulated from hash drift — they will regenerate on the next CI run

Problem

test_mcp_invocation fails ~17% of the time in CI (docker, ollama, base) because the recording/replay hash changes as the code evolves. The same test had accumulated 19-22 recordings, each capturing a request-body variant that is semantically identical to the others.

Normalizations added

| Field | Before | After |
| --- | --- | --- |
| `max_tokens` | `None` vs `0` produce different hashes | Both dropped from hash |
| `tool_choice` | `None` vs `"auto"` produce different hashes | Both dropped from hash |
| Message content | `[{"type": "text", "text": "X"}]` vs `"X"` | Collapsed to `"X"` |
| Tool call IDs | `call_c1tlwvxc` vs `call_oezek4up` | Replaced with stable placeholder |
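The rules above might look roughly like this (a sketch only; the actual `_normalize_body_for_hash()` in `api_recorder.py` may differ in structure and field coverage):

```python
import copy
import hashlib
import json


def _normalize_body_for_hash(body: dict) -> dict:
    """Sketch of the normalization rules in the table above."""
    body = copy.deepcopy(body)  # never mutate the caller's request

    # max_tokens: None and 0 are equivalent -> drop the field entirely
    if body.get("max_tokens") in (None, 0):
        body.pop("max_tokens", None)

    # tool_choice: None and "auto" are equivalent -> drop the field entirely
    if body.get("tool_choice") in (None, "auto"):
        body.pop("tool_choice", None)

    for msg in body.get("messages", []):
        # Collapse [{"type": "text", "text": "X"}] to the plain string "X"
        content = msg.get("content")
        if (
            isinstance(content, list)
            and len(content) == 1
            and isinstance(content[0], dict)
            and content[0].get("type") == "text"
        ):
            msg["content"] = content[0]["text"]

        # Replace random call_xxx IDs with a stable, position-based placeholder
        for i, call in enumerate(msg.get("tool_calls") or []):
            call["id"] = f"call_{i}"
        if msg.get("tool_call_id"):
            msg["tool_call_id"] = "call_0"

    return body


def request_hash(body: dict) -> str:
    # Canonical JSON (sorted keys) so key order never affects the hash
    canonical = json.dumps(_normalize_body_for_hash(body), sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()
```

With this, two structurally different but semantically equivalent bodies, such as `{"max_tokens": None, "messages": [{"role": "user", "content": [{"type": "text", "text": "hi"}]}]}` and `{"max_tokens": 0, "messages": [{"role": "user", "content": "hi"}]}`, hash identically.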

Test plan

  • 10 new unit tests covering each normalization
  • All existing recording tests pass
  • CI

Signed-off-by: Sebastien Han <shan@redhat.com>

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 23, 2026
@leseb
Collaborator Author

leseb commented Mar 23, 2026

This is an attempt to reduce the flakiness of Integration Tests (docker, ollama, 3.12, client=latest, base).

The test_mcp_invocation test was flaky (~17% failure rate) because the
recording/replay hash changed across code versions due to semantically
equivalent but structurally different request bodies.

Add normalizations to _normalize_body_for_hash() for:
- max_tokens: treat None and 0 as equivalent (both dropped from hash)
- tool_choice: treat None and "auto" as equivalent (both dropped)
- Message content: collapse [{type: text, text: X}] to plain string "X"
- Tool call IDs: replace random call_xxx IDs with a stable placeholder

Delete stale recordings for test_mcp_invocation so they will be
re-generated on the next CI run with record-if-missing mode.

Add unit tests covering each normalization rule and a combined test
verifying that two structurally different but semantically equivalent
request bodies produce the same hash.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
@leseb leseb force-pushed the fix-recording-hash-normalization branch from 44a465a to 4c1de12 Compare March 23, 2026 15:07
Re-hash all 2625 integration test recordings using the new
normalization in _normalize_body_for_hash(). This collapses
927 duplicate recordings that differed only in semantically
equivalent fields (max_tokens null vs 0, tool_choice null vs
auto, content format, tool call IDs).

2625 recordings → 1698 after deduplication.

Signed-off-by: Sébastien Han <seb@redhat.com>
@github-actions
Contributor

Recording workflow finished with status: failure

Providers: gpt, azure, watsonx

Recording attempt finished. Check the workflow run for details.

View workflow run

Fork PR: Recordings will be committed if you have "Allow edits from maintainers" enabled.

@iamemilio
Contributor

any correlation between this: #5233 ?

@leseb
Collaborator Author

leseb commented Mar 24, 2026

> any correlation between this: #5233 ?

yes

@leseb leseb closed this Mar 24, 2026