Skip to content

Fix flaky test failure because 0 threshold#138

Merged
skyw merged 1 commit intomainfrom
skyw/relax_test_threshold
Mar 19, 2026
Merged

Fix flaky test failure because 0 threshold#138
skyw merged 1 commit intomainfrom
skyw/relax_test_threshold

Conversation

@skyw
Copy link
Contributor

@skyw skyw commented Mar 19, 2026

Prevent failures like https://github.com/NVIDIA-NeMo/Emerging-Optimizers/runs/67746292513 because of values slightly off atol. Don't want to just keep relaxing it, which may mask real errors eventually.

Signed-off-by: Hao Wu <skyw@nvidia.com>
@skyw skyw requested a review from mkhona-nvidia March 19, 2026 15:43
@copy-pr-bot
Copy link

copy-pr-bot bot commented Mar 19, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR fixes a flaky test in test_eigh_with_fallback_reconstruction_close_to_original by pre-processing the random test matrix to replace near-zero values (< 1e-8) with 0.25 before performing eigendecomposition and reconstruction checks.

The root cause of the flakiness is that torch.testing.assert_close uses atol + rtol * |expected| as its tolerance. When a matrix element is near zero, rtol * |expected| ≈ 0, so only atol provides slack. For float32 reconstruction of a 31×31 matrix, floating-point accumulation errors can reach O(N · ε_machine · ‖x‖) ≈ O(31 × 1.2e-7 × 31) ≈ 1.15e-4, which is just above the atol = 1e-4 threshold — causing intermittent failures.

Key aspects of the fix:

  • Only elements with absolute value < 1e-8 are replaced with 0.25 (exact float representation as a power of 2).
  • Since x = randn @ randn.T is exactly symmetric in floating-point, x[i,j] == x[j,i] always, so the masked assignment preserves symmetry.
  • Diagonal elements of randn @ randn.T are sums of squares (always positive and O(N)), so the mask will not affect them.
  • A minor wording issue: the comment says "Remove 0" but the code replaces, not removes, near-zero values.

Confidence Score: 5/5

  • This PR is safe to merge — it is a targeted, low-risk test fix with no changes to production code.
  • The change is confined to a single test file and only adds a two-line pre-processing step. The approach is mathematically sound: the fix preserves matrix symmetry, is unlikely to break PSD properties for the tested matrix sizes, and directly addresses the root cause of the flakiness (absolute tolerance dominating for near-zero values). The only finding is a minor comment wording issue.
  • No files require special attention.

Important Files Changed

Filename Overview
tests/test_eig_utils.py Adds a pre-processing step to replace near-zero matrix elements (< 1e-8) with 0.25 before testing eigendecomposition reconstruction, preventing flaky failures caused by absolute-tolerance checks on near-zero values. The fix preserves symmetry (since x = randn @ randn.T is exactly symmetric, both x[i,j] and x[j,i] are near-zero together and both get set to 0.25) and the PSD property is unlikely to be affected given diagonal elements are O(N) while modified off-diagonal values are 0.25.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Generate random matrix\nx = randn(shape)"] --> B["Make symmetric PSD\nx = x @ x.T"]
    B --> C["NEW: Replace near-zero values\nx[x.abs() < 1e-8] = 0.25"]
    C --> D["Call eigh_with_fallback(x, force_double)"]
    D --> E["Internally: stabilized_x = x + eps * I\nthen torch.linalg.eigh(stabilized_x)"]
    E --> F["Returns L eigenvalues, Q eigenvectors\n(descending order)"]
    F --> G["Reconstruct matrix\nreconstructed = Q @ diag(L) @ Q^T"]
    G --> H{"assert_close\nreconstructed ≈ x?"}
    H -- "atol=1e-4 / 1e-6" --> I["PASS ✅"]
    H -- "| a-b | > atol + rtol*|b|" --> J["FAIL ❌ (was flaky\nwhen x[i,j] ≈ 0)"]

    style C fill:#d4edda,stroke:#28a745
    style J fill:#f8d7da,stroke:#dc3545
    style I fill:#d4edda,stroke:#28a745
Loading

Last reviewed commit: "fix test"

@skyw
Copy link
Contributor Author

skyw commented Mar 19, 2026

/ok to test d50955c

@skyw skyw enabled auto-merge (squash) March 19, 2026 15:49
@skyw skyw merged commit fd7fd8b into main Mar 19, 2026
16 checks passed
@skyw skyw deleted the skyw/relax_test_threshold branch March 19, 2026 16:07
@github-actions
Copy link

Test Results

   48 files  ±0     98 suites  ±0   1m 16s ⏱️ -1s
1 012 tests ±0  1 012 ✅ +1  0 💤 ±0  0 ❌  - 1 
2 255 runs  ±0  2 255 ✅ +1  0 💤 ±0  0 ❌  - 1 

Results for commit d50955c. ± Comparison against base commit 250f140.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants