ZTS: fail test run if test runner crashes unexpectedly #17858

robn · 2025-10-21T02:09:34Z

[Sponsors: Klara, Inc., Wasabi Technology, Inc.]

Motivation and Context

I was looking at #17759 and noticed its tests had all passed but with unusually short run times. Looking at the output, it turns out there was a syntax error in the runfile, but the return code passed back to zfs-tests.sh was a "success" code, and so the whole run was deemed a success.

Description

zfs-tests.sh executes test-runner.py to do the actual test work. Any exit code < 4 is interpreted as success, with the actual value describing the outcome of the tests inside.

If a Python program crashes in some way (eg an uncaught exception), the process exit code is 1.

Taken together, this means that test-runner.py can crash during setup, but return a "success" error code to zfs-tests.sh, which will report and exit 0. This in turn causes the CI runner to believe the test run completed successfully.

This commit addresses this by making zfs-tests.sh interpret an exit code of 255 as a failure in the runner itself. Then, in test-runner.py, the fail() function defaults to a 255 return, and the main function gets wrapped in a generic exception handler, which prints it and calls fail().

All together, this should mean that any unexpected failure in the test runner itself will be propagated out of zfs-tests.sh for CI or any other calling program to deal with.

How Has This Been Tested?

Manual testing, forcing success and failure within tests, and introducing errors into the runfile. Seems right.

To check the CI itself, I've pushed a second commit with a runfile error. If it does the right thing, I'll remove it and take this out of draft.
Update: this worked as expected, see job output.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Quality assurance (non-breaking change which makes the code more robust against bugs)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

zfs-tests.sh executes test-runner.py to do the actual test work. Any exit code < 4 is interpreted as success, with the actual value describing the outcome of the tests inside. If a Python program crashes in some way (eg an uncaught exception), the process exit code is 1. Taken together, this means that test-runner.py can crash during setup, but return a "success" error code to zfs-tests.sh, which will report and exit 0. This in turn causes the CI runner to believe the test run completed successfully. This commit addresses this by making zfs-tests.sh interpret an exit code of 255 as a failure in the runner itself. Then, in test-runner.py, the "fail()" function defaults to a 255 return, and the main function gets wrapped in a generic exception handler, which prints it and calls fail(). All together, this should mean that any unexpected failure in the test runner itself will be propagated out of zfs-tests.sh for CI or any other calling program to deal with. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>

behlendorf

Thanks for running this down.

robn marked this pull request as draft October 21, 2025 02:10

github-actions bot added the Status: Work in Progress Not yet ready for general review label Oct 21, 2025

robn force-pushed the zts-runner-crash-handler branch from 7ec7e38 to 635bf82 Compare October 21, 2025 02:28

robn marked this pull request as ready for review October 21, 2025 02:29

github-actions bot added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review labels Oct 21, 2025

robn force-pushed the zts-runner-crash-handler branch from 635bf82 to 148a592 Compare October 21, 2025 02:30

robn force-pushed the zts-runner-crash-handler branch from 148a592 to ca4e67c Compare October 21, 2025 02:51

tonyhutter approved these changes Oct 21, 2025

View reviewed changes

behlendorf approved these changes Oct 21, 2025

View reviewed changes

behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Oct 21, 2025

behlendorf merged commit 72f4145 into openzfs:master Oct 21, 2025
22 of 25 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ZTS: fail test run if test runner crashes unexpectedly #17858

ZTS: fail test run if test runner crashes unexpectedly #17858

robn commented Oct 21, 2025 •

edited

Loading

Uh oh!

behlendorf left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ZTS: fail test run if test runner crashes unexpectedly #17858

ZTS: fail test run if test runner crashes unexpectedly #17858

Conversation

robn commented Oct 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist:

Uh oh!

behlendorf left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

robn commented Oct 21, 2025 •

edited

Loading