Skip to content

Conversation

@robn
Copy link
Member

@robn robn commented Oct 21, 2025

[Sponsors: Klara, Inc., Wasabi Technology, Inc.]

Motivation and Context

I was looking at #17759 and noticed its tests had all passed but with unusually short run times. Looking at the output, it turns out there was a syntax error in the runfile, but the return code passed back to zfs-tests.sh was a "success" code, and so the whole run was deemed a success.

Description

zfs-tests.sh executes test-runner.py to do the actual test work. Any exit code < 4 is interpreted as success, with the actual value describing the outcome of the tests inside.

If a Python program crashes in some way (eg an uncaught exception), the process exit code is 1.

Taken together, this means that test-runner.py can crash during setup, but return a "success" error code to zfs-tests.sh, which will report and exit 0. This in turn causes the CI runner to believe the test run completed successfully.

This commit addresses this by making zfs-tests.sh interpret an exit code of 255 as a failure in the runner itself. Then, in test-runner.py, the fail() function defaults to a 255 return, and the main function gets wrapped in a generic exception handler, which prints it and calls fail().

All together, this should mean that any unexpected failure in the test runner itself will be propagated out of zfs-tests.sh for CI or any other calling program to deal with.

How Has This Been Tested?

Manual testing, forcing success and failure within tests, and introducing errors into the runfile. Seems right.

To check the CI itself, I've pushed a second commit with a runfile error. If it does the right thing, I'll remove it and take this out of draft.
Update: this worked as expected, see job output.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@robn robn marked this pull request as draft October 21, 2025 02:10
@github-actions github-actions bot added the Status: Work in Progress Not yet ready for general review label Oct 21, 2025
@robn robn force-pushed the zts-runner-crash-handler branch from 7ec7e38 to 635bf82 Compare October 21, 2025 02:28
@robn robn marked this pull request as ready for review October 21, 2025 02:29
@github-actions github-actions bot added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review labels Oct 21, 2025
@robn robn force-pushed the zts-runner-crash-handler branch from 635bf82 to 148a592 Compare October 21, 2025 02:30
zfs-tests.sh executes test-runner.py to do the actual test work. Any
exit code < 4 is interpreted as success, with the actual value
describing the outcome of the tests inside.

If a Python program crashes in some way (eg an uncaught exception), the
process exit code is 1.

Taken together, this means that test-runner.py can crash during setup,
but return a "success" error code to zfs-tests.sh, which will report and
exit 0. This in turn causes the CI runner to believe the test run
completed successfully.

This commit addresses this by making zfs-tests.sh interpret an exit code
of 255 as a failure in the runner itself. Then, in test-runner.py, the
"fail()" function defaults to a 255 return, and the main function gets
wrapped in a generic exception handler, which prints it and calls
fail().

All together, this should mean that any unexpected failure in the test
runner itself will be propagated out of zfs-tests.sh for CI or any other
calling program to deal with.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
@robn robn force-pushed the zts-runner-crash-handler branch from 148a592 to ca4e67c Compare October 21, 2025 02:51
Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for running this down.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Oct 21, 2025
@behlendorf behlendorf merged commit 72f4145 into openzfs:master Oct 21, 2025
22 of 25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Accepted Ready to integrate (reviewed, tested)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants