ZTS: fail test run if test runner crashes unexpectedly #17858
                
     Merged
            
            
          
      
        
          +23
        
        
          −14
        
        
          
        
      
    
  
  Add this suggestion to a batch that can be applied as a single commit.
  This suggestion is invalid because no changes were made to the code.
  Suggestions cannot be applied while the pull request is closed.
  Suggestions cannot be applied while viewing a subset of changes.
  Only one suggestion per line can be applied in a batch.
  Add this suggestion to a batch that can be applied as a single commit.
  Applying suggestions on deleted lines is not supported.
  You must change the existing code in this line in order to create a valid suggestion.
  Outdated suggestions cannot be applied.
  This suggestion has been applied or marked resolved.
  Suggestions cannot be applied from pending reviews.
  Suggestions cannot be applied on multi-line comments.
  Suggestions cannot be applied while the pull request is queued to merge.
  Suggestion cannot be applied right now. Please check back later.
  
    
  
    
[Sponsors: Klara, Inc., Wasabi Technology, Inc.]
Motivation and Context
I was looking at #17759 and noticed its tests had all passed but with unusually short run times. Looking at the output, it turns out there was a syntax error in the runfile, but the return code passed back to
zfs-tests.shwas a "success" code, and so the whole run was deemed a success.Description
zfs-tests.shexecutestest-runner.pyto do the actual test work. Any exit code < 4 is interpreted as success, with the actual value describing the outcome of the tests inside.If a Python program crashes in some way (eg an uncaught exception), the process exit code is 1.
Taken together, this means that
test-runner.pycan crash during setup, but return a "success" error code tozfs-tests.sh, which will report and exit 0. This in turn causes the CI runner to believe the test run completed successfully.This commit addresses this by making
zfs-tests.shinterpret an exit code of 255 as a failure in the runner itself. Then, intest-runner.py, thefail()function defaults to a 255 return, and the main function gets wrapped in a generic exception handler, which prints it and callsfail().All together, this should mean that any unexpected failure in the test runner itself will be propagated out of
zfs-tests.shfor CI or any other calling program to deal with.How Has This Been Tested?
Manual testing, forcing success and failure within tests, and introducing errors into the runfile. Seems right.
To check the CI itself, I've pushed a second commit with a runfile error. If it does the right thing, I'll remove it and take this out of draft.
Update: this worked as expected, see job output.
Types of changes
Checklist:
Signed-off-by.