Add `BackendCannotProceed` to improve integration #4092

Zac-HD · 2024-08-25T05:15:52Z

Sometimes, an alternative backend will just be done, and this new exception (and handlers) provides an interface to communicate that. There are several reasons this could happen, including:

not_realizable: e.g. hitting a timeout while attempting to .realize() some value, as in Rare PathTimeout errors in provider.realize(...) pschanely/hypothesis-crosshair#21
- if we hit more than ten of these, we switch to the Hypothesis backend to preserve good performance.
exhausted: running out of useful work to do, e.g. if the test function immediately invokes compiled code then Crosshair doesn't have much to do, and can hand back to the Hypothesis backend
verified: for the cases where we've estabilished that the test will never fail, similar to Hypothesis' own ExitReason.finished if the datatree is exhausted (e.g. when st.booleans() has generated both True and False).
- Since alternative backends are generally pretty new and might miss bugs, we treat this as for exhausted and continue testing on the Hypothesis backend for now.

TODO: lots of tests

that we handle it reasonably if a provider raises on the first interaction, second interaction, etc.
that it works with observability
that incorrect verification reports are reported
that if too many not_realized reports are made we switch back; also maybe improve the heuristic?

cc @pschanely; I'm probably not going to get back to this for a week or two but if you want to try it out I'd love to hear your thoughts 🙂

pschanely · 2024-09-04T00:51:21Z

@Zac-HD Easy to integrate, and it makes the plugin code less fragile! 🎉 (my changes)

I have a few ignored-path cases that aren't technically not_realizable (e.g. timeout/unknown sat during body execution or even during the draws). Maybe we rename to something a little more generic?
The idea behind your exhausted reason is fascinating - I'm not producing it yet, but it would be possible to detect at least some cases where we simply aren't going to make progress.
Just spot checking test_large_data_will_fail_a_health_check, I am now getting a FlakyReplay (BackendCannotProceed vs overrun) - maybe we can add an explicit check for this exception there?

Zac-HD · 2024-09-05T05:44:15Z

@Zac-HD Easy to integrate, and it makes the plugin code less fragile! 🎉 (my changes)

Nice!

I have a few ignored-path cases that aren't technically not_realizable (e.g. timeout/unknown sat during body execution or even during the draws). Maybe we rename to something a little more generic?

How about discard_test_case?

The idea behind your exhausted reason is fascinating - I'm not producing it yet, but it would be possible to detect at least some cases where we simply aren't going to make progress.

I was thinking we'd try to use verified only when we thought that the result was sound, e.g. if you ever hit an IgnoreAttempt, UnexploredPath, or suspected_proxy_intolerance_exception then you'd raise exhausted instead there.

Ideally this would allow us to support sound verification of e.g. code that just does SMT-friendly integer logic, whereas tests that hit timeouts or whatever else will fall back to Hypothesis' own backend for the remainder of the test budget. At the extremely ambitious end, we might eventually try to focus Hypothesis on the subspaces that Crosshair couldn't handle as per the last few ideas in #3914. That's definitely a far-future thing though, and I'd rather start with using Crosshair to explore specific branches that HypoFuzz can't cover...

Just spot checking test_large_data_will_fail_a_health_check, I am now getting a FlakyReplay (BackendCannotProceed vs overrun) - maybe we can add an explicit check for this exception there?

Hmm, I don't think BackendCannotProceed should ever be raised all the way to the user, but that just suggests that I need to handle it somewhere else 🙂

pschanely · 2024-09-07T14:29:14Z

I have a few ignored-path cases that aren't technically not_realizable (e.g. timeout/unknown sat during body execution or even during the draws). Maybe we rename to something a little more generic?

How about discard_test_case?

That works!

The idea behind your exhausted reason is fascinating - I'm not producing it yet, but it would be possible to detect at least some cases where we simply aren't going to make progress.

I was thinking we'd try to use verified only when we thought that the result was sound, e.g. if you ever hit an IgnoreAttempt, UnexploredPath, or suspected_proxy_intolerance_exception then you'd raise exhausted instead there.

Ideally this would allow us to support sound verification of e.g. code that just does SMT-friendly integer logic, whereas tests that hit timeouts or whatever else will fall back to Hypothesis' own backend for the remainder of the test budget. At the extremely ambitious end, we might eventually try to focus Hypothesis on the subspaces that Crosshair couldn't handle as per the last few ideas in #3914. That's definitely a far-future thing though, and I'd rather start with using Crosshair to explore specific branches that HypoFuzz can't cover...

Over-sharing a little bit, but this is CrossHair's exception taxonomy for early path exits:

* CrossHairInternal - We hit a bug in CrossHair.
* IgnoreAttempt - This path is impossible (we made an assumption that was later invalidated)
* UnexploredPath - This path is viable, but CrossHair could not complete it, because:
  * UnknownSatisfiability - The solver could not make some decision in time.
  * PathTimeout - The path, as a whole, took too long.
  * CrosshairUnsupported - Some part of the code is not supported by CrossHair.

Notably, IgnoreAttempt doesn't mean that we're skipping some portion of the input space (though all the others do). One challenge is that a single UnexploredPath doesn't suggest much about what will happen on future paths. We probably want to track a ratio of completion statuses over a few runs, like (# of successful completions) / (# of UnexploredPaths + successful completions). It might be easiest to do this on the plugin side, and have me to raise a "I'm not making enough progress" exception at the right time.

The other case that's not covered here is when we have plenty of successful completions, but too much realization is happening. This could also potentially be detected in the plugin.

And, yes, I share your vision for expanding information sharing between the backends. There is so much we could do.

Zac-HD · 2024-09-11T08:34:27Z

OK, sounds like you could still finish verified after also seeing IgnoreAttempt paths - happily the choice of finish status is taken by the plugin, so Hypothesis doesn't need to know about this taxonomy 🙂
Similarly, if you're seeing too much realization or overly slow progress for other reasons, you can finish with exhausted and Hypothesis will continue with our standard backend
- I think Hypothesis should also have something here, but likely much more conservative than the if self.__failed_realize_count > 10: in the current draft. Maybe >20% of the last thousand attempts?

Zac-HD · 2024-10-08T05:44:30Z

@pschanely I think this is now complete on the Hypothesis end; let me know if you find any issues or have any suggestions from pschanely/hypothesis-crosshair#23? I'll merge after you confirm it's ready.

hypothesis-python/src/hypothesis/core.py

pschanely · 2024-10-09T02:39:15Z

@Zac-HD this is looking pretty good when I cobble together the right bits of everything. Let's ship it!

Zac-HD mentioned this pull request Aug 25, 2024

Stable support for symbolic execution #3914

Open

19 tasks

Zac-HD force-pushed the backend-cannot-proceed branch 4 times, most recently from bfeb51d to b498f05 Compare October 6, 2024 22:34

Zac-HD force-pushed the backend-cannot-proceed branch from b498f05 to a7cc537 Compare October 7, 2024 09:10

Add BackendCannotProceed exception

b948562

Zac-HD force-pushed the backend-cannot-proceed branch from a7cc537 to b948562 Compare October 7, 2024 09:15

tybug reviewed Oct 8, 2024

View reviewed changes

hypothesis-python/src/hypothesis/core.py Outdated Show resolved Hide resolved

tweak unsound-verified message

6a67838

Zac-HD force-pushed the backend-cannot-proceed branch from fec4867 to 6a67838 Compare October 8, 2024 18:53

Zac-HD merged commit 77335f3 into HypothesisWorks:master Oct 9, 2024
56 checks passed

Zac-HD deleted the backend-cannot-proceed branch October 9, 2024 03:20

Zac-HD mentioned this pull request Oct 9, 2024

[experimental] Run crosshair in CI #4034

Draft

33 tasks

NameNoQuality approved these changes Oct 11, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `BackendCannotProceed` to improve integration #4092

Add `BackendCannotProceed` to improve integration #4092

Zac-HD commented Aug 25, 2024 •

edited

Loading

pschanely commented Sep 4, 2024

Zac-HD commented Sep 5, 2024

pschanely commented Sep 7, 2024 •

edited

Loading

Zac-HD commented Sep 11, 2024

Zac-HD commented Oct 8, 2024

pschanely commented Oct 9, 2024

Add BackendCannotProceed to improve integration #4092

Add BackendCannotProceed to improve integration #4092

Conversation

Zac-HD commented Aug 25, 2024 • edited Loading

pschanely commented Sep 4, 2024

Zac-HD commented Sep 5, 2024

pschanely commented Sep 7, 2024 • edited Loading

Zac-HD commented Sep 11, 2024

Zac-HD commented Oct 8, 2024

pschanely commented Oct 9, 2024

Add `BackendCannotProceed` to improve integration #4092

Add `BackendCannotProceed` to improve integration #4092

Zac-HD commented Aug 25, 2024 •

edited

Loading

pschanely commented Sep 7, 2024 •

edited

Loading