@howieleung
Member

@howieleung howieleung commented Dec 9, 2025

The existing code already collects the content of every print call and validates whether the content after ==> Result meets certain criteria.
This PR replaces that validation by submitting all print contents to response.create and letting AI validate them.
The response.create call is recorded, but its input (the print contents) is sanitized. So if you modify a print statement in the samples, you don't need to re-record; you can still replay the recording and the assertion will pass.

Also, if the response says the test failed, it is hard to check what content was in the print calls, so I write it to a temp file.
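
A minimal sketch of the flow described above, not the code in this PR: it captures print output, hands the joined text to a validator, and writes the text to a temp file when validation fails. The names `capture_prints`, `validate_prints`, and `judge` are hypothetical; `judge` stands in for whatever wraps the response.create call and is assumed to return True on success.

```python
import builtins
import tempfile
from contextlib import contextmanager
from typing import Callable, Iterator, List, Optional


@contextmanager
def capture_prints() -> Iterator[List[str]]:
    """Temporarily wrap builtins.print and record the text of each call."""
    captured: List[str] = []
    original_print = builtins.print

    def recording_print(*args, **kwargs):
        # Record the positional arguments, then print as usual.
        captured.append(" ".join(str(arg) for arg in args))
        original_print(*args, **kwargs)

    builtins.print = recording_print
    try:
        yield captured
    finally:
        # Always restore the real print, even if the sample raised.
        builtins.print = original_print


def validate_prints(captured: List[str], judge: Callable[[str], bool]) -> Optional[str]:
    """Return None if `judge` accepts the output, else the path of a temp file
    holding the captured output for later inspection.

    `judge` is a hypothetical stand-in for the AI validation call.
    """
    transcript = "\n".join(captured)
    if judge(transcript):
        return None
    with tempfile.NamedTemporaryFile(
        "w", suffix=".log", delete=False, encoding="utf-8"
    ) as dump:
        dump.write(transcript)
        return dump.name
```

In a test, the sample would run inside `with capture_prints() as lines:`, and the assertion message could include the path returned by `validate_prints` so the failing output is easy to find.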

Contributor

Copilot AI left a comment

Pull request overview

This PR refactors the sample testing infrastructure to use an AI agent for validating sample outputs. The main change replaces pattern-based output validation with an LLM-powered validation approach using Azure OpenAI.

Key Changes:

  • Converted SampleExecutor from a simple helper class to a decorator pattern with context manager support
  • Introduced agent-based validation to check if sample outputs indicate success or failure
  • Moved environment variable mapping to a separate function and refactored execution flow
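
As an illustration of the "decorator with context manager support" shape mentioned in the list above, here is a minimal sketch built on the standard library's `contextlib.ContextDecorator`. It is an assumption about the general pattern, not the actual SampleExecutor: the class name `SampleEnvironment` and the variable name `HYPOTHETICAL_SAMPLE_ENDPOINT` are invented for the example.

```python
import os
from contextlib import ContextDecorator
from typing import Dict, Optional


class SampleEnvironment(ContextDecorator):
    """Hypothetical helper: sets environment variables for a sample run and
    restores the previous values afterwards. Works both as a decorator and
    in a `with` statement because it derives from ContextDecorator."""

    def __init__(self, mapping: Dict[str, str]) -> None:
        self.mapping = dict(mapping)
        self._saved: Dict[str, Optional[str]] = {}

    def __enter__(self) -> "SampleEnvironment":
        for name, value in self.mapping.items():
            # Remember the old value (None means the variable was unset).
            self._saved[name] = os.environ.get(name)
            os.environ[name] = value
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        for name, old_value in self._saved.items():
            if old_value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old_value
        self._saved.clear()
        return False  # never swallow exceptions raised by the sample


# Decorator form: the variables exist only while the function runs.
@SampleEnvironment({"HYPOTHETICAL_SAMPLE_ENDPOINT": "https://example.invalid"})
def run_sample() -> str:
    return os.environ["HYPOTHETICAL_SAMPLE_ENDPOINT"]
```

The same object can be used as `with SampleEnvironment({...}):` when only part of a test needs the mapped variables.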

@howieleung howieleung force-pushed the howie/validate-sample-by-agent branch 2 times, most recently from 0309437 to 3a9f439 Compare December 10, 2025 20:23
@howieleung howieleung force-pushed the howie/validate-sample-by-agent branch from 510d630 to b55c649 Compare December 10, 2025 23:55
@dargilco
Member

Big change, don't forget to run the 'black' tool. Thanks!

@howieleung howieleung force-pushed the howie/validate-sample-by-agent branch from b55c649 to 8d2ff2f Compare December 11, 2025 01:53
@howieleung howieleung force-pushed the howie/validate-sample-by-agent branch from 8d2ff2f to 99386f8 Compare December 11, 2025 02:36
@howieleung howieleung enabled auto-merge (squash) December 11, 2025 04:57
@howieleung howieleung force-pushed the howie/validate-sample-by-agent branch from 94f6871 to 65570ab Compare December 11, 2025 05:15
@howieleung howieleung merged commit 7772fd6 into main Dec 11, 2025
20 checks passed
@howieleung howieleung deleted the howie/validate-sample-by-agent branch December 11, 2025 14:09
3 participants