Add evaluation test scenarios and remove orphaned azure-ai-evaluation-py tests #106

Merged
thegovind merged 1 commit into microsoft:main from imatiach-msft:cleanup-azure-ai-evaluation-tests on Feb 7, 2026

Conversation

@imatiach-msft
Contributor

Summary

Adds comprehensive evaluation test scenarios to azure-ai-projects-py and removes the orphaned azure-ai-evaluation-py test folder.

Changes

Added to tests/scenarios/azure-ai-projects-py/scenarios.yaml

6 new evaluation scenarios (+514 lines):

| Scenario | Description |
| --- | --- |
| evaluation_with_inline_data | Complete eval workflow: inline JSONL data, create eval, run, poll, retrieve results |
| agent_evaluation_with_sample | Agent evaluation with {{sample.output_text}} and {{sample.output_items}} mapping |
| custom_code_evaluator | CodeBasedEvaluatorDefinition with project_client.evaluators.create_version() |
| safety_evaluators | builtin.violence, builtin.sexual, builtin.hate_unfairness |
| openai_graders | label_model, string_check, text_similarity grader types |
| list_builtin_evaluators | Evaluator discovery via project_client.evaluators.list_latest_versions() |
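For readers unfamiliar with the shape of the evaluation_with_inline_data workflow, here is a minimal, SDK-free sketch of two of its steps: serializing rows as inline JSONL and polling a run until it finishes. It is not taken from scenarios.yaml. The helper names (`to_jsonl`, `poll_until_done`), the row fields, and the status strings are all assumptions made for illustration, and the status source is a stub rather than a real azure-ai-projects call.

```python
import json
import time
from typing import Callable, Iterable


def to_jsonl(rows: Iterable[dict]) -> str:
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)


def poll_until_done(
    get_status: Callable[[], str],
    interval: float = 0.0,
    max_attempts: int = 10,
) -> str:
    """Call get_status until it reports a terminal status or attempts run out."""
    terminal = {"completed", "failed"}  # assumed status names, not from the SDK
    for _ in range(max_attempts):
        status = get_status()
        if status in terminal:
            return status
        time.sleep(interval)
    raise TimeoutError("run did not reach a terminal status")


# Hypothetical inline data; the field names are illustrative only.
rows = [
    {"query": "What is 2 + 2?", "response": "4"},
    {"query": "Capital of France?", "response": "Paris"},
]
jsonl = to_jsonl(rows)

# Stub standing in for "fetch the run and read its status".
statuses = iter(["queued", "in_progress", "completed"])
final_status = poll_until_done(lambda: next(statuses))
```

In a real scenario the lambda would wrap whatever call retrieves the evaluation run; the loop structure stays the same.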

Deleted

Test Results

All 18 scenarios pass with 100% score:

Scenarios: 18 Passed: 18 Failed: 0 Pass Rate: 100.0% Average Score: 100.0
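As a sanity check on how the summary line relates to per-scenario results, the sketch below recomputes it from a list of results. The `ScenarioResult` type and `summarize` function are hypothetical and are not the test harness's actual code; the 18 all-passing entries simply mirror the numbers reported above.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    name: str
    passed: bool
    score: float  # 0.0 to 100.0


def summarize(results: list) -> dict:
    """Aggregate per-scenario results into the fields of the summary line."""
    total = len(results)
    passed = sum(1 for r in results if r.passed)
    return {
        "scenarios": total,
        "passed": passed,
        "failed": total - passed,
        "pass_rate": 100.0 * passed / total if total else 0.0,
        "average_score": sum(r.score for r in results) / total if total else 0.0,
    }


# Placeholder names; only the counts and scores match the reported run.
results = [ScenarioResult(f"scenario_{i}", True, 100.0) for i in range(18)]
summary = summarize(results)
```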

Notes

The new scenarios ensure the deprecated azure-ai-evaluation SDK patterns are flagged as forbidden:

  • from azure.ai.evaluation import → forbidden
  • ViolenceEvaluator, @evaluator decorator → forbidden
  • Class-based graders like AzureOpenAILabelGrader → forbidden
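To illustrate what "flagged as forbidden" could look like in practice, here is a rough sketch of a pattern check over generated code. The regexes and the `find_forbidden` helper are assumptions for illustration, not the harness's real matching logic; only the pattern strings themselves come from the list above.

```python
import re

# Deprecated azure-ai-evaluation patterns named in this PR, as illustrative regexes.
FORBIDDEN_PATTERNS = {
    "deprecated import": re.compile(r"from\s+azure\.ai\.evaluation\s+import"),
    "ViolenceEvaluator": re.compile(r"\bViolenceEvaluator\b"),
    "@evaluator decorator": re.compile(r"^\s*@evaluator\b", re.MULTILINE),
    "AzureOpenAILabelGrader": re.compile(r"\bAzureOpenAILabelGrader\b"),
}


def find_forbidden(source: str) -> list[str]:
    """Return the labels of every forbidden pattern found in the source text."""
    return [
        label
        for label, pattern in FORBIDDEN_PATTERNS.items()
        if pattern.search(source)
    ]


deprecated_snippet = "from azure.ai.evaluation import ViolenceEvaluator\n"
current_snippet = 'evaluator_name = "builtin.violence"\n'

deprecated_hits = find_forbidden(deprecated_snippet)
current_hits = find_forbidden(current_snippet)
```

Under these assumed patterns, `deprecated_hits` contains the import and class labels, while `current_hits` is empty because the builtin evaluator name matches none of them.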

imatiach-msft force-pushed the cleanup-azure-ai-evaluation-tests branch from a522b59 to e814719 on February 6, 2026, 22:51
thegovind merged commit 843898b into microsoft:main on Feb 7, 2026
2 checks passed