Improve multi-tool agent tests for robustness #44354

paulbatum · 2025-12-09T23:53:07Z

Description

Tests now properly verify that all configured tools are actually used.

Changes:

Use non-trivial calculations that require actual tool execution
Add assertions verifying function calls are made
Add validation of computed values in function arguments
Replace Unicode checkmark with [PASS] for Windows compatibility
Remove test_four_tools_combination (was not actually testing tool usage)
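
As a rough illustration of the function-call assertions and argument validation listed above, here is a minimal sketch. It assumes the response output is a list of dict-like items with `type`, `name`, and a JSON-encoded `arguments` string. That shape, the `save_result` function name, and the helper `find_function_calls` are hypothetical and are not taken from the PR's test files.

```python
import json


def find_function_calls(output_items, expected_name):
    """Return the parsed arguments of every function call named expected_name."""
    calls = []
    for item in output_items:
        if item.get("type") == "function_call" and item.get("name") == expected_name:
            # Arguments are assumed to arrive as a JSON string.
            calls.append(json.loads(item["arguments"]))
    return calls


# Hypothetical response output, standing in for a real agent response.
sample_output = [
    {"type": "message", "content": "Computing 17^4 first."},
    {
        "type": "function_call",
        "name": "save_result",
        "call_id": "call_1",
        "arguments": json.dumps({"label": "17^4", "value": 83521}),
    },
]

calls = find_function_calls(sample_output, "save_result")
# Assert the tool was invoked, not merely configured on the agent.
assert len(calls) == 1, "expected exactly one save_result call"
# Validate that the computed value was passed through in the arguments.
assert calls[0]["value"] == 17 ** 4
print("[PASS] function call made with the computed value")
```

Checking the arguments rather than only the final text ties the assertion to the tool invocation itself, which is the gap the removed test left open.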

All SDK Contribution checklist:

The pull request does not introduce [breaking changes]
CHANGELOG is updated for new features, bug fixes or other significant changes.
I have read the contribution guidelines.

General Guidelines and Best Practices

Title of the pull request is clear and informative.
There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

Pull request includes test coverage for the included changes.

Copilot

Pull request overview

This PR improves the robustness of multi-tool agent tests by ensuring that all configured tools are actually used and verified during test execution. Previously, some tests would only verify that an agent was created with multiple tools but wouldn't confirm the tools were actually invoked. The updated tests now use non-trivial calculations that require actual tool execution, add explicit assertions to verify function calls are made, and validate computed values in function arguments.

Key changes:

Enhanced test assertions to verify actual tool usage rather than just agent creation
Replaced trivial calculations with complex ones that require code execution (e.g., 17^4, fibonacci(15), averaging 15 sensor readings)
Added function call verification and argument validation logic
Replaced Unicode checkmarks with [PASS] for cross-platform compatibility
Removed test_four_tools_combination which only tested agent creation, not tool usage
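
For reference, the expected values behind two of the calculations named above can be reproduced locally, and the reason for preferring [PASS] over a Unicode checkmark can be demonstrated with a quick encoding check. This is only a sketch: the Fibonacci indexing convention (F(0) = 0, F(1) = 1) and the choice of cp1252 as a representative Windows console encoding are assumptions, not details confirmed by the PR.

```python
def fibonacci(n):
    """Return the n-th Fibonacci number, assuming F(0) = 0 and F(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def can_encode(marker, encoding):
    """Report whether marker can be written using the given text encoding."""
    try:
        marker.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


power = 17 ** 4
fib = fibonacci(15)

# The ASCII marker is safe on a cp1252 console; the checkmark (U+2713) is not.
ascii_ok = can_encode("[PASS]", "cp1252")
checkmark_ok = can_encode("\u2713", "cp1252")

print(f"[PASS] expected values: 17^4 = {power}, fibonacci(15) = {fib}")
print(f"[PASS] encodable in cp1252: ascii={ascii_ok}, checkmark={checkmark_ok}")
```

Values like these are large enough that a model is unlikely to state them reliably without running code, which is what makes them useful for confirming that the Code Interpreter tool actually ran.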

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
test_multitool_with_conversations.py	Replaced Unicode checkmarks with [PASS] for Windows compatibility
test_agent_file_search_code_interpreter_function.py	CRITICAL ISSUE: Added function call verification with json parsing but missing required imports; removed test_four_tools_combination; enhanced test_complete_analysis_workflow to validate computed statistics
test_agent_file_search_and_function.py	Enhanced test_python_code_file_search to verify both File Search and Function Tool usage; replaced Unicode checkmarks with [PASS]
test_agent_file_search_and_code_interpreter.py	Enhanced both tests with non-trivial data requiring actual computation; added validation of calculated results with brittle string matching
test_agent_code_interpreter_and_function.py	CRITICAL ISSUE: Added function call verification and argument validation but missing required imports (json, ResponseInputParam, FunctionCallOutput)
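
The review summary flags the result validation as relying on brittle string matching. One possible alternative, sketched here under assumptions, is to extract the numbers from the response text and compare them with a tolerance. The helper names `extract_numbers` and `mentions_value` are hypothetical and do not come from the PR.

```python
import math
import re


def extract_numbers(text):
    """Return every number found in text as a float.

    Commas are dropped first so "83,521" parses as 83521. This is a
    simplification: a comma-separated list such as "3,4" would be misread.
    """
    cleaned = text.replace(",", "")
    return [float(match) for match in re.findall(r"-?\d+(?:\.\d+)?", cleaned)]


def mentions_value(text, expected, rel_tol=1e-3):
    """Check whether any number in text is close to the expected value."""
    return any(
        math.isclose(number, expected, rel_tol=rel_tol)
        for number in extract_numbers(text)
    )


# Different phrasings of the same result all pass the check.
assert mentions_value("The result of 17^4 is 83521.", 17 ** 4)
assert mentions_value("That comes to 83,521.0 in total.", 17 ** 4)
assert not mentions_value("The result is roughly 42.", 17 ** 4)
print("[PASS] tolerant numeric matching")
```

A check like this tolerates formatting differences between models (thousands separators, trailing decimals) while still failing when the computed value is absent.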

...i/azure-ai-projects/tests/agents/tools/multitool/test_agent_code_interpreter_and_function.py

...zure-ai-projects/tests/agents/tools/multitool/test_agent_file_search_and_code_interpreter.py


Copilot AI review requested due to automatic review settings December 9, 2025 23:53

paulbatum requested review from dargilco, glharper, howieleung, kingernupur, nick863, trangevi and trrwilson as code owners December 9, 2025 23:53

github-actions bot added the AI Projects label Dec 9, 2025

Copilot started reviewing on behalf of paulbatum December 9, 2025 23:53

Copilot AI reviewed Dec 10, 2025

dargilco approved these changes Dec 10, 2025

paulbatum enabled auto-merge (squash) December 10, 2025 19:25

paulbatum force-pushed the developer/pbatum/multi-tool-fixes branch 2 times, most recently from cbe7070 to 865b294 Compare December 10, 2025 22:15

paulbatum added 2 commits December 10, 2025 14:22

Improve multi-tool agent tests for robustness and model-agnostic behavior

d05f7f1

update recordings

f69f956

paulbatum force-pushed the developer/pbatum/multi-tool-fixes branch from 865b294 to f69f956 Compare December 10, 2025 22:28

dargilco approved these changes Dec 10, 2025

paulbatum merged commit 18785d2 into main Dec 10, 2025
20 checks passed

paulbatum deleted the developer/pbatum/multi-tool-fixes branch December 10, 2025 22:59

3 participants
