@paulbatum paulbatum commented Dec 9, 2025

Description

Tests now properly verify that all configured tools are actually used.

Changes:

  • Use non-trivial calculations that require actual tool execution
  • Add assertions verifying function calls are made
  • Add validation of computed values in function arguments
  • Replace Unicode checkmark with [PASS] for Windows compatibility
  • Remove test_four_tools_combination (was not actually testing tool usage)
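The function-call assertions and argument validation listed above might look roughly like the following. This is a minimal sketch, not the code from this PR: it assumes response output items are plain dicts with `type`, `name`, and a JSON-encoded `arguments` string, and the helper names, the `save_result` function, and its `value` argument are all hypothetical.

```python
import json


def find_function_calls(output_items, name):
    """Return every output item that is a call to the function `name`.

    Assumption: items are dicts shaped like
    {"type": "function_call", "name": ..., "arguments": "<JSON string>"}.
    """
    return [
        item
        for item in output_items
        if item.get("type") == "function_call" and item.get("name") == name
    ]


def assert_function_called_with(output_items, name, key, expected, tol=1e-6):
    """Assert that `name` was called and that argument `key` equals `expected`."""
    calls = find_function_calls(output_items, name)
    assert calls, f"expected at least one call to {name!r}, found none"

    # Arguments arrive as a JSON string, so they must be parsed first.
    args = json.loads(calls[0]["arguments"])
    assert key in args, f"argument {key!r} missing from call to {name!r}"
    actual = float(args[key])
    assert abs(actual - expected) <= tol, f"{key}={actual}, expected {expected}"
    return args


# Hypothetical output: one text message plus one function call carrying 17^4.
sample_output = [
    {"type": "message", "content": "I computed the value and saved it."},
    {
        "type": "function_call",
        "name": "save_result",
        "arguments": json.dumps({"value": 17 ** 4}),
    },
]

checked_args = assert_function_called_with(sample_output, "save_result", "value", 83521)
print("[PASS] save_result was called with", checked_args)
```

Checking the parsed argument rather than only the presence of a call is what ties the test to real tool execution: a model that skips the computation is unlikely to produce the right value.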

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Copilot AI left a comment

Pull request overview

This PR improves the robustness of multi-tool agent tests by ensuring that all configured tools are actually used and verified during test execution. Previously, some tests would only verify that an agent was created with multiple tools but wouldn't confirm the tools were actually invoked. The updated tests now use non-trivial calculations that require actual tool execution, add explicit assertions to verify function calls are made, and validate computed values in function arguments.

Key changes:

  • Enhanced test assertions to verify actual tool usage rather than just agent creation
  • Replaced trivial calculations with complex ones that require code execution (e.g., 17^4, fibonacci(15), averaging 15 sensor readings)
  • Added function call verification and argument validation logic
  • Replaced Unicode checkmarks with [PASS] for cross-platform compatibility
  • Removed test_four_tools_combination which only tested agent creation, not tool usage
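For reference, the expected values for two of the calculations named above can be computed locally and used as the ground truth that a test compares against. This is only a sketch: the `fibonacci` helper and the `EXPECTED` table are hypothetical, and the indexing convention F(1) = F(2) = 1 is an assumption, since the tests may define the sequence differently.

```python
def fibonacci(n):
    """Return the n-th Fibonacci number, assuming F(1) = F(2) = 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# Reference values a test could compare tool output against.
EXPECTED = {
    "17^4": 17 ** 4,
    "fibonacci(15)": fibonacci(15),
}

for label, value in EXPECTED.items():
    print(f"{label} = {value}")
```

Values like these are large enough that a model is unlikely to state them correctly without running code, which is the point of replacing the trivial calculations.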

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| test_multitool_with_conversations.py | Replaced Unicode checkmarks with [PASS] for Windows compatibility |
| test_agent_file_search_code_interpreter_function.py | CRITICAL ISSUE: Added function call verification with json parsing but missing required imports; removed test_four_tools_combination; enhanced test_complete_analysis_workflow to validate computed statistics |
| test_agent_file_search_and_function.py | Enhanced test_python_code_file_search to verify both File Search and Function Tool usage; replaced Unicode checkmarks with [PASS] |
| test_agent_file_search_and_code_interpreter.py | Enhanced both tests with non-trivial data requiring actual computation; added validation of calculated results with brittle string matching |
| test_agent_code_interpreter_and_function.py | CRITICAL ISSUE: Added function call verification and argument validation but missing required imports (json, ResponseInputParam, FunctionCallOutput) |
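One way to make the result validation less brittle than exact string matching is to extract the numbers from the response text and compare them with a tolerance. The sketch below illustrates that idea only; it is not what the PR does, and the helper names and the default tolerance are hypothetical.

```python
import math
import re

# Matches integers and decimals, with optional thousands separators.
# The lookbehind keeps a hyphen in a range such as "10-20" from being
# read as a minus sign, and skips digits embedded in identifiers.
_NUMBER = re.compile(r"(?<!\w)-?\d+(?:,\d{3})*(?:\.\d+)?")


def extract_numbers(text):
    """Return all numbers found in `text` as floats."""
    return [float(match.replace(",", "")) for match in _NUMBER.findall(text)]


def contains_value(text, expected, rel_tol=1e-3):
    """True if any number in `text` is within `rel_tol` of `expected`."""
    return any(
        math.isclose(number, expected, rel_tol=rel_tol)
        for number in extract_numbers(text)
    )


# The same value is accepted regardless of how the model formats it.
assert contains_value("The result is 83521.", 83521)
assert contains_value("17^4 evaluates to 83,521.0", 83521)
print("[PASS] numeric match tolerates formatting differences")
```

A plain substring check for "83521" would reject the second response because of the thousands separator, even though the computed value is correct.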

@paulbatum paulbatum enabled auto-merge (squash) December 10, 2025 19:25
@paulbatum paulbatum force-pushed the developer/pbatum/multi-tool-fixes branch 2 times, most recently from cbe7070 to 865b294 December 10, 2025 22:15
@paulbatum paulbatum force-pushed the developer/pbatum/multi-tool-fixes branch from 865b294 to f69f956 December 10, 2025 22:28
@paulbatum paulbatum merged commit 18785d2 into main Dec 10, 2025
20 checks passed
@paulbatum paulbatum deleted the developer/pbatum/multi-tool-fixes branch December 10, 2025 22:59