Test windows run examples workflow #1309
Enhanced _check_chromium_available to check common Windows installation paths for Chrome and Edge executables. Also added support for Playwright cache detection on Windows using LOCALAPPDATA.
Removed redundant os.name check and always define Windows Chromium and Edge paths. Simplified Playwright cache candidate logic by unconditionally including the Windows path.
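A minimal sketch of the Playwright cache lookup described in the two commits above. The function names are hypothetical and not taken from the PR; the `ms-playwright` directory locations follow Playwright's documented defaults, and the `chromium-*` folder pattern is an assumption about how downloaded builds are named. Unlike the commit, this sketch only adds the Windows path when `LOCALAPPDATA` is set.

```python
import os
from pathlib import Path


def playwright_cache_candidates(environ=None, home=None):
    """Return directories where Playwright may have cached browsers."""
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)
    candidates = [
        home / ".cache" / "ms-playwright",               # Linux default
        home / "Library" / "Caches" / "ms-playwright",   # macOS default
    ]
    local_app_data = environ.get("LOCALAPPDATA")
    if local_app_data:
        # Windows default: %LOCALAPPDATA%\ms-playwright
        candidates.append(Path(local_app_data) / "ms-playwright")
    return candidates


def has_playwright_chromium(candidates):
    """True if any candidate directory contains a chromium-* build folder."""
    return any(c.is_dir() and any(c.glob("chromium-*")) for c in candidates)
```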
Simplifies and generalizes the construction of Windows browser executable paths by iterating over environment variables and browser definitions. This reduces code duplication and improves maintainability.
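The loop over environment variables and browser definitions could look roughly like this. It is an illustrative sketch only: `WINDOWS_BROWSERS`, `ENV_VARS`, and `windows_browser_candidates` are assumed names, and the relative paths are the usual Chrome and Edge install layouts rather than values copied from the PR.

```python
import os
from pathlib import PureWindowsPath

# Assumed browser definitions: executable location relative to an install root.
WINDOWS_BROWSERS = [
    ("Google", "Chrome", "Application", "chrome.exe"),
    ("Microsoft", "Edge", "Application", "msedge.exe"),
]

# Assumed install roots: system-wide (64-bit, 32-bit) and per-user.
ENV_VARS = ("PROGRAMFILES", "PROGRAMFILES(X86)", "LOCALAPPDATA")


def windows_browser_candidates(environ=None):
    """Build every root/browser combination, skipping unset variables."""
    environ = os.environ if environ is None else environ
    candidates = []
    for var in ENV_VARS:
        base = environ.get(var)
        if not base:
            continue
        for parts in WINDOWS_BROWSERS:
            candidates.append(PureWindowsPath(base, *parts))
    return candidates
```

`PureWindowsPath` keeps the sketch testable on any operating system because it never touches the file system.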
Cleaned up unnecessary trailing whitespace in the _check_chromium_available function for improved code style.
Remove incorrect skip logic that prevented checking %LOCALAPPDATA% for Microsoft Edge installations. While Edge is typically installed system-wide, it can also be installed per-user in enterprise environments at %LOCALAPPDATA%\Microsoft\Edge\Application\msedge.exe. This fix ensures the browser detection can find Edge in both installation scenarios.
Renamed the 'browsers' variable to 'windows_browsers' for clarity and updated its usage in the Chromium availability check function.
Introduces WindowsBrowserToolExecutor in a new impl_windows.py for improved Chromium detection on Windows. Updates BrowserToolSet to use the Windows-specific executor when running on Windows, and refactors impl.py to remove Windows-specific code paths.
Co-authored-by: Copilot <[email protected]>
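One way the platform switch in `BrowserToolSet` might be expressed is a small selection helper. This is a sketch under assumptions: the two classes below are empty stand-ins, and `select_executor_class` is a hypothetical helper, not a function from the PR.

```python
import sys


class BrowserToolExecutor:
    """Stand-in for the generic executor."""


class WindowsBrowserToolExecutor(BrowserToolExecutor):
    """Stand-in for the Windows-specific executor."""


def select_executor_class(platform=None):
    """Pick the executor class for the given (or current) platform."""
    platform = sys.platform if platform is None else platform
    # sys.platform is "win32" on Windows, including 64-bit builds.
    if platform == "win32":
        return WindowsBrowserToolExecutor
    return BrowserToolExecutor
```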
Moved Chromium detection functions into BrowserToolExecutor as instance methods and provided a Windows-specific override in WindowsBrowserToolExecutor. This improves platform extensibility and removes reliance on module-level function overrides.
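The instance-method override could be structured as below. Only the class names and `_check_chromium_available` come from the commit message; the method bodies, binary names, and install paths are simplified assumptions.

```python
import ntpath
import os
import shutil


class BrowserToolExecutor:
    """Simplified stand-in; the real class does far more than detection."""

    def _check_chromium_available(self):
        # Generic detection: look for a known binary name on PATH.
        for name in ("chromium", "chromium-browser", "google-chrome", "chrome"):
            found = shutil.which(name)
            if found:
                return found
        return None


class WindowsBrowserToolExecutor(BrowserToolExecutor):
    def _check_chromium_available(self):
        # Reuse the generic PATH lookup, then fall back to install paths.
        found = super()._check_chromium_available()
        if found:
            return found
        for var in ("PROGRAMFILES", "PROGRAMFILES(X86)", "LOCALAPPDATA"):
            base = os.environ.get(var)
            if not base:
                continue
            candidate = ntpath.join(
                base, "Google", "Chrome", "Application", "chrome.exe"
            )
            if os.path.isfile(candidate):
                return candidate
        return None
```

Because the override is an ordinary method, callers need no module-level patching: they call the same method on whichever executor instance they hold.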
Updated tests to instantiate BrowserToolExecutor and WindowsBrowserToolExecutor for Chromium detection methods, replacing direct function calls. This aligns tests with the refactored implementation and improves test accuracy for platform-specific logic.
Updated test cases in browser cleanup and initialization tests to use patch.object on BrowserToolExecutor for _ensure_chromium_available, instead of patching the function by import path. This improves test robustness and clarity by directly targeting the class method.
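A hedged illustration of the `patch.object` approach described above. The class here is a tiny stand-in whose constructor calls `_ensure_chromium_available`; the real initialization flow is not shown in this conversation and may differ.

```python
from unittest.mock import patch


class BrowserToolExecutor:
    """Stand-in: the constructor depends on Chromium detection."""

    def __init__(self):
        self.chromium_path = self._ensure_chromium_available()

    def _ensure_chromium_available(self):
        raise RuntimeError("Chromium not installed")


def test_init_uses_patched_detection():
    # Patch the method on the class itself rather than by import path,
    # so the test keeps working if the method moves between modules.
    with patch.object(
        BrowserToolExecutor,
        "_ensure_chromium_available",
        return_value="/fake/chromium",
    ) as mock_ensure:
        executor = BrowserToolExecutor()
    assert executor.chromium_path == "/fake/chromium"
    mock_ensure.assert_called_once_with()
```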
Refactored browser path search to short-circuit on first found executable for efficiency and clarified environment variable handling. Updated test to use a more accurate mock for os.environ.get.
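The short-circuit search and the `os.environ.get` mock might look like this. `find_first_browser`, `ENV_VARS`, and `RELATIVE_PATHS` are hypothetical names, and the paths are the usual install layouts rather than values taken from the PR.

```python
import ntpath
import os
from unittest.mock import patch

ENV_VARS = ("PROGRAMFILES", "PROGRAMFILES(X86)", "LOCALAPPDATA")
RELATIVE_PATHS = (
    ntpath.join("Google", "Chrome", "Application", "chrome.exe"),
    ntpath.join("Microsoft", "Edge", "Application", "msedge.exe"),
)


def find_first_browser():
    """Return the first existing executable and stop searching."""
    for var in ENV_VARS:
        base = os.environ.get(var)
        if not base:
            continue
        for rel in RELATIVE_PATHS:
            candidate = ntpath.join(base, rel)
            if os.path.isfile(candidate):
                return candidate  # short-circuit: skip remaining candidates
    return None


def test_short_circuits_on_first_match():
    fake_env = {"PROGRAMFILES": r"C:\Program Files"}
    # The mock answers per key, like the real mapping, instead of
    # returning one fixed value for every variable.
    with patch(
        "os.environ.get",
        side_effect=lambda key, default=None: fake_env.get(key, default),
    ), patch("os.path.isfile", return_value=True) as isfile:
        result = find_first_browser()
    assert result == r"C:\Program Files\Google\Chrome\Application\chrome.exe"
    isfile.assert_called_once()
```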
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 48.6s | $0.07 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 36.2s | $0.03 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 20.0s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 1m 25s | $0.10 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 24.8s | $0.03 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 55.9s | $0.07 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 1m 19s | $0.09 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 25.6s | $0.02 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 44.6s | $0.04 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 2m 31s | $0.45 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 28.1s | $0.03 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 39.9s | $0.03 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 24.5s | $0.05 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 34.6s | $0.03 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 18.3s | $0.01 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 29.7s | $0.03 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 37.4s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 9m 5s | $0.97 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 3m 22s | $0.40 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 37.6s | $0.06 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 1m 4s | $0.08 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 1m 14s | $0.06 |
| 01_standalone_sdk/30_tom_agent.py | ❌ FAIL: Missing EXAMPLE_COST marker in stdout | 42.8s | -- |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 2m 0s | $0.16 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 3m 24s | $0.10 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 2m 1s | $0.22 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 3m 42s | $0.15 |
❌ Some tests failed
Total: 27 | Passed: 26 | Failed: 1 | Total Cost: $3.28
Failed examples:
- examples/01_standalone_sdk/30_tom_agent.py: Missing EXAMPLE_COST marker in stdout
Force-pushed from 0f6c32a to 54d93bc.
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 44.8s | $0.07 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 30.5s | $0.04 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 19.1s | $0.02 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 1m 12s | $0.08 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 27.6s | $0.03 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 59.9s | $0.07 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 1m 10s | $0.08 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 22.2s | $0.03 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 46.7s | $0.04 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 4m 5s | $0.75 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 33.4s | $0.05 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 30.2s | $0.03 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 24.5s | $0.05 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 26.8s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 18.0s | $0.01 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 29.1s | $0.03 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 36.5s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 6m 8s | $0.60 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 3m 12s | $0.49 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 45.9s | $0.07 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 49.3s | $0.08 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 1m 14s | $0.08 |
| 01_standalone_sdk/30_tom_agent.py | ❌ FAIL: Missing EXAMPLE_COST marker in stdout | 51.2s | -- |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 1m 50s | $0.16 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 2m 28s | $0.11 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 3m 5s | $0.16 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 2m 22s | $0.11 |
❌ Some tests failed
Total: 27 | Passed: 26 | Failed: 1 | Total Cost: $3.29
Failed examples:
- examples/01_standalone_sdk/30_tom_agent.py: Missing EXAMPLE_COST marker in stdout
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 47.2s | $0.08 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 29.2s | $0.04 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 18.0s | $0.02 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 1m 43s | $0.12 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 26.8s | $0.03 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 58.3s | $0.07 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 1m 6s | $0.10 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 23.4s | $0.04 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 43.6s | $0.04 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 5m 36s | $0.95 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 28.3s | $0.05 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 25.6s | $0.02 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 21.8s | $0.04 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 29.4s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 17.4s | $0.01 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 26.5s | $0.03 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 1m 18s | $0.02 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 6m 15s | $0.61 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 3m 15s | $0.51 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 34.2s | $0.06 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 53.2s | $0.08 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 1m 5s | $0.07 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 38.5s | $0.04 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 1m 50s | $0.17 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 2m 52s | $0.11 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 2m 58s | $0.17 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 2m 4s | $0.12 |
✅ All tests passed!
Total: 27 | Passed: 27 | Failed: 0 | Total Cost: $3.61
BrowserToolExecutor now checks common Linux and macOS installation paths for Chromium and Chrome if not found in PATH. Corresponding tests were added to verify detection via these standard paths.
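A rough sketch of the PATH-then-standard-paths fallback on Linux and macOS. The function name and both lists are assumptions chosen for illustration; the PR's actual path list is not shown in this conversation.

```python
import os
import shutil

# Assumed binary names to try on PATH first.
BINARY_NAMES = ("chromium", "chromium-browser", "google-chrome", "chrome")

# Assumed standard install locations (Linux, then macOS app bundles).
STANDARD_PATHS = (
    "/usr/bin/chromium",
    "/usr/bin/chromium-browser",
    "/usr/bin/google-chrome",
    "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "/Applications/Chromium.app/Contents/MacOS/Chromium",
)


def find_chromium_unix():
    """Return a Chromium/Chrome executable path, or None if not found."""
    for name in BINARY_NAMES:
        found = shutil.which(name)
        if found:
            return found
    # Not on PATH: fall back to well-known install locations.
    for path in STANDARD_PATHS:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None
```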
Renamed the example script for consistency or to reflect updated numbering in the standalone SDK examples directory.
Adds a print statement to display the accumulated cost from the LLM metrics at the end of the Tom agent consultation example.
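For context, the earlier failure was "Missing EXAMPLE_COST marker in stdout", so the example has to print a line the workflow can find. The sketch below assumes a marker of the form `EXAMPLE_COST: <number>`; that exact format, the `Metrics` stand-in, and both helper functions are assumptions, not the workflow's real implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical stand-in for the LLM metrics object."""

    accumulated_cost: float = 0.0


def format_cost_marker(metrics):
    # Fixed-point formatting avoids scientific notation for tiny costs.
    return f"EXAMPLE_COST: {metrics.accumulated_cost:.4f}"


# Assumed marker format: the literal prefix followed by a decimal number.
MARKER_RE = re.compile(r"^EXAMPLE_COST:\s*([0-9]+(?:\.[0-9]+)?)", re.MULTILINE)


def extract_cost(stdout):
    """Return the reported cost, or None when the marker is missing."""
    match = MARKER_RE.search(stdout)
    return float(match.group(1)) if match else None
```

An example script would end with `print(format_cost_marker(metrics))`, and a checker would call `extract_cost` on the captured output.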
Changed references from 30_windows.py to 31_windows.py in the exemption list and workflow configuration to reflect the new filename.
Force-pushed from 802a1b5 to d8a11d7.
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment. Feel free to include any additional details that might help me get this PR into a better state.
🔄 Running Examples with
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 30.7s | $0.03 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 19.4s | $0.02 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 16.3s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 1m 6s | $0.04 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 21.1s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 43.8s | $0.02 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 44.4s | $0.03 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 15.3s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 38.4s | $0.01 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 2m 31s | $0.27 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 18.4s | $0.02 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 25.2s | $0.01 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 17.8s | $0.02 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 30.0s | $0.02 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 10.2s | $0.00 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 14.9s | $0.01 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 41.4s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 4m 11s | $0.27 |
| 01_standalone_sdk/25_agent_delegation.py | ✅ PASS | 2m 12s | $0.16 |
| 01_standalone_sdk/26_custom_visualizer.py | ✅ PASS | 21.9s | $0.02 |
| 01_standalone_sdk/28_ask_agent_example.py | ✅ PASS | 36.9s | $0.02 |
| 01_standalone_sdk/29_llm_streaming.py | ✅ PASS | 38.3s | $0.02 |
| 01_standalone_sdk/30_tom_agent.py | ✅ PASS | 25.1s | $0.01 |
| 01_standalone_sdk/31_windows.py | ✅ PASS | 1m 1s | $0.05 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 1m 2s | $0.03 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 2m 4s | $0.08 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 53.7s | $0.05 |
| 02_remote_agent_server/04_convo_with_api_sandboxed_server.py | ✅ PASS | 1m 33s | $0.03 |
✅ All tests passed!
Total: 28 | Passed: 28 | Failed: 0 | Total Cost: $1.28
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.12-nodejs22
- golang:1.21-bookworm

Pull (multi-arch manifest)

    # Each variant is a multi-arch manifest supporting both amd64 and arm64
    docker pull ghcr.io/openhands/agent-server:641e720-python

Run
All tags pushed for this build
About Multi-Architecture Support
- Each variant tag (e.g., 641e720-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g., 641e720-python-amd64) are also available if needed