WIP: 1.0 #1421

yuyutaotao · 2025-11-04T07:24:21Z

No description provided.

* Initial plan
* fix(cli): allow duplicate YAML files in config.yaml

  Co-authored-by: quanru <[email protected]>
* fix(cli): deep clone YAML script to prevent mutation issues
* fix(yaml): prevent mutation of flowItem by creating a new object for processing

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: quanru <[email protected]>
Co-authored-by: quanruzhuoxiu <[email protected]>
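To illustrate why the deep clone matters when the same YAML file is listed more than once, here is a minimal sketch. The `YamlScript` and `FlowItem` shapes and the `deepClone` and `prepareRuns` helpers are hypothetical stand-ins, not the project's actual types or functions.

```typescript
// Hypothetical shape of a parsed YAML script; not the project's real type.
type FlowItem = { [key: string]: unknown };
interface YamlScript {
  tasks: { name: string; flow: FlowItem[] }[];
}

// Recursively copy plain JSON-like data (objects, arrays, primitives).
function deepClone<T>(value: T): T {
  if (Array.isArray(value)) {
    return value.map((item) => deepClone(item)) as unknown as T;
  }
  if (value !== null && typeof value === "object") {
    const copy: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      copy[key] = deepClone(inner);
    }
    return copy as T;
  }
  return value;
}

// Each run gets its own copy, so running one file twice cannot share state.
function prepareRuns(script: YamlScript, times: number): YamlScript[] {
  return Array.from({ length: times }, () => deepClone(script));
}

const shared: YamlScript = {
  tasks: [{ name: "login", flow: [{ aiAct: "click login" }] }],
};
const [first, second] = prepareRuns(shared, 2);

// Mutating the first run leaves the second run and the source untouched.
first.tasks[0].flow[0].aiAct = "mutated";
```

Without the clone, both runs would hold the same `flow` objects, and any in-place processing of a flow item during the first run would be visible to the second.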

….x (#1325)

* refactor(core): remove non-OpenAI SDK support and upgrade to OpenAI 6.x

  This commit removes support for Anthropic SDK and Azure OpenAI, simplifying the codebase to use only the standard OpenAI SDK with OpenAI-style APIs.

  Changes:
  - Remove Anthropic SDK (@anthropic-ai/sdk) dependency
  - Remove Azure OpenAI specific code and @azure/identity dependency
  - Remove langsmith wrapper support
  - Remove proxy agent support (https-proxy-agent, socks-proxy-agent)
  - Upgrade OpenAI SDK from 4.81.0 to 6.3.0
  - Simplify createChatClient function to only create standard OpenAI clients
  - Remove 'style' parameter from createChatClient return type
  - Remove all Anthropic-specific message handling code
  - Add openai 6.3.0 as devDependency to @midscene/shared

  Benefits:
  - Cleaner, more maintainable codebase
  - Reduced dependencies (removed 5 packages)
  - All AI providers can now be accessed through OpenAI-compatible APIs

  Breaking Changes:
  - Anthropic SDK mode no longer supported
  - Azure OpenAI specific configuration removed
  - MIDSCENE_LANGSMITH_DEBUG no longer supported
  - httpAgent/socksProxy removed from createChatClient

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
* refactor(core): model provider documentation and remove Azure and Anthropic configurations
* Apply suggestion from @Copilot

  Co-authored-by: Copilot <[email protected]>
* feat(core): add proxy support for OpenAI client with HTTP and SOCKS configurations
* feat(core): add qwen-vl specific configuration for high resolution images

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: yuyutaotao <[email protected]>
Co-authored-by: Copilot <[email protected]>

This change ensures that Planning functionality only supports vision language models (VL mode) and removes DOM-based planning support.

Changes:
- Add validation in ModelConfigManager.getModelConfig() to require VL mode for Planning intent
- Remove DOM mode logic from llm-planning.ts (describeUserPage, markupImageForLLM)
- Simplify image processing to only support VL mode paths
- Add comprehensive JSDoc documentation for Planning VL mode requirement
- Add 6 new unit tests covering Planning VL mode validation in both isolated and normal modes
- Fix existing tests to provide VL mode for Planning intent

Breaking Change:
- Planning without VL mode configured will now throw an error with clear instructions
- Error message includes all supported VL modes and configuration examples

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
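The validation described above could look roughly like the following sketch. The `ModelConfig` shape, the `vlMode` field name, and the `assertPlanningUsesVlMode` helper are assumptions for illustration; the real check lives in ModelConfigManager.getModelConfig() and may differ.

```typescript
// Intent names follow the ones mentioned in this PR; the config shape is hypothetical.
type Intent = "planning" | "grounding" | "VQA" | "default";

interface ModelConfig {
  modelName: string;
  vlMode?: string; // assumed field: set when a vision language mode is configured
}

// Reject a Planning config that has no VL mode; pass every other case through.
function assertPlanningUsesVlMode(intent: Intent, config: ModelConfig): ModelConfig {
  if (intent === "planning" && !config.vlMode) {
    throw new Error(
      `Planning requires a vision language model (VL mode), but model "${config.modelName}" has none configured.`,
    );
  }
  return config;
}
```

Failing early at config lookup means a misconfigured Planning model is reported before any screenshot is sent to the model.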

* chore(core): remove warning msg for gpt-4
* chore(core): remove dom-based locator

* chore(core): refine recorder loop
* feat(core): update implementation of recorder

* refactor(core,web-integration,docs): rename API methods for clarity

  BREAKING CHANGE: Renamed aiAction() to aiAct() and logScreenshot() to recordToReport() for improved naming consistency. The aiAction() method is kept as deprecated for backward compatibility.

  Changes:
  - Renamed aiAction() to aiAct() across core and web-integration
  - Renamed logScreenshot() to recordToReport()
  - Updated all English and Chinese documentation
  - Updated code examples in README files
  - Updated Playwright fixture to support new method names
  - Added deprecation warning for aiAction() method
  - Updated all test files and examples

  This improves API consistency and clarity while maintaining backward compatibility through deprecated methods.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
* feat(yaml): add backward compatibility for aiAction method in YAML flow
* fix(core): conditionally add httpAgent to OpenAI client options

  Fix TypeScript compilation error where httpAgent property doesn't exist in OpenAI 6.x ClientOptions type. Only include httpAgent when a proxy is configured, and use type assertion to bypass the strict type check.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
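A deprecated alias of the kind described above can be sketched as follows. `SketchAgent` is a hypothetical class, not the project's Agent, and the warning is collected in an array here only so the behaviour is easy to inspect; the real implementation presumably logs it.

```typescript
// Hypothetical agent showing a renamed method that keeps its old name as an alias.
class SketchAgent {
  private warned = false;
  public warnings: string[] = [];

  // New name: the actual behaviour lives here.
  async aiAct(prompt: string): Promise<string> {
    return `acted: ${prompt}`;
  }

  /** @deprecated Use aiAct() instead. */
  async aiAction(prompt: string): Promise<string> {
    // Warn once per agent, then delegate to the new method.
    if (!this.warned) {
      this.warned = true;
      this.warnings.push("aiAction() is deprecated, use aiAct() instead");
    }
    return this.aiAct(prompt);
  }
}
```

Delegating rather than duplicating keeps both names on one code path, so existing callers of `aiAction()` get identical behaviour plus a one-time notice.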

* chore(core): update implementation of insight
* chore(core): refine error plan
* chore(core): refine error plan
* chore(core): split tasks into multiple parts
* fix(core): fix ci

* chore(release): upgrade all packages to v1.0.0

  - Bump version from 0.30.4 to 1.0.0 for all packages
  - Update Chrome extension manifest version to 0.136
  - Update internal package dependencies to 1.0.0

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
* feat(release): add validation to prevent 1.x stable releases

  - Block publishing of 1.x versions with 'latest' tag
  - Allow publishing 1.x beta versions (prepatch)
  - Allow publishing stable versions for other major versions (0.x, 2.x, etc.)

  This ensures that 1.x releases can only be published as beta versions, preventing accidental stable releases while still allowing testing and pre-release distributions.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
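The release guard can be pictured with a small sketch. `canPublish` and its parameters are hypothetical; the sketch encodes only the three rules listed above and does not reflect how the actual release script is structured.

```typescript
// Hypothetical guard: refuse to publish any 1.x version under the 'latest' dist-tag.
function canPublish(version: string, distTag: string): boolean {
  const major = Number(version.split(".")[0]);
  if (major === 1 && distTag === "latest") {
    return false; // 1.x may only go out as a pre-release for now
  }
  return true; // other majors, and 1.x under a non-latest tag, are allowed
}
```

For example, `canPublish("1.0.0", "latest")` is rejected, while a 1.x beta published under a `beta` tag and a 0.x stable release under `latest` both pass.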

* refactor(core): remove unused getXpathsById method

  This method was not being used in the codebase.

  Removed:
  - Core implementation in shared/src/extractor/locator.ts
  - Export from shared/src/extractor/index.ts
  - Implementations in puppeteer/base-page.ts, chrome-extension/page.ts, and static/static-page.ts
  - All related unit tests

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
* refactor(types): rename AndroidPullParam and AndroidLongPressParam to PullParam and LongPressParam

---------

Co-authored-by: Claude <[email protected]>

…1341)

* feat(core): support custom OpenAI client instances for observability

  Enable users to provide custom OpenAI client factory function through AgentOpt.createOpenAIClient, allowing integration with observability tools like langsmith and langfuse.

  Key changes:
  - Add CreateOpenAIClientFn type in @midscene/shared/env for creating custom OpenAI clients
  - Extend AgentOpt interface with optional createOpenAIClient callback
  - Pass callback through Agent -> ModelConfigManager -> IModelConfig
  - Inject createOpenAIClient during config initialization for better performance
  - Update createChatClient to use custom client factory when provided

  Benefits:
  - Users can wrap OpenAI clients with langsmith's wrapOpenAI() for tracing
  - Users can wrap with langfuse's observeOpenAI() for logging
  - Support different clients for different intents (planning, grounding, VQA, default)
  - Zero runtime overhead - injection happens during config initialization

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
* test(core): add unit tests for custom OpenAI client integration in ModelConfigManager and service-caller
* Update packages/shared/tests/unit-test/env/modle-config-manager.test.ts

  Co-authored-by: Copilot <[email protected]>
* refactor(core): remove unused MIDSCENE_API_TYPE constant from service-caller and types

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: Copilot <[email protected]>
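The factory hook described above follows a common pattern: build the default client, then let an optional user callback wrap or replace it. The sketch below shows only that pattern. `ChatClientLike`, `CreateClientFn`, and `createChatClientSketch` are hypothetical names, and the real CreateOpenAIClientFn signature is not shown in this PR, so its parameters here are an assumption.

```typescript
type Intent = "planning" | "grounding" | "VQA" | "default";

// Stand-in for an OpenAI client; the real type comes from the openai package.
interface ChatClientLike {
  label: string;
}

// Assumed callback shape: receives the default client and the intent it serves.
type CreateClientFn = (base: ChatClientLike, intent: Intent) => ChatClientLike;

function createChatClientSketch(intent: Intent, createClient?: CreateClientFn): ChatClientLike {
  const base: ChatClientLike = { label: "default-client" };
  // Use the custom factory when provided, otherwise fall back to the default client.
  return createClient ? createClient(base, intent) : base;
}

// A user-supplied wrapper, e.g. where a tracing library would wrap the client.
const traced: CreateClientFn = (base, intent) => ({
  label: `traced(${base.label}) for ${intent}`,
});
```

Because the callback receives the intent, a caller could return a differently instrumented client for planning than for grounding, which matches the per-intent support listed above.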

* chore(ci): enable workflows for PRs targeting 1.0 branch

  Add 1.0 branch to pull_request triggers in CI and lint workflows to ensure PRs targeting the 1.0 branch run the same checks as PRs to main.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
* tests(shared, web-integration): update tests to use runner instead of executor and improve environment setup

---------

Co-authored-by: Claude <[email protected]>

* docs(awesome): add midscene java sdk (#1324)
* fix(core): support number type for aiInput value field (#1339)

  * fix(core): support number type for aiInput value field

    This change allows aiInput.value to accept both string and number types, addressing scenarios where:
    1. AI models return numeric values instead of strings
    2. YAML files contain unquoted numbers that parse as number type

    Changes:
    - Updated type definitions to accept string | number
    - Added Zod schema transformation to convert numbers to strings
    - Updated runtime validation to accept both types
    - Added explicit conversion in YAML player as fallback

    All conversions happen internally and are transparent to users.

    🤖 Generated with [Claude Code](https://claude.com/claude-code)

    Co-Authored-By: Claude <[email protected]>
  * fix(core): update aiInput type signatures to accept number values

    Update the TypeScript method signatures for aiInput to accept string | number for the value parameter, matching the runtime implementation.

    Changes:
    - New signature: opt parameter now accepts { value: string | number }
    - Legacy signature: first parameter now accepts string | number
    - Implementation signature: locatePromptOrValue now accepts TUserPrompt | string | number
    - Type assertion updated from `as string` to `as string | number`

    This ensures type safety and allows users to pass number values directly without TypeScript errors, while maintaining backward compatibility with existing string-based usage. Fixes type errors in test cases that use number values.

    🤖 Generated with [Claude Code](https://claude.com/claude-code)

    Co-Authored-By: Claude <[email protected]>

  ---------

  Co-authored-by: Claude <[email protected]>
* fix(report): prevent sidebar jitter when expanding case selector (#1344)

  Fixed sidebar shifting 1-2 pixels when clicking to expand the playwright case selector. The issue was caused by adding a border only in the expanded state, causing a sudden height change.

  Solution: Added transparent border to the collapsed state, ensuring consistent height across both states.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-authored-by: Claude <[email protected]>
* refactor(core): unify cache config parameters (#1346)

  Simplified `processCacheConfig` function signature from 3 to 2 parameters. Unified `fallbackId` and `cacheId` into single `cacheId` parameter.

  BREAKING CHANGE: processCacheConfig signature changed

  Changed from: processCacheConfig(cache, fallbackId, cacheId?)
  To: processCacheConfig(cache, cacheId)

  The cacheId parameter now serves dual purpose:
  1. Fallback ID when cache is true or cache object lacks ID
  2. Legacy cacheId when cache is undefined (requires MIDSCENE_CACHE env)

  Updated call sites:
  - packages/core/src/agent/agent.ts
  - packages/web-integration/src/playwright/ai-fixture.ts
  - packages/cli/src/create-yaml-player.ts (4 locations)

  Added comprehensive test coverage for legacy compatibility mode:
  - process-cache-config.test.ts: 18 tests passing
  - create-yaml-player.test.ts: 13 tests passing (6 new)
  - playwright-ai-fixture-cache.test.ts: 8 tests passing (3 new)

  Benefits:
  - Simpler API with fewer parameters
  - Unified semantics for new and legacy use cases
  - Full backward compatibility maintained
  - Better test coverage

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-authored-by: Claude <[email protected]>
* fix(core,web-integration): fix unit tests after merging main branch

  This commit fixes unit test failures that occurred after merging the main branch into the 1.0 branch. The issues were caused by temporal conflicts between commits that added new features and subsequent refactoring.

  Root Cause:
  - Commit 13b4f1d added aiInput number support with tests using 'executor'
  - Commit c9b385b refactored Executor → TaskRunner in the 1.0 branch
  - When main was merged, tests still referenced 'executor' but code used 'runner'

  Changes:
  1. Fix YAML player aiInput number conversion (packages/core/src/yaml/player.ts):
     - Extract 'value' field separately to prevent spread override
     - Ensure number values are converted to strings via String(value)
     - Maintain backward compatibility for empty string handling
  2. Fix test mock structure (packages/web-integration/tests/unit-test/ai-input-number-value.test.ts):
     - Update all mock objects from 'executor' to 'runner'
     - Aligns with TaskRunner API refactoring
  3. Fix cache config test (packages/web-integration/tests/unit-test/playwright-ai-fixture-cache.test.ts):
     - Move vi.mock() before imports to ensure proper module hoisting
     - Fixes legacy mode environment variable checks
  4. Add value conversion in agent.ts (optional improvement):
     - Explicitly convert number to string in aiInput method
     - Improves code clarity and test stability

  All tests now pass (195 passed, 1 skipped).

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: yuyutaotao <[email protected]>
Co-authored-by: Claude <[email protected]>
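The "extract value separately to prevent spread override" fix from the last commit above can be illustrated with a minimal sketch. `AiInputOptions` and `normalizeAiInput` are hypothetical names; the actual code in packages/core/src/yaml/player.ts is not shown in this PR.

```typescript
// Hypothetical option shape: a value that may arrive as a number from YAML or from a model.
interface AiInputOptions {
  value: string | number;
  [key: string]: unknown;
}

// Pull `value` out first so a later `...rest` spread cannot put the raw number back.
function normalizeAiInput(opt: AiInputOptions): { value: string; [key: string]: unknown } {
  const { value, ...rest } = opt;
  return { ...rest, value: String(value) };
}
```

An unquoted YAML number such as `value: 42` becomes the string `"42"`, while an empty string stays an empty string, so existing string-based flows behave as before.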

* chore(core): update types of task executor
* chore(core): update sleep tasks
* chore(core): update types for planning
* feat(core): update subTask flag

* chore(lint): fix linting and formatting issues

  - Fix useless switch case in modle-config-manager.test.ts
  - Format package.json files for consistency
  - Apply code formatting across core, agent, and related files

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
* chore(deps): update openai package to version 6.3.0

---------

Co-authored-by: Claude <[email protected]>

* feat(chrome-extension): enable hot reload for development

  This commit adds hot reload support for chrome-extension development, significantly improving the development experience.

  Main changes:
  - Add web-ext integration for automatic extension reloading
  - Add wait-for-build.js script to ensure build completes first
  - Update dev script to use concurrently for build watch + web-ext
  - Add web-ext-config.cjs for web-ext configuration

  To fix build stability during hot reload:
  - Replace npm-watch with rslib native watch mode in visualizer
  - Standardize dev/build:watch script relationship across packages
  - This prevents dist directory deletion during rebuilds

  The rslib native watch mode performs incremental builds without deleting the dist directory, preventing "Module not found" errors when chrome-extension references @midscene/visualizer.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
* fix(chrome-extension): wait for JS bundles before starting web-ext

  The previous implementation only checked for static files (manifest.json, index.html) which are copied early in the build process. This caused web-ext to start before the JavaScript bundles were built, resulting in errors.

  Now we check for the actual build outputs:
  - dist/static/js/index.js
  - dist/static/js/popup.js

  This ensures web-ext only starts after Rsbuild has completed the full build.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
* chore(deps): align Rsbuild plugin versions across workspace

  Update all Rsbuild plugins to use consistent versions:
  - @rsbuild/plugin-less: 1.5.0
  - @rsbuild/plugin-node-polyfill: 1.4.2
  - @rsbuild/plugin-react: 1.4.1
  - @rsbuild/plugin-svgr: 1.2.2
  - @rsbuild/plugin-type-check: 1.2.4

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>

* docs(awesome): add midscene java sdk (#1324) * fix(core): support number type for aiInput value field (#1339) * fix(core): support number type for aiInput value field This change allows aiInput.value to accept both string and number types, addressing scenarios where: 1. AI models return numeric values instead of strings 2. YAML files contain unquoted numbers that parse as number type Changes: - Updated type definitions to accept string | number - Added Zod schema transformation to convert numbers to strings - Updated runtime validation to accept both types - Added explicit conversion in YAML player as fallback All conversions happen internally and are transparent to users. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(core): update aiInput type signatures to accept number values Update the TypeScript method signatures for aiInput to accept string | number for the value parameter, matching the runtime implementation. Changes: - New signature: opt parameter now accepts { value: string | number } - Legacy signature: first parameter now accepts string | number - Implementation signature: locatePromptOrValue now accepts TUserPrompt | string | number - Type assertion updated from `as string` to `as string | number` This ensures type safety and allows users to pass number values directly without TypeScript errors, while maintaining backward compatibility with existing string-based usage. Fixes type errors in test cases that use number values. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: Claude <[email protected]> * fix(report): prevent sidebar jitter when expanding case selector (#1344) Fixed sidebar shifting 1-2 pixels when clicking to expand the playwright case selector. The issue was caused by adding a border only in the expanded state, causing a sudden height change. 
Solution: Added transparent border to the collapsed state, ensuring consistent height across both states. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <[email protected]> * refactor(core): unify cache config parameters (#1346) Simplified `processCacheConfig` function signature from 3 to 2 parameters. Unified `fallbackId` and `cacheId` into single `cacheId` parameter. BREAKING CHANGE: processCacheConfig signature changed Changed from: processCacheConfig(cache, fallbackId, cacheId?) To: processCacheConfig(cache, cacheId) The cacheId parameter now serves dual purpose: 1. Fallback ID when cache is true or cache object lacks ID 2. Legacy cacheId when cache is undefined (requires MIDSCENE_CACHE env) Updated call sites: - packages/core/src/agent/agent.ts - packages/web-integration/src/playwright/ai-fixture.ts - packages/cli/src/create-yaml-player.ts (4 locations) Added comprehensive test coverage for legacy compatibility mode: - process-cache-config.test.ts: 18 tests passing - create-yaml-player.test.ts: 13 tests passing (6 new) - playwright-ai-fixture-cache.test.ts: 8 tests passing (3 new) Benefits: - Simpler API with fewer parameters - Unified semantics for new and legacy use cases - Full backward compatibility maintained - Better test coverage 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <[email protected]> * release: v0.30.5 * docs(site): optimize v0.30 changelog with user-focused improvements (#1352) Improved the v0.30 changelog to be more user-centric and less promotional: - Reduced hyperbolic language ("comprehensive upgrade" → "improved", etc.) 
- Reorganized content structure with clearer user value sections - Added specific usage scenarios and examples for cache strategies - Enhanced mobile platform sections with iOS and Android subsections - Simplified technical descriptions to be more objective - Added cross-platform consistency section for ClearInput feature - Translated optimized content to English version These changes make the changelog more professional and easier for users to understand the actual benefits of the update. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <[email protected]> * fix(ios): correct horizontal scroll direction and improve swipe implementation (#1358) * fix(ios): correct horizontal scroll direction and improve swipe implementation Fixed two issues with iOS horizontal scrolling: 1. **Corrected scroll direction semantics** - scrollLeft now swipes right (brings left content into view) - scrollRight now swipes left (brings right content into view) - This aligns with Android and Web scroll behavior where the direction indicates which content enters the viewport 2. **Improved swipe implementation** - Implemented W3C Actions API for better scroll support - Falls back to dragfromtoforduration if Actions API fails - Increased scroll distance from width/3 to width*0.7 (70%) to prevent bounce-back 3. 
**Fixed scrollUntilBoundary directions** - Corrected left/right swipe directions in boundary detection 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * refactor(ios): remove fallback from swipe method, use W3C Actions API only --------- Co-authored-by: Claude <[email protected]> * feat(android-playground): enable alwaysFetchScreenInfo for AndroidDevice (#1363) * fix(docs): add alwaysFetchScreenInfo parameter to AndroidDevice constructor documentation * feat(android-playground): enable alwaysFetchScreenInfo for AndroidDevice Configure AndroidDevice instance with alwaysFetchScreenInfo option set to true to ensure screen information is always fetched during device operations. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(android): rename alwaysFetchScreenInfo to alwaysRefreshScreenInfo for consistency --------- Co-authored-by: Claude <[email protected]> * fix(core): handle ZodEffects and ZodUnion in schema parsing (#1359) * fix(core): handle ZodEffects and ZodUnion in schema parsing - Add support for ZodEffects (transformations) in getTypeName and getDescription - Add support for ZodUnion types with proper type display (type1 | type2) - Fixes "failed to parse Zod type" warning on first execution with caching 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * test(core): add tests for descriptionForAction with ZodEffects and ZodUnion * chore(core): update test cases --------- Co-authored-by: Claude <[email protected]> Co-authored-by: yutao <[email protected]> * feat(playground): implement task cancellation for Android/iOS playgrounds (#1355) * feat(playground): implement task cancellation for Android/iOS playgrounds This PR implements task cancellation functionality for Android and iOS playgrounds using a singleton + recreation pattern. 
When users clicked the "Stop" button in Android/iOS playground, the task continued to execute and control the device via ADB commands. This was because: - Agent instances were global singletons created at server startup - The /cancel endpoint only deleted progress tips without stopping execution - There was no mechanism to interrupt ongoing tasks Implemented a singleton + recreation pattern: - PlaygroundServer now accepts factory functions instead of instances - Added task locking mechanism (currentTaskId) to prevent concurrent tasks - When cancel is triggered, the agent is destroyed and recreated - Device operations stop immediately as destroyed agents reject new commands 1. **PlaygroundServer** (packages/playground/src/server.ts) - Added factory function support for page and agent creation - Added `recreateAgent()` method to destroy and recreate agent - Added `currentTaskId` to track running tasks - Enhanced `/execute` endpoint with task conflict detection - Enhanced `/cancel` endpoint to recreate agent on cancellation - Backward compatible with existing instance-based usage 2. **Android Playground** (packages/android-playground/src/bin.ts) - Updated to use factory pattern for server creation - Each recreation creates fresh AndroidDevice and AndroidAgent instances 3. 
**iOS Playground** (packages/ios/src/bin.ts) - Updated to use factory pattern for server creation - Each recreation creates fresh IOSDevice and IOSAgent instances - Added test script `test-cancel-android.sh` for automated testing - Manual testing confirmed device operations stop when cancel is triggered ``` User clicks Stop ↓ Frontend calls /cancel/:requestId ↓ Server checks if current running task ↓ Call recreateAgent() ├─ Destroy old agent (agent.destroy()) ├─ Destroy old device (device.destroy()) ├─ Create new device (pageFactory()) └─ Create new agent (agentFactory(device)) ↓ Clear task lock and progress tips ↓ Device stops operations ✅ ``` - ✅ Simple implementation (minimal code changes) - ✅ Effective cancellation (destroy() immediately sets destroyed flag) - ✅ Backward compatible (still accepts instances) - ✅ Natural serialization (one task at a time per device) ```bash pnpm run android:playground ./test-cancel-android.sh ``` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(page): ensure keyboard actions return promises for better async handling * refactor(playground): update PlaygroundServer to use agent factories and simplify server creation * fix(ios): round coordinates for tap and swipe actions to improve accuracy * fix(android): round coordinates in scrolling and gesture methods for improved accuracy * refactor(playground): simplify PlaygroundServer instantiation and improve code readability --------- Co-authored-by: Claude <[email protected]> * fix(yaml): skip environment variable interpolation in YAML comments (#1361) * Initial plan * fix(yaml): skip environment variable interpolation in YAML comments * style(yaml): apply biome linting fixes Co-authored-by: quanru <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: quanru <[email protected]> * fix(core): handle null data in WaitFor and support array keyName in KeyboardPress (#1354) * 
fix(core): handle null data in WaitFor and support array keyName in KeyboardPress This commit fixes two critical bugs: 1. **Fix null data handling in task execution** - Fixed TypeError when AI extract() returns null for WaitFor operations - Added null/undefined check before accessing data properties - WaitFor operations now return false when data is null (condition not met) - Other operations (Assert, Query, String, Number) return null when data is null - Location: src/agent/tasks.ts:936-938 2. **Add array support for keyName in KeyboardPress** - Updated actionKeyboardPressParamSchema to accept string | string[] - Allows key combinations like ['Control', 'A'] for keyboard shortcuts - Maintains backward compatibility with string format - Updated type definitions in aiKeyboardPress method - Locations: - src/device/index.ts:197-199 - src/agent/agent.ts:575-622 **Test Coverage:** - Added comprehensive unit tests for null data handling (8 test cases) - Added unit tests for keyName array validation (7 test cases) - All tests verify edge cases and expected behavior Fixes issue where executor crashed with: "TypeError: Cannot read properties of null (reading 'StatementIsTruthy')" And fixes parameter validation error: "Invalid parameters for action KeyboardPress: Expected string, received array" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(ios,android): handle array keyName in KeyboardPress action - Updated iOS and Android device implementations to handle keyName as string | string[] - For mobile devices, array keys are joined with '+' (e.g., ['Control', 'A'] becomes 'Control+A') - This fixes TypeScript compilation errors in iOS and Android packages - Maintains backward compatibility with string format Related to the KeyboardPress array support added in the previous commit. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * refactor(ios,android): improve KeyboardPress array handling - Remove incorrect join('+') approach that doesn't work on mobile devices - Use last key from array instead (e.g., ['Control', 'A'] → 'A') - Add clear warning messages when array input is used on mobile platforms - Mobile devices don't support keyboard combinations, this is a graceful degradation This makes the behavior more predictable and provides better feedback to developers. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * test(core): fix TaskExecutor constructor arguments in null data tests - Fixed TaskExecutor constructor call to match actual signature - Constructor requires (interface, insight, options) instead of (insight, interface) - All 8 tests now passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(ios,android): improve logging for unsupported key combinations in device input * fix(core): handle null data in WaitFor and improve keyName parameter description This commit fixes the null data handling bug and improves the KeyboardPress parameter description. ## Changes: ### 1. Fix null data handling in task execution - Fixed TypeError when AI extract() returns null for WaitFor operations - Added null/undefined check before accessing data properties (tasks.ts:936-938) - WaitFor operations now return false when data is null (condition not met) - Other operations (Assert, Query, String, Number) return null when data is null ### 2. 
Improve KeyboardPress parameter description - Reverted keyName to only accept string type (not array) - Added clear description: "Use '+' for key combinations, e.g., 'Control+A', 'Shift+Enter'" - This provides better guidance to AI for generating key combinations - Simplified iOS/Android implementations (no special array handling needed) ### 3. Test coverage - Added 8 unit tests for null data handling - Updated KeyboardPress tests to validate string-only format - Added test for key combination strings (e.g., 'Control+A') - Added test to verify arrays are rejected - Fixed unused variable warning in test file ## Fixed Issues: **Issue 1:** Executor crashes with null data ``` TypeError: Cannot read properties of null (reading 'StatementIsTruthy') ``` **Issue 2:** Unclear how to specify key combinations - Now clearly documented in parameter description with examples 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * docs(core): align KeyboardPress action description with parameter schema Updated the KeyboardPress action description to explicitly mention support for key combinations (e.g., "Control+A", "Shift+Enter"), making it consistent with the keyName parameter description that already documented this functionality. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(core): handle null and undefined data in WaitFor output processing --------- Co-authored-by: Claude <[email protected]> * perf(android): optimize clearInput performance by batching keyevents (#1366) * perf(android): optimize clearInput performance by batching keyevents Replace serial keyevent(67) calls with clearTextField() method from appium-adb library, which batches all keyevents into a single shell command. 
Performance improvement: - Before: ~50 seconds (100 sequential shell calls, ~500ms each) - After: ~1-2 seconds (single batched shell command) - Speedup: 25-50x Changes: - Use adb.clearTextField(100) instead of repeat(() => adb.keyevent(67)) - Add clearTextField mock to unit tests for compatibility All 75 unit tests passing, build successful. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(android): include device pixel ratio in size calculation for AndroidDevice --------- Co-authored-by: Claude <[email protected]> * release: v0.30.6 * fix(tests): enhance null data handling tests by adding uiContext parameter --------- Co-authored-by: yuyutaotao <[email protected]> Co-authored-by: Claude <[email protected]> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: yutao <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: quanru <[email protected]>
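The null-data fix for WaitFor described in the commits above can be sketched as follows. `readExtractOutput` and its types are hypothetical; only the `StatementIsTruthy` key and the "false for WaitFor, null for the others" rule come from the commit messages.

```typescript
type ExtractType = "WaitFor" | "Assert" | "Query" | "String" | "Number";
type ExtractData = Record<string, unknown> | null | undefined;

function readExtractOutput(type: ExtractType, data: ExtractData): unknown {
  // Guard first: touching data.StatementIsTruthy on null is what used to throw.
  if (data === null || data === undefined) {
    return type === "WaitFor" ? false : null;
  }
  if (type === "WaitFor") {
    return Boolean(data.StatementIsTruthy);
  }
  // Other types: this sketch simply passes the extracted data through unchanged.
  return data;
}
```

Returning `false` for WaitFor treats a missing answer as "condition not met", so a polling loop keeps waiting instead of crashing with the TypeError quoted above.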

…ication (#1365) * feat(bridge-mode): add remote access support for cross-machine communication This commit implements remote access capability for Bridge Mode, enabling communication between server and client on different machines. ## Changes ### Core Features - Server side: Added `allowRemoteAccess` option to bind server to 0.0.0.0 - Server side: Added `host` and `port` options for custom configuration - Client side: Added server URL configuration UI in Chrome extension - Configuration priority: host > allowRemoteAccess > default (127.0.0.1) ### Modified Files - packages/web-integration/src/bridge-mode/: - common.ts: Added getBridgeServerHost() helper function - io-server.ts: Modified to support custom host binding - agent-cli-side.ts: Added remote access options to constructor - page-browser-side.ts: Added server endpoint parameter support - apps/chrome-extension/src/: - extension/bridge/index.tsx: Added server URL configuration UI - extension/bridge/index.less: Added styles for configuration section - utils/bridgeConnector.ts: Support custom server endpoint - packages/web-integration/tests/: - ai/bridge/remote-access.test.ts: Added comprehensive tests - unit-test/bridge/io.test.ts: Updated tests for new API ### Documentation - Updated docs in apps/site/docs/{en,zh}/bridge-mode-by-chrome-extension.mdx - Added remote access configuration section with examples - Added security warnings for remote access usage ## API Changes New constructor options: - allowRemoteAccess: Enable remote access - host: Custom host (optional) - port: Custom port (optional) ## Backward Compatibility - All existing code works without modification - Default behavior unchanged (localhost only) - All unit tests passing ## Security - Default remains secure (127.0.0.1 only) - Remote access requires explicit opt-in - Documentation includes security warnings 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(bridge): resolve race 
condition in server initialization Fix the 'xhr poll error' by ensuring all Socket.IO middleware and event handlers are set up BEFORE calling httpServer.listen(). This eliminates the race condition where clients could attempt to connect before the server was fully ready. Changes: - Moved Socket.IO middleware setup before httpServer.listen() - Moved Socket.IO connection handlers before httpServer.listen() - Moved httpServer.listen() to the end of initialization sequence Fixes failing unit tests in packages/web-integration/tests/unit-test/ bridge/io.test.ts (all 15 tests now passing) * fix(web-integration): add delay to ensure Socket.IO is fully ready in server initialization * fix(bridge-server): improve HTTP server setup and event handling order * fix(bridge): improve server URL handling and localStorage management * feat(bridge): enhance server configuration UI with expandable section and improved styling * Update packages/web-integration/tests/ai/bridge/remote-access.test.ts Co-authored-by: Copilot <[email protected]> * Update packages/web-integration/tests/ai/bridge/remote-access.test.ts Co-authored-by: Copilot <[email protected]> * Update packages/web-integration/tests/ai/bridge/remote-access.test.ts Co-authored-by: Copilot <[email protected]> * Update packages/web-integration/tests/ai/bridge/remote-access.test.ts Co-authored-by: Copilot <[email protected]> --------- Co-authored-by: Claude <[email protected]> Co-authored-by: Copilot <[email protected]>
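The configuration priority stated in the bridge-mode commit (host > allowRemoteAccess > default 127.0.0.1) might look like the sketch below. The commit names a `getBridgeServerHost()` helper but does not show its signature, so `resolveBridgeHost` and the `BridgeServerOptions` shape here are assumptions for illustration only.

```typescript
// Assumed option shape, based on the constructor options listed in the commit.
interface BridgeServerOptions {
  host?: string;
  port?: number;
  allowRemoteAccess?: boolean;
}

// Hypothetical resolver illustrating the documented priority order.
function resolveBridgeHost(opts: BridgeServerOptions = {}): string {
  // 1. An explicit host always wins.
  if (opts.host) return opts.host;
  // 2. Opting in to remote access binds to all interfaces.
  if (opts.allowRemoteAccess) return '0.0.0.0';
  // 3. The default stays localhost-only.
  return '127.0.0.1';
}
```

Keeping 127.0.0.1 as the fall-through case matches the commit's statement that remote access requires explicit opt-in.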

#1377)
## Problem
The previous nano-staged configuration had two issues:
1. Used `biome check .` which checked the entire project instead of only staged files
2. nano-staged doesn't automatically re-stage fixed files, causing commits to fail
## Solution
Switched to lint-staged which:
- Automatically passes only staged files to biome
- Re-stages files after fixes are applied
- More mature and widely adopted
## Changes
- Replaced nano-staged with lint-staged in pre-commit hook
- Updated biome command to remove project-wide checks
- Added lint-staged as dev dependency
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <[email protected]>

* feat(yaml): support all device options in YAML configuration This PR enables YAML scripts to use all Android and iOS device options by centralizing device option types and ensuring runtime configuration propagation. Changes: - Created packages/core/src/device/device-options.ts to centralize all device option type definitions (AndroidDeviceOpt, IOSDeviceOpt) - Updated MidsceneYamlScriptAndroidEnv and MidsceneYamlScriptIOSEnv to extend device options using Omit<> to exclude programmatic fields - Fixed runtime configuration passing in create-yaml-player.ts to forward all YAML config options to device constructors - Simplified agent creation functions to pass entire options object instead of manually listing each parameter YAML scripts can now configure: Android: - androidAdbPath, remoteAdbHost, remoteAdbPort - imeStrategy, displayId, usePhysicalDisplayIdForScreenshot - screenshotResizeScale, alwaysFetchScreenInfo - autoDismissKeyboard, keyboardDismissStrategy iOS: - deviceId, useWDA, wdaPort, wdaHost - autoDismissKeyboard 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * test(yaml): add unit tests for device options propagation Add comprehensive unit tests to verify that all device options are correctly passed from YAML configuration to device constructors. Tests include: - Android device options propagation from YAML to agentFromAdbDevice - iOS device options propagation from YAML to agentFromWebDriverAgent - Type definitions for AndroidDeviceOpt and IOSDeviceOpt - YAML environment types (MidsceneYamlScriptAndroidEnv, MidsceneYamlScriptIOSEnv) - Validation that customActions is excluded from YAML types - IME strategy and keyboard dismiss strategy type validations - Minimal and full configuration scenarios All 31 tests passing (17 in CLI, 14 in Core). 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(android): ensure empty object is passed when opts is undefined Fix failing unit tests by ensuring an empty object is passed to AndroidDevice and IOSDevice constructors when opts is undefined, maintaining backward compatibility with existing tests. Changes: - Updated agentFromAdbDevice to pass opts || {} to AndroidDevice - Updated agentFromWebDriverAgent to pass opts || {} to IOSDevice This ensures the constructors always receive an object instead of undefined, which is what the existing tests expect. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(device-options): rename alwaysFetchScreenInfo to alwaysRefreshScreenInfo for clarity * docs(site): update Android and iOS sections to include all configuration options from their respective constructors --------- Co-authored-by: Claude <[email protected]>

Update the task type display names in report sidebar and detail views:
- Change "Insight / Query" and "Insight / Assert" to "Insight"
- Change "Action / {subType}" to "Action Space / {subType}"
- Show "Planning / Plan" instead of just "Planning"
- Keep other task types unchanged (e.g., "Planning / Locate")
This provides clearer and more consistent naming for different task types in the report UI, making it easier to understand the task hierarchy and categorization.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <[email protected]>

…1381)
This change improves code consistency by using clonedYamlScript.agent instead of mixing yamlScript.agent and clonedYamlScript for other properties throughout the agent initialization code.
Changes:
- Use clonedYamlScript.agent consistently across all agent types (puppeteer, bridge mode, Android, iOS, and interface)
- This ensures all configuration comes from the same cloned instance, preventing potential mutation issues when the same YAML file is executed multiple times
- Added comprehensive unit tests to verify aiActionContext is properly passed to Android, iOS, and bridge mode agents
This is a code quality improvement that makes the codebase more maintainable and aligns with the original design intent of using structuredClone to isolate each ScriptPlayer instance.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <[email protected]>

…1375) * refactor(env): modernize model configuration environment variables This PR refactors the model configuration system with improved naming conventions and better type safety while maintaining backward compatibility. Key Changes: 1. Environment Variable Naming Convention Updates: - Renamed OPENAI_* → MODEL_* for public API variables * OPENAI_API_KEY → MODEL_API_KEY (deprecated, backward compatible) * OPENAI_BASE_URL → MODEL_BASE_URL (deprecated, backward compatible) - Renamed MIDSCENE_*_VL_MODE → MIDSCENE_*_LOCATOR_MODE across all intents * MIDSCENE_VL_MODE → MIDSCENE_LOCATOR_MODE * MIDSCENE_VQA_VL_MODE → MIDSCENE_VQA_LOCATOR_MODE * MIDSCENE_PLANNING_VL_MODE → MIDSCENE_PLANNING_LOCATOR_MODE * MIDSCENE_GROUNDING_VL_MODE → MIDSCENE_GROUNDING_LOCATOR_MODE - Updated all internal MIDSCENE_*_OPENAI_* → MIDSCENE_*_MODEL_* * MIDSCENE_VQA_OPENAI_API_KEY → MIDSCENE_VQA_MODEL_API_KEY * MIDSCENE_PLANNING_OPENAI_API_KEY → MIDSCENE_PLANNING_MODEL_API_KEY * MIDSCENE_GROUNDING_OPENAI_API_KEY → MIDSCENE_GROUNDING_MODEL_API_KEY * (and corresponding BASE_URL variables) 2. Type System Improvements: - Split TModelConfigFn into public and internal types - Public API (TModelConfigFn) no longer exposes 'intent' parameter - Internal type (TModelConfigFnInternal) maintains intent parameter - Users can still optionally use intent parameter via type casting 3. Backward Compatibility: - Maintained compatibility for documented public variables (OPENAI_API_KEY, OPENAI_BASE_URL) - New variables take precedence, fallback to legacy names if not set - Only public documented variables are deprecated, internal variables renamed directly 4. 
Updated Files: - packages/shared/src/env/types.ts - Type definitions and constants - packages/shared/src/env/constants.ts - Config key mappings - packages/shared/src/env/decide-model-config.ts - Compatibility logic - packages/shared/src/env/model-config-manager.ts - Type casting implementation - packages/shared/src/env/init-debug.ts - Debug variable updates - All test files updated to use new variable names Testing: - All 24 model-config-manager tests passing - Overall test suite: 241 tests passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * Update packages/shared/src/env/constants.ts Co-authored-by: Copilot <[email protected]> * test(env): add comprehensive backward compatibility tests for OPENAI_* variables - Added test suite to verify MODEL_API_KEY/MODEL_BASE_URL take precedence - Added test to ensure OPENAI_API_KEY/OPENAI_BASE_URL still work as fallback - Fixed compatibility logic to prioritize new variables over legacy ones - All 13 tests passing, including 5 new backward compatibility tests Test coverage: ✓ Using only legacy variables (OPENAI_API_KEY) ✓ Using only new variables (MODEL_API_KEY) ✓ Mixing new and legacy variables (new takes precedence) ✓ Individual precedence for API_KEY and BASE_URL 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(test): reset MIDSCENE_CACHE in beforeEach to avoid .env interference The test 'should return the correct value from override' was failing because .env file sets MIDSCENE_CACHE=1. This was polluting the test environment and causing the test to expect false but receive true. Fixed by explicitly resetting MIDSCENE_CACHE to empty string in beforeEach. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * docs(site): update environment variable names and add advanced configuration examples for agents --------- Co-authored-by: Claude <[email protected]> Co-authored-by: Copilot <[email protected]>
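The backward-compatibility rule described above (new `MODEL_*` variables take precedence, legacy `OPENAI_*` names are the fallback) can be sketched as follows. `pickWithLegacyFallback` is a hypothetical helper, not the code in decide-model-config.ts, and treating an empty string as "not set" is an assumption of this sketch.

```typescript
// The environment is passed in explicitly so the sketch stays self-contained.
type Env = Record<string, string | undefined>;

// Hypothetical helper: prefer the new variable, fall back to the legacy name.
function pickWithLegacyFallback(
  env: Env,
  newKey: string,
  legacyKey: string,
): string | undefined {
  const preferred = env[newKey];
  // Assumption: an empty string counts as "not set".
  if (preferred !== undefined && preferred !== '') return preferred;
  return env[legacyKey];
}
```

Each variable is resolved on its own, so a setup with `MODEL_API_KEY` and the legacy `OPENAI_BASE_URL` would still pick up both values, which corresponds to the "individual precedence" case in the test list above.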

* refactor(core): remove tree info in uiContext * chore(core): fix lint * chore(core): remove dom-based locator * fix(core): test cases * chore(core): fix lint * fix(core): test cases

* feat(core): update signature of warp-openai * docs(site): update createOpenAIClient API documentation Update the documentation for createOpenAIClient to reflect the new signature: - Changed from factory function to wrapper function - Now receives base OpenAI instance and options - Returns Promise<OpenAI | undefined> - Updated examples to show async wrapper pattern - Removed unnecessary OpenAI import from examples 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: quanruzhuoxiu <[email protected]> Co-authored-by: Claude <[email protected]>

@deprecated

…nt variables (#1388)
Add backward compatibility support for legacy MIDSCENE_OPENAI_* environment variables:
- MIDSCENE_OPENAI_INIT_CONFIG_JSON (now MIDSCENE_MODEL_INIT_CONFIG_JSON)
- MIDSCENE_OPENAI_HTTP_PROXY (now MIDSCENE_MODEL_HTTP_PROXY)
- MIDSCENE_OPENAI_SOCKS_PROXY (now MIDSCENE_MODEL_SOCKS_PROXY)
Changes:
- Add deprecated constants to types.ts with @deprecated tags
- Add legacy variables to MODEL_ENV_KEYS for overrideAIConfig support
- Update DEFAULT_MODEL_CONFIG_KEYS_LEGACY to use legacy variable names
- Implement priority fallback logic in decide-model-config.ts (new variables take precedence)
- Update documentation (zh/en model-provider.mdx) with deprecation notices
All 139 tests pass, confirming backward compatibility works correctly.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <[email protected]>

) * feat(android): add screenshot polling fallback for remote devices Implement automatic fallback to screenshot polling mode when connecting to remote Android devices (IP:Port format), since scrcpy cannot connect to remote adb devices. Changes: - Refactor ScreenshotViewer to shared component in @midscene/visualizer with function-based props - Add /api/screenshot endpoint in ScrcpyServer using adb screencap - Add device type detection to distinguish local vs remote devices - Conditionally render ScrcpyPlayer (real-time) for local devices or ScreenshotViewer (polling) for remote devices - Update playground app to use new shared ScreenshotViewer component 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(visualizer): import ExecutionTaskInsightLocate from types module Fix TypeScript build error by importing ExecutionTaskInsightLocate directly from @midscene/core/types instead of the main export. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(visualizer): define local ExecutionTaskInsightLocate interface Define ExecutionTaskInsightLocate as a local interface instead of importing from @midscene/core to resolve TypeScript build errors. This type is not properly exported from the core package's type declarations. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * refactor(android): use PlaygroundServer screenshot API instead of duplicating in ScrcpyServer Remove duplicate screenshot implementation from ScrcpyServer and use the existing PlaygroundServer /screenshot endpoint which already calls AndroidDevice.screenshotBase64(). This eliminates code duplication and leverages the existing infrastructure. 
Changes: - Remove /api/screenshot endpoint from ScrcpyServer - Update App.tsx to call PlaygroundServer's /screenshot endpoint (port 9412) - Also use PlaygroundServer's /interface-info endpoint for consistency 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: Claude <[email protected]>

…nents (#1392)
This change consolidates all PlaygroundSDK creation logic for report components into a single shared utility module.
Changes:
- Created `apps/report/src/utils/report-playground-utils.ts` with `getReportPlaygroundSDK(serviceMode, agent?)` function
- Removed duplicate `getPlaygroundSDK` implementations from playground.tsx and playground/index.tsx
- Updated open-in-playground/index.tsx to use the shared function
- Removed unnecessary `createReportPlaygroundSDK` wrapper function
- All report components now use `PLAYGROUND_SERVER_PORT` constant from shared package
Benefits:
- Single source of truth for PlaygroundSDK creation in report components
- Static report files always connect to localhost:5800
- Reduced code duplication and improved maintainability
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <[email protected]>

* refactor(core): rename Insight class to Service This is a comprehensive refactoring that renames the Insight class and all related types to Service for better semantic clarity. Changes: - Renamed directories: insight/ -> service/ - Renamed test files: insight.test.ts -> service.test.ts - Updated 50+ type definitions - Modified 18+ source files - Synchronized all test files - Updated external package dependencies Core updates: - Class: Insight -> Service - Interface: InsightOptions -> ServiceOptions - All InsightX types -> ServiceX types - String literal 'Insight' -> 'Service' Affected files: - src/index.ts, src/yaml.ts, src/task-runner.ts - src/agent/*.ts (agent, tasks, task-builder, ui-utils) - tests/utils.ts and all test files - External: chrome-extension, evaluation, report Verification: - TypeScript: 0 errors - Lint: 530 files passed - Build: successful (341.1 kB) 🤖 Generated with Claude Code Co-Authored-By: Claude <[email protected]> * fix(visualizer): update Insight references to Service - Updated ExecutionTaskInsightLocate to ExecutionTaskServiceLocate - Changed task.type check from 'Insight' to 'Service' - Renamed insightTask variable to serviceTask for consistency * fix(report): update Insight references to Service - Updated ExecutionTaskInsightLocate to ExecutionTaskServiceLocate in sidebar, detail-side, and detail-panel components - Changed task.type checks from 'Insight' to 'Service' - Updated ExecutionTaskInsightAssertion to ExecutionTaskServiceAssertion - Ensures report UI displays Service tasks correctly * chore(tests): update comments from Insight to Service * fix(tests): change task type from 'Insight' to 'Service' in tests - Updated aiaction-cacheable.test.ts - Updated page-task-executor-waitFor.test.ts - Completes the Insight to Service refactoring * fix(tests): update test expectations from 'Insight' to 'Service' - Updated task-builder.test.ts expectations - Updated page-task-executor-rightclick.test.ts expectations - Fixes CI test failures * 
refactor(core): use 'Insight' for ExecutionTask types Keep Service class name but restore ExecutionTask type to 'Insight' for consistency with UI display requirements. Changes: - ExecutionTaskType: 'Service' → 'Insight' - All ExecutionTaskService* types → ExecutionTaskInsight* - Runtime checks: task.type === 'Service' → task.type === 'Insight' - ui-utils.ts: Removed special handling for Query/Assert subtypes to display "Insight / Query" and "Insight / Assert" correctly Type display now follows the expected pattern: - Planning / Plan - Planning / Locate - Action Space / {interface} - Insight / Query - Insight / Assert - Insight / Locate Files modified: - packages/core/src/types.ts - packages/core/src/agent/*.ts - packages/core/src/task-runner.ts - packages/visualizer/src/utils/replay-scripts.ts - apps/report/src/components/**/*.tsx - All test files 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: Claude <[email protected]>

Fixed ambiguous descriptions about sequential vs parallel execution:
- Updated --files parameter description to clearly state that files execute sequentially by default (when --concurrent=1) and can run concurrently with --concurrent parameter
- Removed misleading "run in parallel" text from example that doesn't use --concurrent parameter
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <[email protected]>

Add explicit error throwing for failed Assert tasks with detailed assertion failure messages including the AI's thought process.
This change brings the 1.0 branch in line with the main branch commit 4761a6c, ensuring that Assert tasks fail explicitly when the AI cannot verify the condition, rather than silently returning null values.
Changes:
- Add error throwing for failed Assert tasks in tasks.ts
- Update test to expect error instead of null output
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <[email protected]>

* docs(core): update api reference doc * docs(core): fix all dead links * docs(core): fix all dead links

…guration (#1456) * feat(shared): introduce MIDSCENE_PLANNING_STYLE for unified config Replace multiple MIDSCENE_USE_* environment variables with a single MIDSCENE_PLANNING_STYLE parameter to simplify model configuration. ## Changes ### New Environment Variable - Added MIDSCENE_PLANNING_STYLE with supported values: - default (equivalent to qwen3) - qwen2.5, qwen3 - doubao-1.5, doubao-1.6 - ui-tars-1.0, ui-tars-1.5 - gemini ### Core Features 1. Auto-inference from model name: Automatically detect planning style from model name if not configured 2. Legacy compatibility: Support old MIDSCENE_USE_* variables with deprecation warnings 3. Conflict detection: Error when both new and legacy variables are set 4. Comprehensive validation: Validate planning style values and provide clear error messages ### Implementation Details - Added type definitions in types.ts - Implemented parsing logic in parse.ts: - inferPlanningStyleFromModelName() - convertPlanningStyleToVlMode() - parsePlanningStyleFromEnv() - Integrated into decide-model-config.ts for planning intent - All warnings output via console.warn() ### Testing - Added 39 new test cases covering: - Model name inference - Planning style conversion - Legacy variable compatibility - Error handling - All planning style values - Updated existing tests - All 162 tests passing ## Migration Guide Before (deprecated): MIDSCENE_USE_QWEN3_VL=1 MIDSCENE_USE_VLM_UI_TARS=DOUBAO-1.5 After (recommended): MIDSCENE_PLANNING_STYLE=qwen3 MIDSCENE_PLANNING_STYLE=ui-tars-1.5 Old environment variables still work with deprecation warnings. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(shared): resolve TypeScript type error in convertPlanningStyleToVlMode - Added non-null assertion for vlMode property - Explicitly destructure parsed result instead of using spread operator - vlMode is guaranteed to be non-undefined since vlModeRaw is always valid Fixes build error: Type 'TVlModeTypes | undefined' is not assignable to type 'TVlModeTypes' * fix(shared): correct UI-TARS model inference priority logic - Prioritize ui-tars check before doubao - ui-tars + 1.5 → vlm-ui-tars-doubao-1.5 - ui-tars + 1.0 → vlm-ui-tars - ui-tars (no version) → vlm-ui-tars-doubao (Volcengine deployment) - doubao (non-UI-TARS) → doubao-vision This ensures models deployed on Volcengine are correctly identified. * refactor(shared): simplify planning style parsing by removing model name inference * fix(tests): clear legacy environment variables before model config tests * Update packages/shared/src/env/decide-model-config.ts Co-authored-by: Copilot <[email protected]> * Update packages/shared/src/env/parse.ts Co-authored-by: Copilot <[email protected]> * Update packages/shared/src/env/parse.ts Co-authored-by: Copilot <[email protected]> * Update packages/shared/src/env/decide-model-config.ts Co-authored-by: Copilot <[email protected]> * refactor(parse): add type guard for planning style validation and improve error message * refactor(decide-model-config): consolidate vlMode and uiTarsVersion parsing logic for planning intent * refactor(model-config): migrate from MIDSCENE_USE_... 
to MIDSCENE_PLANNING_STYLE for model configuration * refactor(docs): update model configuration and strategy documentation for clarity * refactor(env): rename planning style references to model family for consistency * refactor(parse): update references from 'qwen-vl' to 'qwen2.5-vl' for consistency * refactor(core): update references from qwen-vl to qwen2.5-vl for consistency across the codebase * refactor(model-config): update model family references from 'qwen-vl' to 'qwen2.5-vl' for consistency --------- Co-authored-by: Claude <[email protected]> Co-authored-by: Copilot <[email protected]>
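The "type guard for planning style validation" mentioned above could take roughly the following shape. The value list is copied from the first commit in this block; later commits rename the concept to model family and change some identifiers, so the final set may differ. `isPlanningStyle` and `parsePlanningStyle` are hypothetical names, not the functions in parse.ts.

```typescript
// Supported values as listed in the initial MIDSCENE_PLANNING_STYLE commit.
const PLANNING_STYLES = [
  'default',
  'qwen2.5',
  'qwen3',
  'doubao-1.5',
  'doubao-1.6',
  'ui-tars-1.0',
  'ui-tars-1.5',
  'gemini',
] as const;

type PlanningStyle = (typeof PLANNING_STYLES)[number];

// Hypothetical type guard: narrows an arbitrary string to PlanningStyle.
function isPlanningStyle(value: string): value is PlanningStyle {
  return (PLANNING_STYLES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical parser: returns undefined when unset, throws on invalid input.
function parsePlanningStyle(raw: string | undefined): PlanningStyle | undefined {
  if (raw === undefined || raw === '') return undefined;
  if (!isPlanningStyle(raw)) {
    throw new Error(
      `Invalid MIDSCENE_PLANNING_STYLE "${raw}". Supported values: ${PLANNING_STYLES.join(', ')}`,
    );
  }
  return raw;
}
```

Listing the supported values in the error message follows the commit's goal of providing clear validation errors.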

* feat(site): add responsive design and upgrade to Tailwind CSS v4 This commit adds comprehensive responsive design to the custom homepage and upgrades Tailwind CSS from v3 to v4.1.11. Key changes: - Upgrade Tailwind CSS to v4.1.11 with @tailwindcss/postcss plugin - Add mobile-first responsive design to Banner component - Add mobile-first responsive design to FeatureSections component - Implement responsive navigation bar with media queries - Optimize button layout for single-line text display Responsive breakpoints: - Mobile: < 768px (base styles) - Tablet: >= 768px (md: prefix) - Desktop: >= 1024px (lg: prefix) Banner responsive features: - Adaptive min-height: 400px (mobile) → 664px (desktop) - Responsive padding: 20px (mobile) → 40px (desktop) - Scalable typography: 32px (mobile) → 80px (desktop) for title - Flexible button layout: stacked (mobile) → horizontal (desktop) FeatureSections responsive features: - Adaptive section spacing: 48px (mobile) → 120px (desktop) - Responsive card layout: single column (mobile) → multi-column (desktop) - Scalable card heights: 120px (mobile) → 160px (desktop) - Flexible text sizes: 14px (mobile) → 16px (desktop) Navigation responsive updates: - Responsive padding: 20px (mobile) → 40px (desktop) - Consistent alignment with page content across all screen sizes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(theme): add missing keys for Banner and FeatureSections components --------- Co-authored-by: Claude <[email protected]>

…1471)
This commit refactors the Banner and FeatureSections components to use Tailwind CSS's built-in dark mode modifier instead of the useDark() hook from @rspress/core/runtime.
Changes:
- Remove useDark import and usage from both components
- Replace conditional classNames with Tailwind's dark: modifier
- Remove unnecessary key props that were based on dark state
- Simplify component logic by relying on Tailwind's dark mode configuration
Benefits:
- Cleaner, more maintainable code
- Better performance (no runtime JS for dark mode detection)
- Consistent with Tailwind CSS best practices
- Reduced bundle size by removing unused hook dependency
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <[email protected]>

Add JSON normalization to handle leading/trailing whitespace in LLM output that causes action type lookup failures.
This fix addresses the issue where LLM-generated action plans contain:
- Leading/trailing spaces in object keys (e.g., " prompt " instead of "prompt")
- Leading/trailing spaces in string values (e.g., " Tap" instead of "Tap")
Solution:
- Added normalizeJsonObject() function in safeParseJson()
- Recursively trims all object keys and string values
- Works with nested objects, arrays, and all vlMode types
- Added 13 comprehensive test cases
Fixes #1435
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <[email protected]>
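The recursive trimming described in this commit can be sketched as below. The function name `normalizeJsonObject` comes from the commit message, but the body is an illustrative reimplementation under the stated behavior (trim all keys and string values, recurse into objects and arrays), not the code that was merged.

```typescript
// Sketch of a recursive normalizer: trims string values and object keys,
// walks arrays and nested objects, and leaves other values untouched.
function normalizeJsonObject(value: unknown): unknown {
  if (typeof value === 'string') {
    return value.trim();
  }
  if (Array.isArray(value)) {
    return value.map((item) => normalizeJsonObject(item));
  }
  if (value !== null && typeof value === 'object') {
    const normalized: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      normalized[key.trim()] = normalizeJsonObject(inner);
    }
    return normalized;
  }
  // Numbers, booleans, null, and undefined pass through unchanged.
  return value;
}
```

One design consequence worth noting: if two keys differ only by surrounding whitespace, the later one overwrites the earlier one after trimming.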

…1473) * refactor(ui): improve dark mode consistency and detail panel layout - Unify dark mode background colors across components (#141414) - Update detail-side layout with better alignment: * Replace arrow icons with Ant Design icons (RightOutlined/DownOutlined) * Align details text with info-tab text at 20px from left * Center arrow icon within 20px left spacing * Adjust info-tabs height to 46px - Improve dark mode text colors for better readability: * Update meta-key color to #9599A6 * Update sidebar title color to #D1D3DB * Update segmented selected item color to #F5F9FD - Apply consistent dark mode styling across visualizer components 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * refactor(theme): update dark mode colors for detail panel and detail side --------- Co-authored-by: Claude <[email protected]>

* feat(shared): unify midscene family env * fix(web-integration): add MIDSCENE_MODEL_FAMILY to test mocks Update test mock configurations to include MIDSCENE_MODEL_FAMILY to ensure planning-related tests pass with the new unified naming. Also update the error message in model-config-manager to reference MIDSCENE_MODEL_FAMILY instead of the deprecated MIDSCENE_PLANNING_VL_MODE. Fixes test failures: - PageAgent RightClick > should handle aiRightClick with locate options - PageAgent RightClick > should be supported in ai method with rightClick - aiInput with number value tests 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: Claude <[email protected]>

- Update all model configuration examples to use the new 4-parameter format:
  - MIDSCENE_MODEL_BASE_URL
  - MIDSCENE_MODEL_API_KEY
  - MIDSCENE_MODEL_NAME
  - MIDSCENE_MODEL_FAMILY
- Replace deprecated OPENAI_API_KEY/OPENAI_BASE_URL references
- Add links to model strategy documentation for more details
- Update both Chinese and English documentation
- Fix: add missing dayjs dependency to @midscene/core
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <[email protected]>

Add three new navigation actions (Navigate, Reload, GoBack) to the Web Action Space for browser navigation control.
**New Actions:**
- **Navigate**: Navigate to a URL in the current tab
  - Parameter: `url` (string)
- **Reload**: Reload the current page
- **GoBack**: Navigate back in browser history
**Implementation:**
- Import zod from @midscene/core instead of adding new dependency
- Add navigate, reload, goBack methods to Page class in base-page.ts
- Add method declarations to AbstractWebPage for type safety
- Actions directly call page methods, avoiding code duplication
- All actions include proper error handling for unsupported page types
- Support both Puppeteer and Playwright page types
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <[email protected]>

Add navigate, reload, and goBack methods to ChromeExtensionProxyPage to support navigation actions in Chrome Extension environment.
**Implementation:**
- navigate(url): Uses chrome.tabs.update to navigate to URL
- reload(): Uses chrome.tabs.reload to refresh the page
- goBack(): Uses chrome.tabs.goBack to go back in history
- All methods wait for network idle after operation
This completes the navigator actions support for all web page types:
- Puppeteer
- Playwright
- Chrome Extension
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <[email protected]>

This commit restores the `aiAction` fixture for Playwright integration while marking it as deprecated. Users are encouraged to use `aiAct` instead.
Changes:
- Add `aiAction` fixture implementation
- Add `aiAction` to PlayWrightAiFixtureType
- Mark `aiAction` as deprecated with JSDoc comments
- Point users to use `aiAct` as the recommended alternative
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <[email protected]>

Add support for displaying timeout test status in the report overview component:
- Add timedOut counter and timedOutTests tracking in report overview statistics
- Handle "timedOut" and "interrupted" status in test case filtering
- Add new "Timeout" stats card with orange color (#ff8c00 in light mode, #ffa940 in dark mode)
- Update PlaywrightTaskAttributes type to use strict union type for test status
- Update playwright case selector to show "Timeout" filter option
- Fix App.tsx type error by setting undefined as default status value
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <[email protected]>

…1485)

* feat(playground): add device options configuration for Android/iOS

This commit implements device-specific configuration options in the playground UI, allowing users to customize device behavior such as keyboard handling and IME strategy.

Changes:
- Add device options state management with localStorage persistence
- Create UI controls for Android-specific options (imeStrategy, autoDismissKeyboard, keyboardDismissStrategy, alwaysRefreshScreenInfo)
- Create UI controls for iOS-specific options (autoDismissKeyboard)
- Extend execution pipeline to pass deviceOptions from frontend to backend
- Update agent.interface.options on the server side when deviceOptions are received
- Optimize parameter flattening to avoid delete operator performance issues

Technical implementation:
- Frontend: Store device options in Zustand with localStorage sync
- SDK: Include deviceOptions in remote execution adapter payload
- Server: Update agent.interface.options to apply settings globally
- This ensures all actions (including those called by aiAct) use the updated options

Fixes #1282

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat(playground): add dynamic device type detection for iOS/web playground

The universal playground app now detects the device type from the connected server's /interface-info API and displays device-specific configuration options accordingly. This ensures that iOS playground users can see and configure iOS device options (autoDismissKeyboard), while web users see no device-specific options.

Related to #1282

* fix(ios): improve keyboard dismissal to prevent accidental UI interactions

The previous implementation used a swipe down gesture at a fixed screen position (1/3 from top) which could accidentally click on search results or other UI elements that appeared after text input.

Changes:
- Use WDA's dismissKeyboard API as the primary method (more reliable)
- Fall back to safer swipe gesture (from bottom up) if API fails
- Increase wait time from 300ms to 500ms for UI stability
- Update autoDismissKeyboard documentation to reflect default behavior

Technical details:
- WDA API tries common keyboard button names: return, done, go, search, etc.
- Swipe fallback uses safer coordinates: from 90% height to 50% height
- This prevents accidental taps on UI elements in the upper portion of screen

Related to #1282

* feat(screenshot-viewer): add screenshot viewer component with styles and functionality

* fix(tests): enhance keyboard dismissal tests to simulate failure scenarios

---------

Co-authored-by: Claude <[email protected]>
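
The API-first keyboard dismissal with a swipe fallback could be sketched as follows. The `KeyboardDevice` interface and its method names are hypothetical and do not reflect the actual WDA client; swiping along the horizontal center of the screen is also an assumption. Only the 90% to 50% swipe range and the 500ms wait come from the commit message.

```typescript
// Hypothetical device abstraction; these method names are assumptions, not the WDA client API.
interface KeyboardDevice {
  dismissKeyboardViaApi(): Promise<void>;
  swipe(fromX: number, fromY: number, toX: number, toY: number): Promise<void>;
  getScreenSize(): Promise<{ width: number; height: number }>;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function dismissKeyboard(
  device: KeyboardDevice,
  settleMs = 500,
): Promise<'api' | 'swipe'> {
  let method: 'api' | 'swipe' = 'api';
  try {
    await device.dismissKeyboardViaApi(); // primary path
  } catch (error) {
    // Fallback: swipe from 90% of the screen height up to 50%,
    // staying clear of UI elements in the upper part of the screen.
    const { width, height } = await device.getScreenSize();
    const x = Math.round(width / 2); // assumption: swipe along the horizontal center
    await device.swipe(x, Math.round(height * 0.9), x, Math.round(height * 0.5));
    method = 'swipe';
  }
  await sleep(settleMs); // let the UI settle before the next action
  return method;
}
```

Returning which path was taken makes the failure scenario easy to assert in a unit test with a device stub whose API call rejects.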

)

Fixed an issue where screenshots from uiContext.screenshotBase64 were not displayed when activeTask.recorder was empty or undefined. The screenshot extraction logic was previously nested inside the recorder length check, which prevented uiContext screenshots from being shown when there were no recorder entries.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
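
A minimal sketch of the corrected control flow, assuming simplified task shapes: the uiContext screenshot is read on its own, outside any check on the recorder array. The `TaskLike` and `RecorderItem` interfaces and the `collectScreenshots` helper are hypothetical and only mirror the field names mentioned in the commit.

```typescript
// Hypothetical, simplified shapes; only the field names come from the commit message.
interface RecorderItem {
  screenshot?: string;
}

interface TaskLike {
  recorder?: RecorderItem[];
  uiContext?: { screenshotBase64?: string };
}

function collectScreenshots(task: TaskLike): string[] {
  const shots: string[] = [];

  // Read the uiContext screenshot unconditionally, not inside the recorder check.
  const contextShot = task.uiContext?.screenshotBase64;
  if (contextShot) shots.push(contextShot);

  // Recorder entries are optional; an empty or missing array adds nothing.
  for (const item of task.recorder ?? []) {
    if (item.screenshot) shots.push(item.screenshot);
  }
  return shots;
}
```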

* feat(core): add automatic LangSmith and Langfuse integration

- Add MIDSCENE_LANGSMITH_DEBUG and MIDSCENE_LANGFUSE_DEBUG env variables
- Auto-wrap OpenAI client when env variables are enabled
- Users only need to install the package and set env variable
- No more manual createOpenAIClient code required

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(core): use variable to prevent bundler static analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* docs(site): add auto-integration guide for LangSmith and Langfuse

- Add new sections explaining simplified environment variable approach
- Update createOpenAIClient note to recommend env var method
- Add installation steps and configuration examples
- Include both Chinese and English documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* docs(site): add Langfuse guide to model-config

- Update LangSmith section with installation steps
- Add Langfuse integration section with complete setup guide
- Include both Chinese and English documentation
- Add cross-references to API documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* docs(site): update environment variable examples

- Update LangSmith configuration with actual env var names
- Update Langfuse configuration with correct BASE_URL variable
- Replace sample API keys with placeholder format
- Add region-specific endpoint examples

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>

@RsPress

…1490)

* feat(report): improve UI display and simplify task output structure

This commit improves the report UI display and simplifies the task execution output structure:

**Report UI Improvements:**
- Separate image display from text in task parameters with clickable links
- Add element-detail-box styling for locate/element objects
- Support text wrapping in sidebar titles
- Highlight elements from Action Space task params in Screenshots view
- Hide Output section when output is undefined

**Core Changes:**
- Simplify task output structure by returning actionResult directly
- Change operation functions (aiTap, aiScroll, etc.) to return void instead of values
- Update locateParamStr to support Action Space description field

**Technical Details:**
- Modified ui-utils.ts to remove image concatenation logic
- Enhanced detail-side component to handle multiple data structures
- Added shared renderElementDetailBox function for code reuse
- Updated detail-panel to extract elements from Action Space params
- Added dark mode support for element-detail-box styles

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(report): move renderElementDetailBox definition before MetaKV component

Move the renderElementDetailBox function definition to before the MetaKV component to fix potential hoisting issues. Arrow functions defined with const are not hoisted, so they must be defined before being referenced. This ensures the function is available when MetaKV's renderContent calls it.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(core): side panel style

* feat(report): implement extractInsightParam function for improved task parameter handling

* feat(report): add renderMetaContent and extractTaskImages functions for improved content handling

* chore(deps): update @RsPress packages to version 2.0.0-rc.1 in package.json and pnpm-lock.yaml

---------

Co-authored-by: Claude <[email protected]>

Co-authored-by: yutao <[email protected]>

…ls (#1491)

* fix(playwright): fix this binding issue in AI fixture declarative calls

Fixed a this binding issue where methods called through bracket notation (agent[aiActionType]) would lose their context, causing TypeError when accessing instance methods or properties. Added .bind(agent) to ensure proper this context is maintained when dynamically invoking agent methods like aiTap, aiQuery, aiAssert, etc.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Update packages/web-integration/src/playwright/ai-fixture.ts

Co-authored-by: Copilot <[email protected]>

* Update packages/web-integration/tests/unit-test/playwright-ai-fixture-this-binding.test.ts

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Claude <[email protected]>

Co-authored-by: Copilot <[email protected]>
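
The binding fix can be illustrated with a small stand-in class. A method looked up by name and then stored or passed along before it is invoked no longer carries its instance, so `this` is lost; binding at lookup time keeps it. `DemoAgent` and `resolveAction` are hypothetical names for this sketch and are not taken from ai-fixture.ts.

```typescript
// Hypothetical stand-in for the agent; only the `this` behaviour matters here.
class DemoAgent {
  constructor(private readonly label: string) {}

  aiTap(target: string): string {
    return `${this.label} tap ${target}`;
  }

  aiAssert(target: string): string {
    return `${this.label} assert ${target}`;
  }
}

type ActionName = 'aiTap' | 'aiAssert';

// Looks up a method by name and binds it, so `this` survives
// when the returned function is stored or passed around.
function resolveAction(
  agent: DemoAgent,
  name: ActionName,
): (target: string) => string {
  return agent[name].bind(agent);
}

const demoAgent = new DemoAgent('agent');
const tap = resolveAction(demoAgent, 'aiTap');
const tapResult = tap('submit button'); // `this` is still demoAgent
```

Without the `.bind(agent)` call, `tap` would be a detached function and reading `this.label` inside it would not refer to the agent.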

)

* feat(site): enhance feature sections with updated models and APIs

- Update model list: remove Qwen2.5-VL, add separate cards for Seed, Qwen3-VL, Gemini-2.5-Pro, and UI-TARS
- Add model icons: doubao-color.svg, qwen-color.svg, gemini-color.svg, bytedance-color.svg
- Add more API cards: aiQuery and aiAssert with custom SVG icons
- Create API icons: ai-action.svg, ai-tap.svg, ai-query.svg, ai-assert.svg, playback-report.svg
- Add "View All APIs" card linking to /zh/api documentation
- Update i18n: add translations for new model and API descriptions
- Improve card styling: use consistent gray backgrounds, add hover effects

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(site): add accessibility attributes to arrow SVG icon

- Add title element to SVG for screen readers
- Add role='img' and aria-label attributes
- Fixes a11y/noSvgWithoutTitle lint error

* feat(site): update feature section to highlight key capabilities

- Change section title from 'DEBUGGING' to 'FEATURES'
- Update heading to 'Powerful Features'
- Replace 3 debugging-focused items with 4 key features:
  1. Rich APIs - smart automation and atomic control
  2. MCP Server - device operations exposed as MCP Server
  3. Reports & Playground - improved debugging tools
  4. Flexible Integration - multiple formats and extensibility
- Add debuggingDesc4 to i18n for the new fourth feature point

* fix(site): update section title to DEVELOPER EXPERIENCE

- Change debuggingTitle to 'DEVELOPER EXPERIENCE' (en) / '开发体验' (zh)
- Change debuggingHeading to 'Developer APIs & Tools' (en) / '开发者 API 和工具' (zh)

* refactor(site): improve feature descriptions for clarity

- Update debuggingDesc1: emphasize both smart workflows and atomic control
- Update debuggingDesc2: clarify MCP Server collaboration purpose
- Update debuggingDesc3: highlight intuitive visualization and testing
- Update debuggingDesc4: keep flexible integration description concise

Both EN and ZH translations refined for better readability.

* feat(site): add copyright section to the homepage

* refactor(site): replace API cards with feature cards

Replace the 5 API showcase cards with 4 developer experience feature cards while maintaining the "View All APIs" link card. Restore original debugging descriptions (debuggingDesc1-3).

Changes:
- Replace API cards with 4 feature cards: Rich APIs, MCP Server, Reports & Playground, Flexible Integration
- Restore original debugging section descriptions
- Keep "View All APIs" card for API documentation access
- Update both English and Chinese i18n files
- Maintain consistent styling and layout

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>

Enhanced the token usage display in the report sidebar to show separate statistics for each AI model when multiple models are used.

Changes:
- Display individual model statistics when multiple models are detected
- Add "Total" tag next to each model name in multi-model view
- Skip tasks without usage information to avoid "Unknown" model entries
- Maintain single "Total" row display for single-model scenarios

This improvement provides better visibility into token consumption across different AI models, helping users analyze and optimize their AI usage costs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
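
The per-model aggregation might be sketched like this. The `TokenUsage`, `ReportTask` and `UsageTotals` shapes, their field names, and the `usageByModel` helper are all assumptions made for illustration; the report's real data model may differ.

```typescript
// Hypothetical usage record; the field names are assumptions for this sketch.
interface TokenUsage {
  modelName: string;
  promptTokens: number;
  completionTokens: number;
}

interface ReportTask {
  usage?: TokenUsage;
}

interface UsageTotals {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

function usageByModel(tasks: ReportTask[]): Map<string, UsageTotals> {
  const totals = new Map<string, UsageTotals>();
  for (const task of tasks) {
    // Skip tasks without usage so no "Unknown" model entry is created.
    if (!task.usage) continue;
    const { modelName, promptTokens, completionTokens } = task.usage;
    const entry = totals.get(modelName) ?? {
      promptTokens: 0,
      completionTokens: 0,
      totalTokens: 0,
    };
    entry.promptTokens += promptTokens;
    entry.completionTokens += completionTokens;
    entry.totalTokens += promptTokens + completionTokens;
    totals.set(modelName, entry);
  }
  return totals;
}
```

A sidebar could then render one row per map entry when the map holds more than one model, and a single "Total" row otherwise.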

…1495)

Add `awaitPromise: true` parameter to Runtime.evaluate call in Chrome Extension's evaluateJavaScript method to properly wait for Promise resolution.

This aligns the behavior with Puppeteer and Playwright, which natively wait for Promises to resolve. Without this parameter, the Chrome Extension would return unresolved Promise objects instead of their resolved values.

**Change:**
- Add `awaitPromise: true` to Runtime.evaluate parameters

**Impact:**
- Async scripts now properly wait for Promise resolution
- Behavior now consistent with Puppeteer/Playwright
- Backward compatible (sync scripts work as before)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
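
A hedged sketch of the call shape follows. `Runtime.evaluate` and its `expression`, `awaitPromise` and `returnByValue` parameters are part of the Chrome DevTools Protocol; the injected `SendCommand` function, the response typing, and the use of `returnByValue` are assumptions for this sketch and not necessarily how the extension's evaluateJavaScript is written.

```typescript
// Hypothetical transport: in an extension this would typically wrap the debugger API.
type SendCommand = (
  method: string,
  params: Record<string, unknown>,
) => Promise<unknown>;

// Simplified view of the Runtime.evaluate response used by this sketch.
interface EvaluateResponse {
  result: { value?: unknown };
}

async function evaluateJavaScript(
  send: SendCommand,
  script: string,
): Promise<unknown> {
  const response = (await send('Runtime.evaluate', {
    expression: script,
    awaitPromise: true, // wait for a returned Promise to settle before replying
    returnByValue: true, // assumption: return the value itself rather than a remote object handle
  })) as EvaluateResponse;
  return response.result.value;
}
```

Injecting `send` keeps the sketch testable with a stub, without a live debugger session.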

* refactor(mcp): redefine mcp server for Midscene
* refactor(mcp): redefine mcp server for Midscene
* refactor(mcp): redefine mcp server for Midscene
* refactor(mcp): redefine mcp server for Midscene
* refactor(mcp): redefine mcp server for Midscene
* refactor(core): split planning and locator
* fix(core): test cases
* chore(core): add assertion plan into action space
* fix(core): ci
* refactor(core): model config callback
* chore(core): fix lint
* chore(core): fix lint
* test(shared): update env unit tests (#1494)
* chore(core): change signature of model config
* chore(core): fix lint
* fix(core): lint
* fix(core): mcp unit test
* chore(core): update test cases
* fix(core): ci
* chore(core): update test cases
* chore(core): update docs
* fix(core): ci test
* chore(core): update docs
* chore(core): update deps
* chore(core): fix lint
* chore(core): fix building error
* fix(core): building error
* chore(core): update prompt
* fix(core): building error

Copilot AI and others added 30 commits October 17, 2025 17:12

chore(core): remove warning msg for gpt-4 (#1331)

2a98471

* chore(core): remove warning msg for gpt-4
* chore(core): remove dom-based locator

feat(core): update recorder (#1330)

23c49d3

* chore(core): refine recorder loop
* feat(core): update implementation of recorder

chore(core): update tasks implementation (#1338)

c9b385b

* chore(core): update implementation of insight
* chore(core): refine error plan
* chore(core): refine error plan
* chore(core): split tasks into multiple parts
* fix(core): fix ci

refine(core): use 'subTask' flag to reuse context (#1350)

dc60bc3

* chore(core): update types of task executor
* chore(core): update sleep tasks
* chore(core): update types for planning
* feat(core): update subTask flag

refactor(core): remove tree in context (#1376)

57cd24a

* refactor(core): remove tree info in uiContext
* chore(core): fix lint
* chore(core): remove dom-based locator
* fix(core): test cases
* chore(core): fix lint
* fix(core): test cases

yuyutaotao and others added 29 commits November 18, 2025 09:53

docs(core): update api docs (#1468)

06c3ce7

* docs(core): update api reference doc
* docs(core): fix all dead links
* docs(core): fix all dead links

chore(core): add the missing web API doc

45b9af6

fix(core): fix dead link to doc

543c505

docs(core): fix all dead links

bb4e120

chore(core): fix lint

cb73bcb

chore(core): fix model intent (#1474)

2e4e09e

chore(core): throw error when failed to locate in qwen (#1478)

8aab4e7

feat(core): add image history into conversation (#1496)

143837c

docs(core): update homepage

dd5de4d

quanru closed this Nov 24, 2025

6 participants
