@yuyutaotao
Collaborator

No description provided.

Copilot AI and others added 30 commits October 17, 2025 17:12
* Initial plan

* fix(cli): allow duplicate YAML files in config.yaml

Co-authored-by: quanru <[email protected]>

* fix(cli): deep clone YAML script to prevent mutation issues

* fix(yaml): prevent mutation of flowItem by creating a new object for processing
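
A minimal sketch of why the deep clone matters when the same YAML file is listed twice. The `YamlScript` shape and the `runScript` helper are hypothetical stand-ins, not the project's real types, and the JSON round trip is just one simple way to clone plain YAML data:

```typescript
// Hypothetical shape of a parsed YAML script (assumption, not the real type).
interface FlowItem {
  aiAct?: string;
}
interface YamlScript {
  tasks: { name: string; flow: FlowItem[] }[];
}

// A JSON round trip is enough for plain YAML data (no functions or cycles).
function cloneScript(script: YamlScript): YamlScript {
  return JSON.parse(JSON.stringify(script)) as YamlScript;
}

// Simulates a run that rewrites flow items in place.
function runScript(script: YamlScript): string[] {
  const executed: string[] = [];
  for (const task of script.tasks) {
    for (const item of task.flow) {
      item.aiAct = `[${task.name}] ${item.aiAct ?? ''}`;
      executed.push(item.aiAct);
    }
  }
  return executed;
}

const parsed: YamlScript = {
  tasks: [{ name: 'login', flow: [{ aiAct: 'click the button' }] }],
};

// Each run gets its own copy, so the second run never sees the first run's edits.
const firstRun = runScript(cloneScript(parsed));
const secondRun = runScript(cloneScript(parsed));
```

Without the clone, the second run would start from the already-rewritten flow items.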

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: quanru <[email protected]>
Co-authored-by: quanruzhuoxiu <[email protected]>
….x (#1325)

* refactor(core): remove non-OpenAI SDK support and upgrade to OpenAI 6.x

This commit removes support for Anthropic SDK and Azure OpenAI, simplifying
the codebase to use only the standard OpenAI SDK with OpenAI-style APIs.

Changes:
- Remove Anthropic SDK (@anthropic-ai/sdk) dependency
- Remove Azure OpenAI specific code and @azure/identity dependency
- Remove langsmith wrapper support
- Remove proxy agent support (https-proxy-agent, socks-proxy-agent)
- Upgrade OpenAI SDK from 4.81.0 to 6.3.0
- Simplify createChatClient function to only create standard OpenAI clients
- Remove 'style' parameter from createChatClient return type
- Remove all Anthropic-specific message handling code
- Add openai 6.3.0 as devDependency to @midscene/shared

Benefits:
- Cleaner, more maintainable codebase
- Reduced dependencies (removed 5 packages)
- All AI providers can now be accessed through OpenAI-compatible APIs

Breaking Changes:
- Anthropic SDK mode no longer supported
- Azure OpenAI specific configuration removed
- MIDSCENE_LANGSMITH_DEBUG no longer supported
- httpAgent/socksProxy removed from createChatClient

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* refactor(core): model provider documentation and remove Azure and Anthropic configurations

* Apply suggestion from @Copilot

Co-authored-by: Copilot <[email protected]>

* feat(core): add proxy support for OpenAI client with HTTP and SOCKS configurations

* feat(core): add qwen-vl specific configuration for high resolution images

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: yuyutaotao <[email protected]>
Co-authored-by: Copilot <[email protected]>
This change ensures that Planning functionality only supports vision
language models (VL mode) and removes DOM-based planning support.

Changes:
- Add validation in ModelConfigManager.getModelConfig() to require
  VL mode for Planning intent
- Remove DOM mode logic from llm-planning.ts (describeUserPage,
  markupImageForLLM)
- Simplify image processing to only support VL mode paths
- Add comprehensive JSDoc documentation for Planning VL mode
  requirement
- Add 6 new unit tests covering Planning VL mode validation in both
  isolated and normal modes
- Fix existing tests to provide VL mode for Planning intent

Breaking Change:
- Planning without VL mode configured will now throw an error with
  clear instructions
- Error message includes all supported VL modes and configuration
  examples

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
* chore(core): remove warning msg for gpt-4

* chore(core): remove dom-based locator
* chore(core): refine recorder loop

* feat(core): update implementation of recorder
* refactor(core,web-integration,docs): rename API methods for clarity

BREAKING CHANGE: Renamed aiAction() to aiAct() and logScreenshot()
to recordToReport() for improved naming consistency. The aiAction()
method is kept as deprecated for backward compatibility.

Changes:
- Renamed aiAction() to aiAct() across core and web-integration
- Renamed logScreenshot() to recordToReport()
- Updated all English and Chinese documentation
- Updated code examples in README files
- Updated Playwright fixture to support new method names
- Added deprecation warning for aiAction() method
- Updated all test files and examples

This improves API consistency and clarity while maintaining
backward compatibility through deprecated methods.
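
A rough sketch of the deprecated-alias pattern described above. `SketchAgent` is a hypothetical stand-in for the real agent class, and warning only once per instance is an assumption; the commit only says that a deprecation warning was added:

```typescript
// Hypothetical stand-in for the agent class (assumption).
class SketchAgent {
  // Warnings are collected here so the sketch stays free of side effects.
  readonly warnings: string[] = [];
  private warnedAiAction = false;

  async aiAct(prompt: string): Promise<string> {
    return `acted: ${prompt}`;
  }

  /** @deprecated Use aiAct() instead. */
  async aiAction(prompt: string): Promise<string> {
    if (!this.warnedAiAction) {
      // Assumption: warn once per instance rather than on every call.
      this.warnedAiAction = true;
      this.warnings.push('aiAction() is deprecated, use aiAct() instead');
    }
    return this.aiAct(prompt);
  }
}
```

Existing callers of `aiAction()` keep working and get the same result as `aiAct()`.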

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat(yaml): add backward compatibility for aiAction method in YAML flow

* fix(core): conditionally add httpAgent to OpenAI client options

Fix TypeScript compilation error where httpAgent property doesn't
exist in OpenAI 6.x ClientOptions type. Only include httpAgent
when a proxy is configured, and use type assertion to bypass the
strict type check.
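
A small sketch of the conditional-property pattern this fix describes. `ClientOptionsSketch` is a hypothetical stand-in for the SDK's options type (which, per the commit, has no `httpAgent` field), and `buildClientOptions` is an illustrative name:

```typescript
// Stand-in for the SDK options type (assumption): no httpAgent field.
interface ClientOptionsSketch {
  apiKey: string;
  baseURL?: string;
}

function buildClientOptions(
  apiKey: string,
  proxyAgent?: object,
): ClientOptionsSketch {
  // The key is added only when a proxy agent exists; the assertion
  // bypasses the strict type check for the extra property.
  return {
    apiKey,
    ...(proxyAgent ? { httpAgent: proxyAgent } : {}),
  } as ClientOptionsSketch;
}
```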

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
* chore(core): update implementation of insight

* chore(core): refine error plan

* chore(core): refine error plan

* chore(core): split tasks into multiple parts

* fix(core): fix ci
* chore(release): upgrade all packages to v1.0.0

- Bump version from 0.30.4 to 1.0.0 for all packages
- Update Chrome extension manifest version to 0.136
- Update internal package dependencies to 1.0.0

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat(release): add validation to prevent 1.x stable releases

- Block publishing of 1.x versions with 'latest' tag
- Allow publishing 1.x beta versions (prepatch)
- Allow publishing stable versions for other major versions (0.x, 2.x, etc.)

This ensures that 1.x releases can only be published as beta versions,
preventing accidental stable releases while still allowing testing and
pre-release distributions.
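
The rule above could be sketched roughly as follows. The function name, the tag names, and the version regex are assumptions for illustration; the actual release script may check this differently:

```typescript
type DistTag = 'latest' | 'beta';

// Throws when a 1.x version would go out as a stable release.
function assertPublishable(version: string, tag: DistTag): void {
  const match = /^(\d+)\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/.exec(version);
  if (!match) {
    throw new Error(`Not a semver version: ${version}`);
  }
  const major = Number(match[1]);
  const isPrerelease = match[2] !== undefined;
  if (major === 1 && (tag === 'latest' || !isPrerelease)) {
    throw new Error(
      `1.x may only be published as a beta pre-release, got ${version} with tag "${tag}"`,
    );
  }
}
```

Under these assumptions, `1.0.1-beta.0` with the `beta` tag passes, `1.0.0` with `latest` is rejected, and `0.30.6` or `2.0.0` with `latest` pass.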

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
* refactor(core): remove unused getXpathsById method

This method was not being used in the codebase. Removed:
- Core implementation in shared/src/extractor/locator.ts
- Export from shared/src/extractor/index.ts
- Implementations in puppeteer/base-page.ts, chrome-extension/page.ts, and static/static-page.ts
- All related unit tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* refactor(types): rename AndroidPullParam and AndroidLongPressParam to PullParam and LongPressParam

---------

Co-authored-by: Claude <[email protected]>
…1341)

* feat(core): support custom OpenAI client instances for observability

Enable users to provide custom OpenAI client factory function through
AgentOpt.createOpenAIClient, allowing integration with observability
tools like langsmith and langfuse.

Key changes:
- Add CreateOpenAIClientFn type in @midscene/shared/env for creating
  custom OpenAI clients
- Extend AgentOpt interface with optional createOpenAIClient callback
- Pass callback through Agent -> ModelConfigManager -> IModelConfig
- Inject createOpenAIClient during config initialization for better
  performance
- Update createChatClient to use custom client factory when provided

Benefits:
- Users can wrap OpenAI clients with langsmith's wrapOpenAI() for
  tracing
- Users can wrap with langfuse's observeOpenAI() for logging
- Support different clients for different intents (planning, grounding,
  VQA, default)
- Zero runtime overhead - injection happens during config initialization
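
A simplified sketch of the factory hook, using stand-in types instead of the real OpenAI SDK. `ChatClientSketch`, `CreateClientFn`, `withTracing`, and the fallback to a default client when the factory returns nothing are all assumptions made for illustration; the real `createOpenAIClient` signature may differ:

```typescript
// Stand-in for an OpenAI-style client (assumption).
interface ChatClientSketch {
  complete(prompt: string): string;
}

interface ModelConfigSketch {
  intent: 'planning' | 'grounding' | 'VQA' | 'default';
  modelName: string;
  createClient?: CreateClientFn;
}

type CreateClientFn = (
  config: ModelConfigSketch,
) => ChatClientSketch | undefined;

function defaultClient(modelName: string): ChatClientSketch {
  return { complete: (prompt) => `${modelName}:${prompt}` };
}

// Plays the role of an observability wrapper such as wrapOpenAI().
function withTracing(client: ChatClientSketch, log: string[]): ChatClientSketch {
  return {
    complete(prompt) {
      log.push(prompt);
      return client.complete(prompt);
    },
  };
}

// Uses the custom factory when provided, otherwise the default client.
function createChatClientSketch(config: ModelConfigSketch): ChatClientSketch {
  const custom = config.createClient?.(config);
  return custom ?? defaultClient(config.modelName);
}

const traceLog: string[] = [];
const tracedClient = createChatClientSketch({
  intent: 'planning',
  modelName: 'demo-model',
  createClient: (config) => withTracing(defaultClient(config.modelName), traceLog),
});
const tracedAnswer = tracedClient.complete('plan the next step');
```

Because the factory receives the config, it can return different clients for different intents.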

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* test(core): add unit tests for custom OpenAI client integration in ModelConfigManager and service-caller

* Update packages/shared/tests/unit-test/env/modle-config-manager.test.ts

Co-authored-by: Copilot <[email protected]>

* refactor(core): remove unused MIDSCENE_API_TYPE constant from service-caller and types

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: Copilot <[email protected]>
* chore(ci): enable workflows for PRs targeting 1.0 branch

Add 1.0 branch to pull_request triggers in CI and lint workflows to ensure
PRs targeting the 1.0 branch run the same checks as PRs to main.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* tests(shared, web-integration): update tests to use runner instead of executor and improve environment setup

---------

Co-authored-by: Claude <[email protected]>
* docs(awesome): add midscene java sdk (#1324)

* fix(core): support number type for aiInput value field (#1339)

* fix(core): support number type for aiInput value field

This change allows aiInput.value to accept both string and number types,
addressing scenarios where:
1. AI models return numeric values instead of strings
2. YAML files contain unquoted numbers that parse as number type

Changes:
- Updated type definitions to accept string | number
- Added Zod schema transformation to convert numbers to strings
- Updated runtime validation to accept both types
- Added explicit conversion in YAML player as fallback

All conversions happen internally and are transparent to users.
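
A minimal sketch of the conversion in plain TypeScript. The commit uses a Zod schema transformation; `normalizeInputValue` is a hypothetical helper that shows the same idea without the library:

```typescript
// Accepts string | number at the boundary and always hands a string onward.
function normalizeInputValue(value: unknown): string {
  if (typeof value === 'number') {
    return String(value);
  }
  if (typeof value === 'string') {
    return value;
  }
  throw new TypeError(
    `aiInput value must be a string or number, received ${typeof value}`,
  );
}
```

An unquoted YAML value such as `value: 12345` parses as a number and would come out of this helper as the string `12345`.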

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(core): update aiInput type signatures to accept number values

Update the TypeScript method signatures for aiInput to accept
string | number for the value parameter, matching the runtime
implementation.

Changes:
- New signature: opt parameter now accepts { value: string | number }
- Legacy signature: first parameter now accepts string | number
- Implementation signature: locatePromptOrValue now accepts TUserPrompt | string | number
- Type assertion updated from `as string` to `as string | number`

This ensures type safety and allows users to pass number values
directly without TypeScript errors, while maintaining backward
compatibility with existing string-based usage.

Fixes type errors in test cases that use number values.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>

* fix(report): prevent sidebar jitter when expanding case selector (#1344)

Fixed sidebar shifting 1-2 pixels when clicking to expand the
playwright case selector. The issue was caused by adding a border
only in the expanded state, causing a sudden height change.

Solution: Added transparent border to the collapsed state, ensuring
consistent height across both states.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>

* refactor(core): unify cache config parameters (#1346)

Simplified `processCacheConfig` function signature from 3 to 2 parameters.
Unified `fallbackId` and `cacheId` into single `cacheId` parameter.

BREAKING CHANGE: processCacheConfig signature changed

Changed from:
  processCacheConfig(cache, fallbackId, cacheId?)
To:
  processCacheConfig(cache, cacheId)

The cacheId parameter now serves a dual purpose:
1. Fallback ID when cache is true or cache object lacks ID
2. Legacy cacheId when cache is undefined (requires MIDSCENE_CACHE env)
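
A hedged sketch of the two roles of `cacheId`. The input and return shapes are assumptions, and the legacy environment check is modelled as a flag passed to a hypothetical `makeProcessCacheConfig` wrapper, whereas the real function is described as reading `MIDSCENE_CACHE` itself:

```typescript
// Assumed shapes, for illustration only.
type CacheInput = boolean | { id?: string; strategy?: string } | undefined;
interface ResolvedCache {
  id: string;
  strategy?: string;
}

// legacyEnvEnabled stands in for the MIDSCENE_CACHE environment check.
function makeProcessCacheConfig(legacyEnvEnabled: boolean) {
  return function processCacheConfigSketch(
    cache: CacheInput,
    cacheId: string,
  ): ResolvedCache | undefined {
    if (cache === undefined) {
      // Role 2: legacy mode, cacheId is used only when the env flag is on.
      return legacyEnvEnabled ? { id: cacheId } : undefined;
    }
    if (cache === false) {
      return undefined;
    }
    if (cache === true) {
      // Role 1: fallback ID for `cache: true`.
      return { id: cacheId };
    }
    // Role 1: fallback ID when the cache object has no ID of its own.
    return { ...cache, id: cache.id ?? cacheId };
  };
}
```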

Updated call sites:
- packages/core/src/agent/agent.ts
- packages/web-integration/src/playwright/ai-fixture.ts
- packages/cli/src/create-yaml-player.ts (4 locations)

Added comprehensive test coverage for legacy compatibility mode:
- process-cache-config.test.ts: 18 tests passing
- create-yaml-player.test.ts: 13 tests passing (6 new)
- playwright-ai-fixture-cache.test.ts: 8 tests passing (3 new)

Benefits:
- Simpler API with fewer parameters
- Unified semantics for new and legacy use cases
- Full backward compatibility maintained
- Better test coverage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>

* fix(core,web-integration): fix unit tests after merging main branch

This commit fixes unit test failures that occurred after merging the
main branch into the 1.0 branch. The issues were caused by temporal
conflicts between commits that added new features and subsequent
refactoring.

Root Cause:
- Commit 13b4f1d added aiInput number support with tests using
  'executor'
- Commit c9b385b refactored Executor → TaskRunner in the 1.0 branch
- When main was merged, tests still referenced 'executor' but code
  used 'runner'

Changes:
1. Fix YAML player aiInput number conversion
   (packages/core/src/yaml/player.ts):
   - Extract 'value' field separately to prevent spread override
   - Ensure number values are converted to strings via String(value)
   - Maintain backward compatibility for empty string handling

2. Fix test mock structure
   (packages/web-integration/tests/unit-test/ai-input-number-value.test.ts):
   - Update all mock objects from 'executor' to 'runner'
   - Aligns with TaskRunner API refactoring

3. Fix cache config test
   (packages/web-integration/tests/unit-test/playwright-ai-fixture-cache.test.ts):
   - Move vi.mock() before imports to ensure proper module hoisting
   - Fixes legacy mode environment variable checks

4. Add value conversion in agent.ts (optional improvement):
   - Explicitly convert number to string in aiInput method
   - Improves code clarity and test stability
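
The spread-override fix in item 1 can be sketched like this. `InputItem` and `toInputOptions` are hypothetical names; the point is only the order of destructuring and spreading:

```typescript
// Hypothetical flow item shape (assumption).
interface InputItem {
  locate: string;
  value: string | number;
  timeout?: number;
}

function toInputOptions(item: InputItem) {
  // Pull `value` out first. If the whole item were spread after a converted
  // `value`, the spread would put the original number back.
  const { value, ...rest } = item;
  return { ...rest, value: String(value) };
}
```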

All tests now pass (195 passed, 1 skipped).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: yuyutaotao <[email protected]>
Co-authored-by: Claude <[email protected]>
* chore(core): update types of task executor

* chore(core): update sleep tasks

* chore(core): update types for planning

* feat(core): update subTask flag
* chore(lint): fix linting and formatting issues

- Fix useless switch case in modle-config-manager.test.ts
- Format package.json files for consistency
- Apply code formatting across core, agent, and related files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* chore(deps): update openai package to version 6.3.0

---------

Co-authored-by: Claude <[email protected]>
* feat(chrome-extension): enable hot reload for development

This commit adds hot reload support for chrome-extension development,
significantly improving the development experience.

Main changes:
- Add web-ext integration for automatic extension reloading
- Add wait-for-build.js script to ensure build completes first
- Update dev script to use concurrently for build watch + web-ext
- Add web-ext-config.cjs for web-ext configuration

To fix build stability during hot reload:
- Replace npm-watch with rslib native watch mode in visualizer
- Standardize dev/build:watch script relationship across packages
- This prevents dist directory deletion during rebuilds

The rslib native watch mode performs incremental builds without
deleting the dist directory, preventing "Module not found" errors
when chrome-extension references @midscene/visualizer.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(chrome-extension): wait for JS bundles before starting web-ext

The previous implementation only checked for static files (manifest.json,
index.html) which are copied early in the build process. This caused web-ext
to start before the JavaScript bundles were built, resulting in errors.

Now we check for the actual build outputs:
- dist/static/js/index.js
- dist/static/js/popup.js

This ensures web-ext only starts after Rsbuild has completed the full build.
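
A rough sketch of the readiness check. The file existence test is injected as a function so the example stays self-contained; the function names, the polling interval, and the timeout are assumptions, not the contents of the actual wait-for-build.js:

```typescript
// The two bundle paths named in the commit message.
const REQUIRED_OUTPUTS = ['dist/static/js/index.js', 'dist/static/js/popup.js'];

// Returns the required outputs that do not exist yet.
function missingOutputs(
  exists: (path: string) => boolean,
  required: string[] = REQUIRED_OUTPUTS,
): string[] {
  return required.filter((path) => !exists(path));
}

// Polls until every bundle exists, or fails after timeoutMs.
async function waitForBuild(
  exists: (path: string) => boolean,
  intervalMs = 500,
  timeoutMs = 60_000,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (missingOutputs(exists).length > 0) {
    if (Date.now() > deadline) {
      throw new Error(`Build outputs missing: ${missingOutputs(exists).join(', ')}`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In a Node script, `exists` would typically be `fs.existsSync`.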

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* chore(deps): align Rsbuild plugin versions across workspace

Update all Rsbuild plugins to use consistent versions:
- @rsbuild/plugin-less: 1.5.0
- @rsbuild/plugin-node-polyfill: 1.4.2
- @rsbuild/plugin-react: 1.4.1
- @rsbuild/plugin-svgr: 1.2.2
- @rsbuild/plugin-type-check: 1.2.4

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
* release: v0.30.5

* docs(site): optimize v0.30 changelog with user-focused improvements (#1352)

Improved the v0.30 changelog to be more user-centric and less promotional:

- Reduced hyperbolic language ("comprehensive upgrade" → "improved", etc.)
- Reorganized content structure with clearer user value sections
- Added specific usage scenarios and examples for cache strategies
- Enhanced mobile platform sections with iOS and Android subsections
- Simplified technical descriptions to be more objective
- Added cross-platform consistency section for ClearInput feature
- Translated optimized content to English version

These changes make the changelog more professional and easier for users
to understand the actual benefits of the update.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>

* fix(ios): correct horizontal scroll direction and improve swipe implementation (#1358)

* fix(ios): correct horizontal scroll direction and improve swipe implementation

Fixed three issues with iOS horizontal scrolling:

1. **Corrected scroll direction semantics**
   - scrollLeft now swipes right (brings left content into view)
   - scrollRight now swipes left (brings right content into view)
   - This aligns with Android and Web scroll behavior where the
     direction indicates which content enters the viewport

2. **Improved swipe implementation**
   - Implemented W3C Actions API for better scroll support
   - Falls back to dragfromtoforduration if Actions API fails
   - Increased scroll distance from width/3 to width*0.7 (70%)
     to prevent bounce-back

3. **Fixed scrollUntilBoundary directions**
   - Corrected left/right swipe directions in boundary detection
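
The direction semantics and the 70% distance can be sketched as a coordinate calculation. Centering the swipe on the screen and the `horizontalSwipe` name are assumptions; the actual driver code may pick its start point differently:

```typescript
interface Point {
  x: number;
  y: number;
}

// scrollDirection names the content that should enter the viewport.
function horizontalSwipe(
  scrollDirection: 'left' | 'right',
  size: { width: number; height: number },
): { from: Point; to: Point } {
  const half = Math.round((size.width * 0.7) / 2); // 70% of the width in total
  const centerX = Math.round(size.width / 2);
  const y = Math.round(size.height / 2);
  // scrollLeft reveals content on the left, so the finger moves left to right.
  const sign = scrollDirection === 'left' ? 1 : -1;
  return {
    from: { x: centerX - sign * half, y },
    to: { x: centerX + sign * half, y },
  };
}
```

For a 400x800 screen, `scrollLeft` yields a swipe from x=60 to x=340, and `scrollRight` the reverse.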

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* refactor(ios): remove fallback from swipe method, use W3C Actions API only

---------

Co-authored-by: Claude <[email protected]>

* feat(android-playground): enable alwaysFetchScreenInfo for AndroidDevice (#1363)

* fix(docs): add alwaysFetchScreenInfo parameter to AndroidDevice constructor documentation

* feat(android-playground): enable alwaysFetchScreenInfo for AndroidDevice

Configure AndroidDevice instance with alwaysFetchScreenInfo option
set to true to ensure screen information is always fetched during
device operations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(android): rename alwaysFetchScreenInfo to alwaysRefreshScreenInfo for consistency

---------

Co-authored-by: Claude <[email protected]>

* fix(core): handle ZodEffects and ZodUnion in schema parsing (#1359)

* fix(core): handle ZodEffects and ZodUnion in schema parsing

- Add support for ZodEffects (transformations) in getTypeName and getDescription
- Add support for ZodUnion types with proper type display (type1 | type2)
- Fixes "failed to parse Zod type" warning on first execution with caching

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* test(core): add tests for descriptionForAction with ZodEffects and ZodUnion

* chore(core): update test cases

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: yutao <[email protected]>

* feat(playground): implement task cancellation for Android/iOS playgrounds (#1355)

* feat(playground): implement task cancellation for Android/iOS playgrounds

This PR implements task cancellation functionality for Android and iOS
playgrounds using a singleton + recreation pattern.

When users clicked the "Stop" button in Android/iOS playground, the task
continued to execute and control the device via ADB commands. This was
because:
- Agent instances were global singletons created at server startup
- The /cancel endpoint only deleted progress tips without stopping execution
- There was no mechanism to interrupt ongoing tasks

Implemented a singleton + recreation pattern:
- PlaygroundServer now accepts factory functions instead of instances
- Added task locking mechanism (currentTaskId) to prevent concurrent tasks
- When cancel is triggered, the agent is destroyed and recreated
- Device operations stop immediately as destroyed agents reject new commands

1. **PlaygroundServer** (packages/playground/src/server.ts)
   - Added factory function support for page and agent creation
   - Added `recreateAgent()` method to destroy and recreate agent
   - Added `currentTaskId` to track running tasks
   - Enhanced `/execute` endpoint with task conflict detection
   - Enhanced `/cancel` endpoint to recreate agent on cancellation
   - Backward compatible with existing instance-based usage

2. **Android Playground** (packages/android-playground/src/bin.ts)
   - Updated to use factory pattern for server creation
   - Each recreation creates fresh AndroidDevice and AndroidAgent instances

3. **iOS Playground** (packages/ios/src/bin.ts)
   - Updated to use factory pattern for server creation
   - Each recreation creates fresh IOSDevice and IOSAgent instances

- Added test script `test-cancel-android.sh` for automated testing
- Manual testing confirmed device operations stop when cancel is triggered

```
User clicks Stop
  ↓
Frontend calls /cancel/:requestId
  ↓
Server checks whether this is the currently running task
  ↓
Call recreateAgent()
  ├─ Destroy old agent (agent.destroy())
  ├─ Destroy old device (device.destroy())
  ├─ Create new device (pageFactory())
  └─ Create new agent (agentFactory(device))
  ↓
Clear task lock and progress tips
  ↓
Device stops operations ✅
```

- ✅ Simple implementation (minimal code changes)
- ✅ Effective cancellation (destroy() immediately sets destroyed flag)
- ✅ Backward compatible (still accepts instances)
- ✅ Natural serialization (one task at a time per device)
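
A condensed sketch of the singleton + recreation pattern. `AgentSketch` and `PlaygroundServerSketch` are hypothetical stand-ins; the real server also recreates the device and exposes this through HTTP endpoints:

```typescript
class AgentSketch {
  destroyed = false;

  destroy(): void {
    this.destroyed = true;
  }

  run(step: string): string {
    // A destroyed agent rejects new commands, which is what stops the task.
    if (this.destroyed) {
      throw new Error('agent destroyed');
    }
    return `done: ${step}`;
  }
}

class PlaygroundServerSketch {
  private agent: AgentSketch;
  private currentTaskId: string | null = null;
  private readonly agentFactory: () => AgentSketch;

  constructor(agentFactory: () => AgentSketch) {
    this.agentFactory = agentFactory;
    this.agent = agentFactory();
  }

  // Task lock: only one task may run at a time.
  startTask(requestId: string): AgentSketch {
    if (this.currentTaskId !== null) {
      throw new Error(`task ${this.currentTaskId} is still running`);
    }
    this.currentTaskId = requestId;
    return this.agent;
  }

  // Destroys the running agent and swaps in a fresh one.
  cancel(requestId: string): boolean {
    if (this.currentTaskId !== requestId) {
      return false;
    }
    this.agent.destroy();
    this.agent = this.agentFactory();
    this.currentTaskId = null;
    return true;
  }
}
```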

```bash
pnpm run android:playground

./test-cancel-android.sh
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(page): ensure keyboard actions return promises for better async handling

* refactor(playground): update PlaygroundServer to use agent factories and simplify server creation

* fix(ios): round coordinates for tap and swipe actions to improve accuracy

* fix(android): round coordinates in scrolling and gesture methods for improved accuracy

* refactor(playground): simplify PlaygroundServer instantiation and improve code readability

---------

Co-authored-by: Claude <[email protected]>

* fix(yaml): skip environment variable interpolation in YAML comments (#1361)

* Initial plan

* fix(yaml): skip environment variable interpolation in YAML comments
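
A minimal sketch of the idea, assuming `${NAME}` placeholders and handling only full-line comments. `interpolateEnv` is a hypothetical helper; the actual fix may also cover trailing comments:

```typescript
function interpolateEnv(
  yaml: string,
  env: Record<string, string | undefined>,
): string {
  return yaml
    .split('\n')
    .map((line) => {
      // Leave comment lines untouched, even if they mention ${...}.
      if (/^\s*#/.test(line)) {
        return line;
      }
      return line.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
        const value = env[name];
        if (value === undefined) {
          throw new Error(`Environment variable "${name}" is not defined`);
        }
        return value;
      });
    })
    .join('\n');
}
```

With this, a commented-out line that references an unset variable no longer aborts loading the file.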

* style(yaml): apply biome linting fixes

Co-authored-by: quanru <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: quanru <[email protected]>

* fix(core): handle null data in WaitFor and support array keyName in KeyboardPress (#1354)

* fix(core): handle null data in WaitFor and support array keyName in KeyboardPress

This commit fixes two critical bugs:

1. **Fix null data handling in task execution**
   - Fixed TypeError when AI extract() returns null for WaitFor operations
   - Added null/undefined check before accessing data properties
   - WaitFor operations now return false when data is null (condition not met)
   - Other operations (Assert, Query, String, Number) return null when data is null
   - Location: src/agent/tasks.ts:936-938

2. **Add array support for keyName in KeyboardPress**
   - Updated actionKeyboardPressParamSchema to accept string | string[]
   - Allows key combinations like ['Control', 'A'] for keyboard shortcuts
   - Maintains backward compatibility with string format
   - Updated type definitions in aiKeyboardPress method
   - Locations:
     - src/device/index.ts:197-199
     - src/agent/agent.ts:575-622

**Test Coverage:**
- Added comprehensive unit tests for null data handling (8 test cases)
- Added unit tests for keyName array validation (7 test cases)
- All tests verify edge cases and expected behavior

Fixes issue where executor crashed with:
"TypeError: Cannot read properties of null (reading 'StatementIsTruthy')"

And fixes parameter validation error:
"Invalid parameters for action KeyboardPress: Expected string, received array"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(ios,android): handle array keyName in KeyboardPress action

- Updated iOS and Android device implementations to handle keyName as string | string[]
- For mobile devices, array keys are joined with '+' (e.g., ['Control', 'A'] becomes 'Control+A')
- This fixes TypeScript compilation errors in iOS and Android packages
- Maintains backward compatibility with string format

Related to the KeyboardPress array support added in the previous commit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* refactor(ios,android): improve KeyboardPress array handling

- Remove incorrect join('+') approach that doesn't work on mobile devices
- Use last key from array instead (e.g., ['Control', 'A'] → 'A')
- Add clear warning messages when array input is used on mobile platforms
- Mobile devices don't support keyboard combinations, so this is a graceful degradation

This makes the behavior more predictable and provides better feedback to developers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* test(core): fix TaskExecutor constructor arguments in null data tests

- Fixed TaskExecutor constructor call to match actual signature
- Constructor requires (interface, insight, options) instead of (insight, interface)
- All 8 tests now passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(ios,android): improve logging for unsupported key combinations in device input

* fix(core): handle null data in WaitFor and improve keyName parameter description

This commit fixes the null data handling bug and improves the KeyboardPress parameter description.

## Changes:

### 1. Fix null data handling in task execution
- Fixed TypeError when AI extract() returns null for WaitFor operations
- Added null/undefined check before accessing data properties (tasks.ts:936-938)
- WaitFor operations now return false when data is null (condition not met)
- Other operations (Assert, Query, String, Number) return null when data is null
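
The guard can be sketched as follows. The `StatementIsTruthy` key comes from the error message quoted in this commit; the function name and the rest of the data shape are assumptions:

```typescript
type ExtractKind = 'WaitFor' | 'Assert' | 'Query' | 'String' | 'Number';

function readExtractOutput(
  kind: ExtractKind,
  data: Record<string, unknown> | null | undefined,
): unknown {
  // Guard before touching any property of `data`.
  if (data === null || data === undefined) {
    // WaitFor treats missing data as "condition not met"; others return null.
    return kind === 'WaitFor' ? false : null;
  }
  if (kind === 'WaitFor') {
    return Boolean(data.StatementIsTruthy);
  }
  return data;
}
```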

### 2. Improve KeyboardPress parameter description
- Reverted keyName to only accept string type (not array)
- Added clear description: "Use '+' for key combinations, e.g., 'Control+A', 'Shift+Enter'"
- This provides better guidance to AI for generating key combinations
- Simplified iOS/Android implementations (no special array handling needed)

### 3. Test coverage
- Added 8 unit tests for null data handling
- Updated KeyboardPress tests to validate string-only format
- Added test for key combination strings (e.g., 'Control+A')
- Added test to verify arrays are rejected
- Fixed unused variable warning in test file

## Fixed Issues:

**Issue 1:** Executor crashes with null data
```
TypeError: Cannot read properties of null (reading 'StatementIsTruthy')
```

**Issue 2:** Unclear how to specify key combinations
- Now clearly documented in parameter description with examples

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* docs(core): align KeyboardPress action description with parameter schema

Updated the KeyboardPress action description to explicitly mention
support for key combinations (e.g., "Control+A", "Shift+Enter"),
making it consistent with the keyName parameter description that
already documented this functionality.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(core): handle null and undefined data in WaitFor output processing

---------

Co-authored-by: Claude <[email protected]>

* perf(android): optimize clearInput performance by batching keyevents (#1366)

* perf(android): optimize clearInput performance by batching keyevents

Replace serial keyevent(67) calls with clearTextField() method from
appium-adb library, which batches all keyevents into a single shell command.

Performance improvement:
- Before: ~50 seconds (100 sequential shell calls, ~500ms each)
- After: ~1-2 seconds (single batched shell command)
- Speedup: 25-50x
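
The "before" figure follows from 100 calls x ~500 ms = ~50 s. A sketch of the difference between serial and batched commands is below; the helper names are hypothetical, and the exact command that `clearTextField()` builds inside appium-adb may differ from this:

```typescript
const KEYCODE_DEL = 67;

// Before: one shell invocation per deleted character.
function serialCommands(count: number): string[] {
  return Array.from({ length: count }, () => `input keyevent ${KEYCODE_DEL}`);
}

// After: every key code is passed to a single `input keyevent` invocation.
function batchedCommand(count: number): string {
  const codes = Array.from({ length: count }, () => String(KEYCODE_DEL));
  return `input keyevent ${codes.join(' ')}`;
}
```

The number of deleted characters is the same; only the number of shell round trips drops from 100 to 1.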

Changes:
- Use adb.clearTextField(100) instead of repeat(() => adb.keyevent(67))
- Add clearTextField mock to unit tests for compatibility
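
The idea behind the speedup can be sketched as building one shell invocation that carries every delete key code, rather than issuing one call per key. The helper below is hypothetical and does not reproduce what appium-adb's clearTextField actually sends; it assumes the device's `input keyevent` command accepts several key codes in one call.

```typescript
// Android's KeyEvent.KEYCODE_DEL (backspace) is 67, the code used in this commit.
const KEYCODE_DEL = 67;

// Hypothetical helper (assumption): builds the argument list for a single
// `adb shell` call that presses backspace `count` times.
function buildBatchedDeleteArgs(count: number): string[] {
  if (!Number.isInteger(count) || count <= 0) {
    throw new RangeError('count must be a positive integer');
  }
  const codes = Array.from({ length: count }, () => String(KEYCODE_DEL));
  return ['input', 'keyevent', ...codes];
}
```

With the figures quoted above, 100 separate calls at roughly 500 ms each cost about 50 seconds, while one batched call pays the per-call overhead only once.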

All 75 unit tests passing, build successful.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(android): include device pixel ratio in size calculation for AndroidDevice

---------

Co-authored-by: Claude <[email protected]>

* release: v0.30.6

* fix(tests): enhance null data handling tests by adding uiContext parameter

---------

Co-authored-by: yuyutaotao <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: yutao <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: quanru <[email protected]>
…ication (#1365)

* feat(bridge-mode): add remote access support for cross-machine communication

This commit implements remote access capability for Bridge Mode,
enabling communication between server and client on different machines.

## Changes

### Core Features
- Server side: Added `allowRemoteAccess` option to bind server to 0.0.0.0
- Server side: Added `host` and `port` options for custom configuration
- Client side: Added server URL configuration UI in Chrome extension
- Configuration priority: host > allowRemoteAccess > default (127.0.0.1)
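
The configuration priority above can be sketched as a small resolver. The name `resolveBridgeHost` and the option interface are assumptions for illustration; the real getBridgeServerHost() helper may have a different signature.

```typescript
// Hypothetical option shape (assumption), mirroring the options named in this commit.
interface BridgeHostOptions {
  host?: string;
  allowRemoteAccess?: boolean;
}

// Priority: explicit host > allowRemoteAccess (0.0.0.0) > default (127.0.0.1).
function resolveBridgeHost(opts: BridgeHostOptions = {}): string {
  if (opts.host) {
    return opts.host;
  }
  if (opts.allowRemoteAccess) {
    return '0.0.0.0';
  }
  return '127.0.0.1';
}
```

Keeping 127.0.0.1 as the final fallback preserves the localhost-only default, so remote exposure always requires an explicit opt-in.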

### Modified Files
- packages/web-integration/src/bridge-mode/:
  - common.ts: Added getBridgeServerHost() helper function
  - io-server.ts: Modified to support custom host binding
  - agent-cli-side.ts: Added remote access options to constructor
  - page-browser-side.ts: Added server endpoint parameter support

- apps/chrome-extension/src/:
  - extension/bridge/index.tsx: Added server URL configuration UI
  - extension/bridge/index.less: Added styles for configuration section
  - utils/bridgeConnector.ts: Support custom server endpoint

- packages/web-integration/tests/:
  - ai/bridge/remote-access.test.ts: Added comprehensive tests
  - unit-test/bridge/io.test.ts: Updated tests for new API

### Documentation
- Updated docs in apps/site/docs/{en,zh}/bridge-mode-by-chrome-extension.mdx
- Added remote access configuration section with examples
- Added security warnings for remote access usage

## API Changes

New constructor options:
- allowRemoteAccess: Enable remote access
- host: Custom host (optional)
- port: Custom port (optional)

## Backward Compatibility
- All existing code works without modification
- Default behavior unchanged (localhost only)
- All unit tests passing

## Security
- Default remains secure (127.0.0.1 only)
- Remote access requires explicit opt-in
- Documentation includes security warnings

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(bridge): resolve race condition in server initialization

Fix the 'xhr poll error' by ensuring all Socket.IO middleware and event
handlers are set up BEFORE calling httpServer.listen(). This eliminates
the race condition where clients could attempt to connect before the
server was fully ready.

Changes:
- Moved Socket.IO middleware setup before httpServer.listen()
- Moved Socket.IO connection handlers before httpServer.listen()
- Moved httpServer.listen() to the end of initialization sequence

Fixes failing unit tests in
packages/web-integration/tests/unit-test/bridge/io.test.ts (all 15 tests now passing)

* fix(web-integration): add delay to ensure Socket.IO is fully ready in server initialization

* fix(bridge-server): improve HTTP server setup and event handling order

* fix(bridge): improve server URL handling and localStorage management

* feat(bridge): enhance server configuration UI with expandable section and improved styling

* Update packages/web-integration/tests/ai/bridge/remote-access.test.ts

Co-authored-by: Copilot <[email protected]>

* Update packages/web-integration/tests/ai/bridge/remote-access.test.ts

Co-authored-by: Copilot <[email protected]>

* Update packages/web-integration/tests/ai/bridge/remote-access.test.ts

Co-authored-by: Copilot <[email protected]>

* Update packages/web-integration/tests/ai/bridge/remote-access.test.ts

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: Copilot <[email protected]>
#1377)

## Problem
The previous nano-staged configuration had two issues:
1. Used `biome check .` which checked the entire project instead of only staged files
2. nano-staged doesn't automatically re-stage fixed files, causing commits to fail

## Solution
Switched to lint-staged which:
- Automatically passes only staged files to biome
- Re-stages files after fixes are applied
- Is more mature and more widely adopted

## Changes
- Replaced nano-staged with lint-staged in pre-commit hook
- Updated biome command to remove project-wide checks
- Added lint-staged as dev dependency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
* feat(yaml): support all device options in YAML configuration

This PR enables YAML scripts to use all Android and iOS device options
by centralizing device option types and ensuring runtime configuration
propagation.

Changes:
- Created packages/core/src/device/device-options.ts to centralize all
  device option type definitions (AndroidDeviceOpt, IOSDeviceOpt)
- Updated MidsceneYamlScriptAndroidEnv and MidsceneYamlScriptIOSEnv to
  extend device options using Omit<> to exclude programmatic fields
- Fixed runtime configuration passing in create-yaml-player.ts to
  forward all YAML config options to device constructors
- Simplified agent creation functions to pass entire options object
  instead of manually listing each parameter
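
A rough sketch of the Omit<> pattern described above. The interfaces are trimmed, hypothetical stand-ins for AndroidDeviceOpt and MidsceneYamlScriptAndroidEnv: the field types are assumptions, and `launch` is an illustrative YAML-only field.

```typescript
// Trimmed stand-in for the programmatic device options (field types are assumptions).
interface AndroidDeviceOptSketch {
  androidAdbPath?: string;
  remoteAdbHost?: string;
  imeStrategy?: string;
  autoDismissKeyboard?: boolean;
  customActions?: Array<() => void>; // programmatic only, excluded from YAML
}

// YAML env type: every device option except the programmatic field,
// plus a hypothetical YAML-only field.
type AndroidYamlEnvSketch = Omit<AndroidDeviceOptSketch, 'customActions'> & {
  launch?: string;
};

// Forward the whole options object instead of listing each parameter by hand.
function toDeviceOptions(
  env: AndroidYamlEnvSketch,
): Omit<AndroidDeviceOptSketch, 'customActions'> {
  const deviceOptions: AndroidYamlEnvSketch = { ...env };
  delete deviceOptions.launch; // strip the YAML-only field
  return deviceOptions;
}
```

Because the YAML type is derived from the device option type, an option added to the device type becomes available in YAML without a second definition.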

YAML scripts can now configure:

Android:
- androidAdbPath, remoteAdbHost, remoteAdbPort
- imeStrategy, displayId, usePhysicalDisplayIdForScreenshot
- screenshotResizeScale, alwaysFetchScreenInfo
- autoDismissKeyboard, keyboardDismissStrategy

iOS:
- deviceId, useWDA, wdaPort, wdaHost
- autoDismissKeyboard

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* test(yaml): add unit tests for device options propagation

Add comprehensive unit tests to verify that all device options are
correctly passed from YAML configuration to device constructors.

Tests include:
- Android device options propagation from YAML to agentFromAdbDevice
- iOS device options propagation from YAML to agentFromWebDriverAgent
- Type definitions for AndroidDeviceOpt and IOSDeviceOpt
- YAML environment types (MidsceneYamlScriptAndroidEnv, MidsceneYamlScriptIOSEnv)
- Validation that customActions is excluded from YAML types
- IME strategy and keyboard dismiss strategy type validations
- Minimal and full configuration scenarios

All 31 tests passing (17 in CLI, 14 in Core).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(android): ensure empty object is passed when opts is undefined

Fix failing unit tests by ensuring an empty object is passed to
AndroidDevice and IOSDevice constructors when opts is undefined,
maintaining backward compatibility with existing tests.

Changes:
- Updated agentFromAdbDevice to pass opts || {} to AndroidDevice
- Updated agentFromWebDriverAgent to pass opts || {} to IOSDevice

This ensures the constructors always receive an object instead of
undefined, which is what the existing tests expect.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(device-options): rename alwaysFetchScreenInfo to alwaysRefreshScreenInfo for clarity

* docs(site): update Android and iOS sections to include all configuration options from their respective constructors

---------

Co-authored-by: Claude <[email protected]>
Update the task type display names in report sidebar and detail views:
- Change "Insight / Query" and "Insight / Assert" to "Insight"
- Change "Action / {subType}" to "Action Space / {subType}"
- Show "Planning / Plan" instead of just "Planning"
- Keep other task types unchanged (e.g., "Planning / Locate")

This provides clearer and more consistent naming for different task
types in the report UI, making it easier to understand the task
hierarchy and categorization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
…1381)

This change improves code consistency by using clonedYamlScript.agent
instead of mixing yamlScript.agent and clonedYamlScript for other
properties throughout the agent initialization code.

Changes:
- Use clonedYamlScript.agent consistently across all agent types
  (puppeteer, bridge mode, Android, iOS, and interface)
- This ensures all configuration comes from the same cloned instance,
  preventing potential mutation issues when the same YAML file is
  executed multiple times
- Added comprehensive unit tests to verify aiActionContext is properly
  passed to Android, iOS, and bridge mode agents

This is a code quality improvement that makes the codebase more
maintainable and aligns with the original design intent of using
structuredClone to isolate each ScriptPlayer instance.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
…1375)

* refactor(env): modernize model configuration environment variables

This PR refactors the model configuration system with improved naming conventions
and better type safety while maintaining backward compatibility.

Key Changes:

1. Environment Variable Naming Convention Updates:
   - Renamed OPENAI_* → MODEL_* for public API variables
     * OPENAI_API_KEY → MODEL_API_KEY (deprecated, backward compatible)
     * OPENAI_BASE_URL → MODEL_BASE_URL (deprecated, backward compatible)
   - Renamed MIDSCENE_*_VL_MODE → MIDSCENE_*_LOCATOR_MODE across all intents
     * MIDSCENE_VL_MODE → MIDSCENE_LOCATOR_MODE
     * MIDSCENE_VQA_VL_MODE → MIDSCENE_VQA_LOCATOR_MODE
     * MIDSCENE_PLANNING_VL_MODE → MIDSCENE_PLANNING_LOCATOR_MODE
     * MIDSCENE_GROUNDING_VL_MODE → MIDSCENE_GROUNDING_LOCATOR_MODE
   - Updated all internal MIDSCENE_*_OPENAI_* → MIDSCENE_*_MODEL_*
     * MIDSCENE_VQA_OPENAI_API_KEY → MIDSCENE_VQA_MODEL_API_KEY
     * MIDSCENE_PLANNING_OPENAI_API_KEY → MIDSCENE_PLANNING_MODEL_API_KEY
     * MIDSCENE_GROUNDING_OPENAI_API_KEY → MIDSCENE_GROUNDING_MODEL_API_KEY
     * (and corresponding BASE_URL variables)

2. Type System Improvements:
   - Split TModelConfigFn into public and internal types
   - Public API (TModelConfigFn) no longer exposes 'intent' parameter
   - Internal type (TModelConfigFnInternal) maintains intent parameter
   - Users can still opt in to the intent parameter via a type cast

3. Backward Compatibility:
   - Maintained compatibility for documented public variables (OPENAI_API_KEY, OPENAI_BASE_URL)
   - New variables take precedence; legacy names are used as a fallback only when the new ones are not set
   - Only public documented variables are deprecated, internal variables renamed directly

4. Updated Files:
   - packages/shared/src/env/types.ts - Type definitions and constants
   - packages/shared/src/env/constants.ts - Config key mappings
   - packages/shared/src/env/decide-model-config.ts - Compatibility logic
   - packages/shared/src/env/model-config-manager.ts - Type casting implementation
   - packages/shared/src/env/init-debug.ts - Debug variable updates
   - All test files updated to use new variable names

Testing:
- All 24 model-config-manager tests passing
- Overall test suite: 241 tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Update packages/shared/src/env/constants.ts

Co-authored-by: Copilot <[email protected]>

* test(env): add comprehensive backward compatibility tests for OPENAI_* variables

- Added test suite to verify MODEL_API_KEY/MODEL_BASE_URL take precedence
- Added test to ensure OPENAI_API_KEY/OPENAI_BASE_URL still work as fallback
- Fixed compatibility logic to prioritize new variables over legacy ones
- All 13 tests passing, including 5 new backward compatibility tests

Test coverage:
✓ Using only legacy variables (OPENAI_API_KEY)
✓ Using only new variables (MODEL_API_KEY)
✓ Mixing new and legacy variables (new takes precedence)
✓ Individual precedence for API_KEY and BASE_URL
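
The precedence rules covered by these tests can be sketched as follows. The helper names are hypothetical, the environment is passed in as a plain object so the snippet stays self-contained, and treating an empty string as unset is an assumption.

```typescript
type EnvSketch = Record<string, string | undefined>;

// Returns the new variable when set, otherwise the legacy one.
// Assumption: an empty string counts as "not set".
function readWithLegacyFallback(
  env: EnvSketch,
  newKey: string,
  legacyKey: string,
): string | undefined {
  const preferred = env[newKey];
  if (preferred !== undefined && preferred !== '') {
    return preferred;
  }
  const legacy = env[legacyKey];
  return legacy !== undefined && legacy !== '' ? legacy : undefined;
}

// Each setting falls back independently of the other.
function resolveModelCredentials(env: EnvSketch): {
  apiKey: string | undefined;
  baseURL: string | undefined;
} {
  return {
    apiKey: readWithLegacyFallback(env, 'MODEL_API_KEY', 'OPENAI_API_KEY'),
    baseURL: readWithLegacyFallback(env, 'MODEL_BASE_URL', 'OPENAI_BASE_URL'),
  };
}
```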

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(test): reset MIDSCENE_CACHE in beforeEach to avoid .env interference

The test 'should return the correct value from override' was failing because
.env file sets MIDSCENE_CACHE=1. This was polluting the test environment and
causing the test to expect false but receive true.

Fixed by explicitly resetting MIDSCENE_CACHE to empty string in beforeEach.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* docs(site): update environment variable names and add advanced configuration examples for agents

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: Copilot <[email protected]>
* refactor(core): remove tree info in uiContext

* chore(core): fix lint

* chore(core): remove dom-based locator

* fix(core): test cases

* chore(core): fix lint

* fix(core): test cases
* feat(core): update signature of warp-openai

* docs(site): update createOpenAIClient API documentation

Update the documentation for createOpenAIClient to reflect the new signature:
- Changed from factory function to wrapper function
- Now receives base OpenAI instance and options
- Returns Promise<OpenAI | undefined>
- Updated examples to show async wrapper pattern
- Removed unnecessary OpenAI import from examples

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: quanruzhuoxiu <[email protected]>
Co-authored-by: Claude <[email protected]>
…nt variables (#1388)

Add backward compatibility support for legacy MIDSCENE_OPENAI_* environment variables:
- MIDSCENE_OPENAI_INIT_CONFIG_JSON (now MIDSCENE_MODEL_INIT_CONFIG_JSON)
- MIDSCENE_OPENAI_HTTP_PROXY (now MIDSCENE_MODEL_HTTP_PROXY)
- MIDSCENE_OPENAI_SOCKS_PROXY (now MIDSCENE_MODEL_SOCKS_PROXY)

Changes:
- Add deprecated constants to types.ts with @deprecated tags
- Add legacy variables to MODEL_ENV_KEYS for overrideAIConfig support
- Update DEFAULT_MODEL_CONFIG_KEYS_LEGACY to use legacy variable names
- Implement priority fallback logic in decide-model-config.ts (new variables take precedence)
- Update documentation (zh/en model-provider.mdx) with deprecation notices

All 139 tests pass, confirming backward compatibility works correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
)

* feat(android): add screenshot polling fallback for remote devices

Implement automatic fallback to screenshot polling mode when connecting to remote Android devices (IP:Port format), since scrcpy cannot connect to remote adb devices.

Changes:
- Refactor ScreenshotViewer to shared component in @midscene/visualizer with function-based props
- Add /api/screenshot endpoint in ScrcpyServer using adb screencap
- Add device type detection to distinguish local vs remote devices
- Conditionally render ScrcpyPlayer (real-time) for local devices or ScreenshotViewer (polling) for remote devices
- Update playground app to use new shared ScreenshotViewer component
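
Device type detection based on the IP:Port format could look roughly like this. The function name and the exact matching rule are assumptions; the playground's real check may be stricter or looser.

```typescript
// Hypothetical check (assumption): an ID ending in ':<port>' is treated as a
// remote adb device, e.g. '192.168.1.10:5555'. Serial-style IDs have no port suffix.
function isRemoteDeviceId(deviceId: string): boolean {
  const match = /^(.+):(\d{1,5})$/.exec(deviceId.trim());
  if (!match) {
    return false;
  }
  const port = Number(match[2]);
  return port >= 1 && port <= 65535;
}

// Pick the viewer accordingly: real-time for local, polling for remote.
function pickViewer(deviceId: string): 'ScrcpyPlayer' | 'ScreenshotViewer' {
  return isRemoteDeviceId(deviceId) ? 'ScreenshotViewer' : 'ScrcpyPlayer';
}
```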

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(visualizer): import ExecutionTaskInsightLocate from types module

Fix TypeScript build error by importing ExecutionTaskInsightLocate directly from @midscene/core/types instead of the main export.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(visualizer): define local ExecutionTaskInsightLocate interface

Define ExecutionTaskInsightLocate as a local interface instead of importing from @midscene/core to resolve TypeScript build errors. This type is not properly exported from the core package's type declarations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* refactor(android): use PlaygroundServer screenshot API instead of duplicating in ScrcpyServer

Remove duplicate screenshot implementation from ScrcpyServer and use the existing PlaygroundServer /screenshot endpoint which already calls AndroidDevice.screenshotBase64(). This eliminates code duplication and leverages the existing infrastructure.

Changes:
- Remove /api/screenshot endpoint from ScrcpyServer
- Update App.tsx to call PlaygroundServer's /screenshot endpoint (port 9412)
- Also use PlaygroundServer's /interface-info endpoint for consistency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
…nents (#1392)

This change consolidates all PlaygroundSDK creation logic for report
components into a single shared utility module.

Changes:
- Created `apps/report/src/utils/report-playground-utils.ts` with
  `getReportPlaygroundSDK(serviceMode, agent?)` function
- Removed duplicate `getPlaygroundSDK` implementations from
  playground.tsx and playground/index.tsx
- Updated open-in-playground/index.tsx to use the shared function
- Removed unnecessary `createReportPlaygroundSDK` wrapper function
- All report components now use `PLAYGROUND_SERVER_PORT` constant
  from shared package

Benefits:
- Single source of truth for PlaygroundSDK creation in report
  components
- Static report files always connect to localhost:5800
- Reduced code duplication and improved maintainability

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
* refactor(core): rename Insight class to Service

This is a comprehensive refactoring that renames the Insight class
and all related types to Service for better semantic clarity.

Changes:
- Renamed directories: insight/ -> service/
- Renamed test files: insight.test.ts -> service.test.ts
- Updated 50+ type definitions
- Modified 18+ source files
- Synchronized all test files
- Updated external package dependencies

Core updates:
- Class: Insight -> Service
- Interface: InsightOptions -> ServiceOptions
- All InsightX types -> ServiceX types
- String literal 'Insight' -> 'Service'

Affected files:
- src/index.ts, src/yaml.ts, src/task-runner.ts
- src/agent/*.ts (agent, tasks, task-builder, ui-utils)
- tests/utils.ts and all test files
- External: chrome-extension, evaluation, report

Verification:
- TypeScript: 0 errors
- Lint: 530 files passed
- Build: successful (341.1 kB)

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>

* fix(visualizer): update Insight references to Service

- Updated ExecutionTaskInsightLocate to ExecutionTaskServiceLocate
- Changed task.type check from 'Insight' to 'Service'
- Renamed insightTask variable to serviceTask for consistency

* fix(report): update Insight references to Service

- Updated ExecutionTaskInsightLocate to ExecutionTaskServiceLocate
  in sidebar, detail-side, and detail-panel components
- Changed task.type checks from 'Insight' to 'Service'
- Updated ExecutionTaskInsightAssertion to
  ExecutionTaskServiceAssertion
- Ensures report UI displays Service tasks correctly

* chore(tests): update comments from Insight to Service

* fix(tests): change task type from 'Insight' to 'Service' in tests

- Updated aiaction-cacheable.test.ts
- Updated page-task-executor-waitFor.test.ts
- Completes the Insight to Service refactoring

* fix(tests): update test expectations from 'Insight' to 'Service'

- Updated task-builder.test.ts expectations
- Updated page-task-executor-rightclick.test.ts expectations
- Fixes CI test failures

* refactor(core): use 'Insight' for ExecutionTask types

Keep Service class name but restore ExecutionTask type to 'Insight'
for consistency with UI display requirements.

Changes:
- ExecutionTaskType: 'Service' → 'Insight'
- All ExecutionTaskService* types → ExecutionTaskInsight*
- Runtime checks: task.type === 'Service' → task.type === 'Insight'
- ui-utils.ts: Removed special handling for Query/Assert subtypes
  to display "Insight / Query" and "Insight / Assert" correctly

Type display now follows the expected pattern:
- Planning / Plan
- Planning / Locate
- Action Space / {interface}
- Insight / Query
- Insight / Assert
- Insight / Locate

Files modified:
- packages/core/src/types.ts
- packages/core/src/agent/*.ts
- packages/core/src/task-runner.ts
- packages/visualizer/src/utils/replay-scripts.ts
- apps/report/src/components/**/*.tsx
- All test files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
Fixed ambiguous descriptions about sequential vs parallel execution:
- Updated --files parameter description to clearly state that files
  execute sequentially by default (when --concurrent=1) and can run
  concurrently with --concurrent parameter
- Removed the misleading "run in parallel" text from the example that
  does not use the --concurrent parameter

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
Add explicit error throwing for failed Assert tasks with detailed
assertion failure messages including the AI's thought process.

This change brings the 1.0 branch in line with the main branch
commit 4761a6c, ensuring that Assert tasks fail explicitly when
the AI cannot verify the condition, rather than silently returning
null values.

Changes:
- Add error throwing for failed Assert tasks in tasks.ts
- Update test to expect error instead of null output
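
A minimal sketch of failing explicitly instead of returning null. The result shape (`pass`, `thought`) and the function name are assumptions made for illustration, not the types used in tasks.ts.

```typescript
// Hypothetical result shape (assumption).
interface AssertResultSketch {
  pass: boolean;
  thought?: string;
}

// Throws with the assertion text and the AI's reasoning when verification fails.
function ensureAssertionPassed(
  assertion: string,
  result: AssertResultSketch | null,
): void {
  if (result === null || !result.pass) {
    const reason = result?.thought ?? '(no reasoning returned)';
    throw new Error(`Assertion failed: ${assertion}\nReason: ${reason}`);
  }
}
```

Callers that previously checked for a null output would instead wrap the call in try/catch, or let the error fail the task.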

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
yuyutaotao and others added 30 commits November 14, 2025 14:32
* docs(core): move model-selection docs

* docs(core): move model-selection docs

* docs(core): update model-selection doc

* docs(core): update docs of model config

* docs(core): update docs of model config

* docs(core): update links toward model strategy

* docs(core): update model doc

* docs(core): fix all the titles in ToC
#1448)

* docs(integration): add remote browser connection guide for Playwright and Puppeteer

- Add "Remote Playwright Integration with Midscene Agent" section to Playwright docs
- Add "Remote Puppeteer Integration with Midscene Agent" section to Puppeteer docs
- Reorganize existing content under "Direct integration with Midscene Agent" section
- Include CDP WebSocket connection examples for both frameworks
- Update both English and Chinese documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* docs(web): update docs of remote browser

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: yutao <[email protected]>
… and Puppeteer documentation (#1457)

- Create reusable get-cdp-url.mdx component with CDP WebSocket URL sources
- Add example project links for remote Playwright and Puppeteer integrations
- Update Prerequisites sections with @playwright/test dependency for Playwright
- Import and integrate GetCdpUrl component in all four documentation files
- Maintain 1.0 branch wording and descriptions for better clarity

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
)

This PR enables CLI to automatically support all device configuration
parameters without manually defining each one, and provides users with
documentation links for complete reference.

Changes:
- CLI: Auto-parse all web.*, android.*, ios.* parameters dynamically
- CLI: Replace manual parameter definitions with automatic parsing
- CLI: Add epilogue with documentation links and usage examples
- CLI: Ensure both kebab-case and camelCase formats for compatibility
- Tests: Add 5 comprehensive unit tests for device parameter parsing
- Docs: Update YAML script structure to include agent section
- Docs: Unify agent section naming across documentation

Key features:
- Users can now use any device parameter via CLI (e.g., --android.ime-strategy)
- New parameters are automatically supported without code changes
- Help output is cleaner with links to full documentation
- All 95 tests passing
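
The dynamic parsing described above might be sketched like this: collect every argument under a platform prefix and normalize kebab-case names to camelCase. Both helper names are hypothetical, and the flat `argv` object with dotted keys is an assumed input shape rather than the CLI's actual parser output.

```typescript
// Converts 'ime-strategy' to 'imeStrategy'; camelCase input is left unchanged.
function kebabToCamel(name: string): string {
  return name.replace(/-([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

// Hypothetical collector (assumption): gathers every '<prefix>.*' argument
// without a hand-written list of supported options.
function collectDeviceOptions(
  argv: Record<string, unknown>,
  prefix: 'web' | 'android' | 'ios',
): Record<string, unknown> {
  const options: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(argv)) {
    if (!key.startsWith(`${prefix}.`)) {
      continue;
    }
    options[kebabToCamel(key.slice(prefix.length + 1))] = value;
  }
  return options;
}
```

Because nothing is enumerated, a newly added device option is picked up as soon as the device constructor understands it.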

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
* feat(site): implement custom homepage with i18n support

This commit introduces a custom homepage for the Midscene.js documentation
site, featuring a modern design with comprehensive i18n support.

Key changes:
- Add custom HomeLayout component with Banner and FeatureSections
- Implement dark/light mode theming across all components
- Set up i18n system for English and Chinese translations
- Integrate Tailwind CSS v3 for styling
- Configure 1440px max-width layout with proper navigation alignment
- Create platform-specific descriptions (Web, iOS, Android, Any Interface)
- Add model and API-specific content sections
- Update routing: homepage at index.mdx, documentation at introduction.mdx
- Align navigation bar (40px padding) with page content

Components:
- theme/components/Banner.tsx: Hero section with stats and CTA buttons
- theme/components/FeatureSections.tsx: Feature showcase sections
- theme/i18n/*: Translation files and hooks
- theme/pages/index.tsx: Custom homepage layout

Configuration:
- Add Tailwind CSS configuration and PostCSS setup
- Update rspress.config.ts with Tailwind preEntry
- Update styles/index.css for navigation alignment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat(deps): update Tailwind CSS and PostCSS dependencies to version 4.1.11

---------

Co-authored-by: Claude <[email protected]>
* docs(core): update api reference doc

* docs(core): fix all dead links

* docs(core): fix all dead links
…guration (#1456)

* feat(shared): introduce MIDSCENE_PLANNING_STYLE for unified config

Replace multiple MIDSCENE_USE_* environment variables with a single
MIDSCENE_PLANNING_STYLE parameter to simplify model configuration.

## Changes

### New Environment Variable
- Added MIDSCENE_PLANNING_STYLE with supported values:
  - default (equivalent to qwen3)
  - qwen2.5, qwen3
  - doubao-1.5, doubao-1.6
  - ui-tars-1.0, ui-tars-1.5
  - gemini

### Core Features
1. Auto-inference from model name: Automatically detect planning
   style from model name if not configured
2. Legacy compatibility: Support old MIDSCENE_USE_* variables
   with deprecation warnings
3. Conflict detection: Error when both new and legacy variables
   are set
4. Comprehensive validation: Validate planning style values and
   provide clear error messages
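
The validation and conflict rules listed above can be sketched as follows. This is not the code in parse.ts: the function name is hypothetical, the legacy variable list contains only the two names that appear in this commit message, and how a legacy value maps to a style is left out.

```typescript
const PLANNING_STYLES = [
  'default', 'qwen2.5', 'qwen3', 'doubao-1.5', 'doubao-1.6',
  'ui-tars-1.0', 'ui-tars-1.5', 'gemini',
] as const;
type PlanningStyleSketch = (typeof PLANNING_STYLES)[number];

// Only the legacy names mentioned in this commit message (assumption: list is partial).
const LEGACY_KEYS = ['MIDSCENE_USE_QWEN3_VL', 'MIDSCENE_USE_VLM_UI_TARS'];

function isPlanningStyle(value: string): value is PlanningStyleSketch {
  return (PLANNING_STYLES as readonly string[]).includes(value);
}

function readPlanningStyle(
  env: Record<string, string | undefined>,
): PlanningStyleSketch | undefined {
  const raw = env['MIDSCENE_PLANNING_STYLE'];
  const legacySet = LEGACY_KEYS.filter((key) => Boolean(env[key]));
  if (raw && legacySet.length > 0) {
    // Conflict detection: new and legacy variables must not be mixed.
    throw new Error(`MIDSCENE_PLANNING_STYLE conflicts with ${legacySet.join(', ')}`);
  }
  if (!raw) {
    if (legacySet.length > 0) {
      console.warn(`${legacySet.join(', ')} is deprecated; use MIDSCENE_PLANNING_STYLE`);
    }
    return undefined; // legacy handling would continue elsewhere
  }
  if (!isPlanningStyle(raw)) {
    throw new Error(`Invalid MIDSCENE_PLANNING_STYLE "${raw}"; expected one of ${PLANNING_STYLES.join(', ')}`);
  }
  return raw;
}
```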

### Implementation Details
- Added type definitions in types.ts
- Implemented parsing logic in parse.ts:
  - inferPlanningStyleFromModelName()
  - convertPlanningStyleToVlMode()
  - parsePlanningStyleFromEnv()
- Integrated into decide-model-config.ts for planning intent
- All warnings output via console.warn()

### Testing
- Added 39 new test cases covering:
  - Model name inference
  - Planning style conversion
  - Legacy variable compatibility
  - Error handling
  - All planning style values
- Updated existing tests
- All 162 tests passing

## Migration Guide

Before (deprecated):
  MIDSCENE_USE_QWEN3_VL=1
  MIDSCENE_USE_VLM_UI_TARS=DOUBAO-1.5

After (recommended):
  MIDSCENE_PLANNING_STYLE=qwen3
  MIDSCENE_PLANNING_STYLE=ui-tars-1.5

Old environment variables still work with deprecation warnings.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(shared): resolve TypeScript type error in convertPlanningStyleToVlMode

- Added non-null assertion for vlMode property
- Explicitly destructure parsed result instead of using spread operator
- vlMode is guaranteed to be non-undefined since vlModeRaw is always valid

Fixes build error: Type 'TVlModeTypes | undefined' is not assignable to
type 'TVlModeTypes'

* fix(shared): correct UI-TARS model inference priority logic

- Prioritize ui-tars check before doubao
- ui-tars + 1.5 → vlm-ui-tars-doubao-1.5
- ui-tars + 1.0 → vlm-ui-tars
- ui-tars (no version) → vlm-ui-tars-doubao (Volcengine deployment)
- doubao (non-UI-TARS) → doubao-vision

This ensures models deployed on Volcengine are correctly identified.
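
The priority order above, written out as a sketch. The function name and the substring-based matching are assumptions; only the mapping itself comes from this commit message, and a later commit in this list removes model-name inference.

```typescript
// Hypothetical inference (assumption): checks 'ui-tars' before 'doubao' so that
// UI-TARS models deployed on Volcengine are not misread as plain Doubao models.
function inferVlModeSketch(modelName: string): string | undefined {
  const name = modelName.toLowerCase();
  if (name.includes('ui-tars')) {
    if (name.includes('1.5')) {
      return 'vlm-ui-tars-doubao-1.5';
    }
    if (name.includes('1.0')) {
      return 'vlm-ui-tars';
    }
    return 'vlm-ui-tars-doubao'; // no version: Volcengine deployment
  }
  if (name.includes('doubao')) {
    return 'doubao-vision';
  }
  return undefined;
}
```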

* refactor(shared): simplify planning style parsing by removing model name inference

* fix(tests): clear legacy environment variables before model config tests

* Update packages/shared/src/env/decide-model-config.ts

Co-authored-by: Copilot <[email protected]>

* Update packages/shared/src/env/parse.ts

Co-authored-by: Copilot <[email protected]>

* Update packages/shared/src/env/parse.ts

Co-authored-by: Copilot <[email protected]>

* Update packages/shared/src/env/decide-model-config.ts

Co-authored-by: Copilot <[email protected]>

* refactor(parse): add type guard for planning style validation and improve error message

* refactor(decide-model-config): consolidate vlMode and uiTarsVersion parsing logic for planning intent

* refactor(model-config): migrate from MIDSCENE_USE_... to MIDSCENE_PLANNING_STYLE for model configuration

* refactor(docs): update model configuration and strategy documentation for clarity

* refactor(env): rename planning style references to model family for consistency

* refactor(parse): update references from 'qwen-vl' to 'qwen2.5-vl' for consistency

* refactor(core): update references from qwen-vl to qwen2.5-vl for consistency across the codebase

* refactor(model-config): update model family references from 'qwen-vl' to 'qwen2.5-vl' for consistency

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: Copilot <[email protected]>
* feat(site): add responsive design and upgrade to Tailwind CSS v4

This commit adds comprehensive responsive design to the custom homepage
and upgrades Tailwind CSS from v3 to v4.1.11.

Key changes:
- Upgrade Tailwind CSS to v4.1.11 with @tailwindcss/postcss plugin
- Add mobile-first responsive design to Banner component
- Add mobile-first responsive design to FeatureSections component
- Implement responsive navigation bar with media queries
- Optimize button layout for single-line text display

Responsive breakpoints:
- Mobile: < 768px (base styles)
- Tablet: >= 768px (md: prefix)
- Desktop: >= 1024px (lg: prefix)

Banner responsive features:
- Adaptive min-height: 400px (mobile) → 664px (desktop)
- Responsive padding: 20px (mobile) → 40px (desktop)
- Scalable typography: 32px (mobile) → 80px (desktop) for title
- Flexible button layout: stacked (mobile) → horizontal (desktop)

FeatureSections responsive features:
- Adaptive section spacing: 48px (mobile) → 120px (desktop)
- Responsive card layout: single column (mobile) → multi-column (desktop)
- Scalable card heights: 120px (mobile) → 160px (desktop)
- Flexible text sizes: 14px (mobile) → 16px (desktop)

Navigation responsive updates:
- Responsive padding: 20px (mobile) → 40px (desktop)
- Consistent alignment with page content across all screen sizes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(theme): add missing keys for Banner and FeatureSections components

---------

Co-authored-by: Claude <[email protected]>
…1471)

This commit refactors the Banner and FeatureSections components to use
Tailwind CSS's built-in dark mode modifier instead of the useDark() hook
from @rspress/core/runtime.

Changes:
- Remove useDark import and usage from both components
- Replace conditional classNames with Tailwind's dark: modifier
- Remove unnecessary key props that were based on dark state
- Simplify component logic by relying on Tailwind's dark mode configuration

Benefits:
- Cleaner, more maintainable code
- Better performance (no runtime JS for dark mode detection)
- Consistent with Tailwind CSS best practices
- Reduced bundle size by removing unused hook dependency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
Add JSON normalization to handle leading/trailing whitespace in LLM
output that causes action type lookup failures.

This fix addresses the issue where LLM-generated action plans contain:
- Leading/trailing spaces in object keys
  (e.g., " prompt " instead of "prompt")
- Leading/trailing spaces in string values
  (e.g., " Tap" instead of "Tap")

Solution:
- Added normalizeJsonObject() function in safeParseJson()
- Recursively trims all object keys and string values
- Works with nested objects, arrays, and all vlMode types
- Added 13 comprehensive test cases
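
The recursive trimming described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the `Json` type and the function body are assumptions, and only the name `normalizeJsonObject` comes from the commit message.

```typescript
// Hypothetical sketch: the real normalizeJsonObject() inside safeParseJson() may differ.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function normalizeJsonObject(value: Json): Json {
  // Trim leading/trailing whitespace from string values.
  if (typeof value === 'string') {
    return value.trim();
  }
  // Recurse into arrays element by element.
  if (Array.isArray(value)) {
    return value.map((item) => normalizeJsonObject(item));
  }
  // Recurse into objects, trimming every key along the way.
  if (value !== null && typeof value === 'object') {
    const result: { [key: string]: Json } = {};
    for (const [key, inner] of Object.entries(value)) {
      result[key.trim()] = normalizeJsonObject(inner);
    }
    return result;
  }
  // Numbers, booleans and null pass through unchanged.
  return value;
}

const raw = JSON.parse('{" type ": " Tap", "param": {" prompt ": " login button "}}') as Json;
console.log(JSON.stringify(normalizeJsonObject(raw)));
// prints {"type":"Tap","param":{"prompt":"login button"}}
```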

Fixes #1435

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
…1473)

* refactor(ui): improve dark mode consistency and detail panel layout

- Unify dark mode background colors across components (#141414)
- Update detail-side layout with better alignment:
  * Replace arrow icons with Ant Design icons (RightOutlined/DownOutlined)
  * Align details text with info-tab text at 20px from left
  * Center arrow icon within 20px left spacing
  * Adjust info-tabs height to 46px
- Improve dark mode text colors for better readability:
  * Update meta-key color to #9599A6
  * Update sidebar title color to #D1D3DB
  * Update segmented selected item color to #F5F9FD
- Apply consistent dark mode styling across visualizer components

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* refactor(theme): update dark mode colors for detail panel and detail side

---------

Co-authored-by: Claude <[email protected]>
* feat(shared): unify midscene family env

* fix(web-integration): add MIDSCENE_MODEL_FAMILY to test mocks

Update test mock configurations to include MIDSCENE_MODEL_FAMILY
to ensure planning-related tests pass with the new unified naming.

Also update the error message in model-config-manager to reference
MIDSCENE_MODEL_FAMILY instead of the deprecated
MIDSCENE_PLANNING_VL_MODE.

Fixes test failures:
- PageAgent RightClick > should handle aiRightClick with locate options
- PageAgent RightClick > should be supported in ai method with rightClick
- aiInput with number value tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
- Update all model configuration examples to use the new 4-parameter format:
  - MIDSCENE_MODEL_BASE_URL
  - MIDSCENE_MODEL_API_KEY
  - MIDSCENE_MODEL_NAME
  - MIDSCENE_MODEL_FAMILY
- Replace deprecated OPENAI_API_KEY/OPENAI_BASE_URL references
- Add links to model strategy documentation for more details
- Update both Chinese and English documentation
- Fix: add missing dayjs dependency to @midscene/core
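
As a rough illustration of the 4-parameter format, a pre-flight check might look like the sketch below. The helper `missingModelEnv` is hypothetical, and treating all four variables as mandatory is an assumption made for the example; only the variable names come from the commit message.

```typescript
// The four variable names are from the commit message; everything else is illustrative.
const MODEL_ENV_KEYS = [
  'MIDSCENE_MODEL_BASE_URL',
  'MIDSCENE_MODEL_API_KEY',
  'MIDSCENE_MODEL_NAME',
  'MIDSCENE_MODEL_FAMILY',
] as const;

// Hypothetical helper: returns the names of variables that are unset or empty.
function missingModelEnv(env: Record<string, string | undefined>): string[] {
  return MODEL_ENV_KEYS.filter((key) => !env[key]);
}

// Placeholder values only; substitute a real endpoint, key, and model.
const exampleEnv = {
  MIDSCENE_MODEL_BASE_URL: 'https://example.invalid/v1',
  MIDSCENE_MODEL_API_KEY: '<your-api-key>',
  MIDSCENE_MODEL_NAME: '<model-name>',
};
console.log(missingModelEnv(exampleEnv));
```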

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
Add three new navigation actions (Navigate, Reload, GoBack) to the
Web Action Space for browser navigation control.

**New Actions:**
- **Navigate**: Navigate to a URL in the current tab
  - Parameter: `url` (string)
- **Reload**: Reload the current page
- **GoBack**: Navigate back in browser history

**Implementation:**
- Import zod from @midscene/core instead of adding new dependency
- Add navigate, reload, goBack methods to Page class in base-page.ts
- Add method declarations to AbstractWebPage for type safety
- Actions directly call page methods, avoiding code duplication
- All actions include proper error handling for unsupported page types
- Support both Puppeteer and Playwright page types
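
A minimal sketch of how such actions could delegate to page methods and report unsupported page types. The `NavigablePage` interface, the `NavAction` union, and `runNavAction` are hypothetical names for illustration; the real action definitions use zod schemas and the `AbstractWebPage` declarations, which are not reproduced here.

```typescript
// Hypothetical page shape: methods are optional so unsupported page types can be detected.
interface NavigablePage {
  navigate?(url: string): Promise<void>;
  reload?(): Promise<void>;
  goBack?(): Promise<void>;
}

type NavAction =
  | { type: 'Navigate'; param: { url: string } }
  | { type: 'Reload' }
  | { type: 'GoBack' };

// Each action calls the page method directly, so no navigation logic is duplicated here.
async function runNavAction(page: NavigablePage, action: NavAction): Promise<void> {
  switch (action.type) {
    case 'Navigate':
      if (!page.navigate) throw new Error('Navigate is not supported by this page type');
      return page.navigate(action.param.url);
    case 'Reload':
      if (!page.reload) throw new Error('Reload is not supported by this page type');
      return page.reload();
    case 'GoBack':
      if (!page.goBack) throw new Error('GoBack is not supported by this page type');
      return page.goBack();
  }
}
```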

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
Add navigate, reload, and goBack methods to ChromeExtensionProxyPage
to support navigation actions in Chrome Extension environment.

**Implementation:**
- navigate(url): Uses chrome.tabs.update to navigate to URL
- reload(): Uses chrome.tabs.reload to refresh the page
- goBack(): Uses chrome.tabs.goBack to go back in history
- All methods wait for network idle after operation
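
The pattern could look roughly like this. The `TabsApi` interface mirrors the subset of `chrome.tabs` named above so the sketch can run outside an extension; `createTabNavigator` and the `waitForIdle` callback are hypothetical stand-ins for whatever `ChromeExtensionProxyPage` actually does.

```typescript
// Subset of chrome.tabs used here; inside an extension, pass the real `chrome.tabs`.
interface TabsApi {
  update(tabId: number, props: { url: string }): Promise<unknown>;
  reload(tabId: number): Promise<void>;
  goBack(tabId: number): Promise<void>;
}

// Hypothetical factory: every operation waits for the page to settle afterwards.
function createTabNavigator(tabs: TabsApi, tabId: number, waitForIdle: () => Promise<void>) {
  return {
    async navigate(url: string): Promise<void> {
      await tabs.update(tabId, { url });
      await waitForIdle();
    },
    async reload(): Promise<void> {
      await tabs.reload(tabId);
      await waitForIdle();
    },
    async goBack(): Promise<void> {
      await tabs.goBack(tabId);
      await waitForIdle();
    },
  };
}
```

Injecting the tabs API rather than reading the global keeps the sketch testable with a fake.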

This completes the navigator actions support for all web page types:
- Puppeteer
- Playwright
- Chrome Extension

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
This commit restores the `aiAction` fixture for Playwright integration
while marking it as deprecated. Users are encouraged to use `aiAct`
instead.

Changes:
- Add `aiAction` fixture implementation
- Add `aiAction` to PlayWrightAiFixtureType
- Mark `aiAction` as deprecated with JSDoc comments
- Point users to use `aiAct` as the recommended alternative
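
One way to keep a deprecated name alive is to alias it to the new implementation and mark it with a JSDoc tag, as sketched below. The `AiFixture` interface and `createFixture` helper are hypothetical simplifications and do not reflect the real `PlayWrightAiFixtureType`.

```typescript
// Hypothetical, simplified fixture shape for illustration only.
interface AiFixture {
  aiAct(prompt: string): Promise<string>;
  /** @deprecated Use `aiAct` instead. */
  aiAction(prompt: string): Promise<string>;
}

function createFixture(run: (prompt: string) => Promise<string>): AiFixture {
  const aiAct = (prompt: string) => run(prompt);
  // The deprecated name points at the same function, so behavior cannot drift apart.
  return { aiAct, aiAction: aiAct };
}
```

Editors that understand JSDoc will typically strike through `aiAction` at call sites, which nudges users toward `aiAct` without breaking existing tests.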

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
Add support for displaying timeout test status in the report overview component:

- Add timedOut counter and timedOutTests tracking in report overview statistics
- Handle "timedOut" and "interrupted" status in test case filtering
- Add new "Timeout" stats card with orange color (#ff8c00 in light mode, #ffa940 in dark mode)
- Update PlaywrightTaskAttributes type to use strict union type for test status
- Update playwright case selector to show "Timeout" filter option
- Fix App.tsx type error by setting undefined as default status value
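
A sketch of the counting logic under stated assumptions: the status union matches Playwright's documented test statuses, while `TestCaseInfo` and `summarize` are hypothetical names, and counting "interrupted" separately is a choice made for this example rather than something the commit confirms.

```typescript
// Playwright's test statuses as a strict union.
type TestStatus = 'passed' | 'failed' | 'timedOut' | 'skipped' | 'interrupted';

// Hypothetical case shape; status defaults to undefined when unknown.
interface TestCaseInfo {
  title: string;
  status?: TestStatus;
}

function summarize(cases: TestCaseInfo[]) {
  const counts: Record<TestStatus, number> = {
    passed: 0,
    failed: 0,
    timedOut: 0,
    skipped: 0,
    interrupted: 0,
  };
  const timedOutTests: string[] = [];
  for (const testCase of cases) {
    if (!testCase.status) continue; // no status recorded yet
    counts[testCase.status] += 1;
    if (testCase.status === 'timedOut') {
      timedOutTests.push(testCase.title);
    }
  }
  return { counts, timedOutTests };
}
```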

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
…1485)

* feat(playground): add device options configuration for Android/iOS

This commit implements device-specific configuration options in the
playground UI, allowing users to customize device behavior such as
keyboard handling and IME strategy.

Changes:
- Add device options state management with localStorage persistence
- Create UI controls for Android-specific options (imeStrategy,
  autoDismissKeyboard, keyboardDismissStrategy, alwaysRefreshScreenInfo)
- Create UI controls for iOS-specific options (autoDismissKeyboard)
- Extend execution pipeline to pass deviceOptions from frontend to backend
- Update agent.interface.options on the server side when deviceOptions
  are received
- Optimize parameter flattening to avoid delete operator performance issues

Technical implementation:
- Frontend: Store device options in Zustand with localStorage sync
- SDK: Include deviceOptions in remote execution adapter payload
- Server: Update agent.interface.options to apply settings globally
- This ensures all actions (including those called by aiAct) use the
  updated options

Fixes #1282

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat(playground): add dynamic device type detection for iOS/web playground

The universal playground app now detects the device type from the
connected server's /interface-info API and displays device-specific
configuration options accordingly.

This ensures that iOS playground users can see and configure iOS
device options (autoDismissKeyboard), while web users see no
device-specific options.

Related to #1282

* fix(ios): improve keyboard dismissal to prevent accidental UI interactions

The previous implementation used a swipe down gesture at a fixed screen
position (1/3 from top) which could accidentally click on search results
or other UI elements that appeared after text input.

Changes:
- Use WDA's dismissKeyboard API as the primary method (more reliable)
- Fall back to safer swipe gesture (from bottom up) if API fails
- Increase wait time from 300ms to 500ms for UI stability
- Update autoDismissKeyboard documentation to reflect default behavior

Technical details:
- WDA API tries common keyboard button names: return, done, go, search, etc.
- Swipe fallback uses safer coordinates: from 90% height to 50% height
- This prevents accidental taps on UI elements in the upper portion of screen
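
The API-first strategy with a swipe fallback might be sketched as follows. The `KeyboardDriver` interface is a hypothetical stand-in for the WDA client, and swiping along the horizontal centre of the screen is an assumption; only the 90% to 50% height range and the 500ms wait come from the commit message.

```typescript
interface ScreenSize {
  width: number;
  height: number;
}

// Hypothetical driver: dismissKeyboard() resolves false or rejects when the API fails.
interface KeyboardDriver {
  dismissKeyboard(): Promise<boolean>;
  swipe(fromX: number, fromY: number, toX: number, toY: number): Promise<void>;
}

// Fallback gesture: bottom-up swipe from 90% to 50% of screen height.
function fallbackSwipe(size: ScreenSize) {
  const x = Math.round(size.width / 2); // assumption: swipe along the horizontal centre
  return { fromX: x, fromY: Math.round(size.height * 0.9), toX: x, toY: Math.round(size.height * 0.5) };
}

async function dismissKeyboardSafely(driver: KeyboardDriver, size: ScreenSize, settleMs = 500): Promise<void> {
  let dismissed = false;
  try {
    dismissed = await driver.dismissKeyboard(); // primary method
  } catch {
    dismissed = false; // fall through to the swipe
  }
  if (!dismissed) {
    const s = fallbackSwipe(size);
    await driver.swipe(s.fromX, s.fromY, s.toX, s.toY);
  }
  // Give the UI time to settle before the next action.
  await new Promise<void>((resolve) => setTimeout(resolve, settleMs));
}
```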

Related to #1282

* feat(screenshot-viewer): add screenshot viewer component with styles and functionality

* fix(tests): enhance keyboard dismissal tests to simulate failure scenarios

---------

Co-authored-by: Claude <[email protected]>
)

Fixed an issue where screenshots from uiContext.screenshotBase64
were not displayed when activeTask.recorder was empty or undefined.

The screenshot extraction logic was previously nested inside the
recorder length check, which prevented uiContext screenshots from
being shown when there were no recorder entries.
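
The shape of the fix can be sketched like this, with the context screenshot handled outside the recorder check. The `TaskLike` type, the `screenshot` field on recorder items, and `collectScreenshots` are hypothetical simplifications of the real task structure, and the output order is an assumption.

```typescript
// Hypothetical, reduced task shape for illustration.
interface TaskLike {
  recorder?: { screenshot?: string }[];
  uiContext?: { screenshotBase64?: string };
}

function collectScreenshots(task: TaskLike): string[] {
  const shots: string[] = [];

  // Handled first and independently, so it appears even without recorder entries.
  const contextShot = task.uiContext?.screenshotBase64;
  if (contextShot) {
    shots.push(contextShot);
  }

  // Recorder screenshots are only added when the recorder has entries.
  if (task.recorder?.length) {
    for (const item of task.recorder) {
      if (item.screenshot) {
        shots.push(item.screenshot);
      }
    }
  }
  return shots;
}
```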

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
* feat(core): add automatic LangSmith and Langfuse integration

- Add MIDSCENE_LANGSMITH_DEBUG and MIDSCENE_LANGFUSE_DEBUG env variables
- Auto-wrap OpenAI client when env variables are enabled
- Users only need to install the package and set env variable
- No more manual createOpenAIClient code required

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(core): use variable to prevent bundler static analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* docs(site): add auto-integration guide for LangSmith and Langfuse

- Add new sections explaining simplified environment variable approach
- Update createOpenAIClient note to recommend env var method
- Add installation steps and configuration examples
- Include both Chinese and English documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* docs(site): add Langfuse guide to model-config

- Update LangSmith section with installation steps
- Add Langfuse integration section with complete setup guide
- Include both Chinese and English documentation
- Add cross-references to API documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* docs(site): update environment variable examples

- Update LangSmith configuration with actual env var names
- Update Langfuse configuration with correct BASE_URL variable
- Replace sample API keys with placeholder format
- Add region-specific endpoint examples

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
…1490)

* feat(report): improve UI display and simplify task output structure

This commit improves the report UI display and simplifies the task execution output structure:

**Report UI Improvements:**
- Separate image display from text in task parameters with clickable links
- Add element-detail-box styling for locate/element objects
- Support text wrapping in sidebar titles
- Highlight elements from Action Space task params in Screenshots view
- Hide Output section when output is undefined

**Core Changes:**
- Simplify task output structure by returning actionResult directly
- Change operation functions (aiTap, aiScroll, etc.) to return void instead of values
- Update locateParamStr to support Action Space description field

**Technical Details:**
- Modified ui-utils.ts to remove image concatenation logic
- Enhanced detail-side component to handle multiple data structures
- Added shared renderElementDetailBox function for code reuse
- Updated detail-panel to extract elements from Action Space params
- Added dark mode support for element-detail-box styles

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(report): move renderElementDetailBox definition before MetaKV component

Move the renderElementDetailBox function definition to before the MetaKV component
to fix potential hoisting issues. Arrow functions defined with const are not hoisted,
so they must be defined before being referenced.

This ensures the function is available when MetaKV's renderContent calls it.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(core): side panel style

* feat(report): implement extractInsightParam function for improved task parameter handling

* feat(report): add renderMetaContent and extractTaskImages functions for improved content handling

* chore(deps): update @rspress packages to version 2.0.0-rc.1 in package.json and pnpm-lock.yaml

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: yutao <[email protected]>
…ls (#1491)

* fix(playwright): fix this binding issue in AI fixture declarative calls

Fixed a `this` binding issue where methods looked up through bracket notation
(agent[aiActionType]) lost their context, causing a TypeError when they
accessed instance methods or properties.

Added .bind(agent) to ensure proper this context is maintained when
dynamically invoking agent methods like aiTap, aiQuery, aiAssert, etc.
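
A minimal sketch of the binding pattern, assuming a simplified agent. `DemoAgent`, its `label` field, and `getBoundMethod` are hypothetical; the real fixture works with the full agent class and many more method names.

```typescript
// Hypothetical, simplified agent whose method depends on instance state.
class DemoAgent {
  label = 'demo-agent';

  aiTap(target: string): string {
    return `${this.label} tapped ${target}`;
  }
}

// Looking a method up by name yields a detached function reference;
// binding it to the agent keeps `this` pointing at the instance.
function getBoundMethod(agent: DemoAgent, name: 'aiTap'): (target: string) => string {
  return agent[name].bind(agent);
}

const tap = getBoundMethod(new DemoAgent(), 'aiTap');
console.log(tap('Login')); // prints "demo-agent tapped Login"
```

Without `.bind(agent)`, passing `agent[name]` around as a plain function drops the receiver, so any access to `this` inside the method can fail.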

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Update packages/web-integration/src/playwright/ai-fixture.ts

Co-authored-by: Copilot <[email protected]>

* Update packages/web-integration/tests/unit-test/playwright-ai-fixture-this-binding.test.ts

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: Copilot <[email protected]>