feat(accessibility): AX/UIA perception + automate engine (2/8 of #3307)#3341
Conversation
Run enigo keyboard/mouse on the app main thread via a native-registry executor; enigo's macOS TSMGetInputSourceProperty traps off-thread and crashes the CEF host. Adds mouse/keyboard tools, the main_thread bridge, and downscaled screenshots so the model can see them. Slice 1/7 of tinyhumansai#3307 (was the 'computer control' area).
… loop Adds the Rust-internal automate engine (poll-until-stable settle, playback verification), the AXEnabled diagnostics field + settle primitives on ax_interact, the Music fast-path, and the Windows UIA superset. Exposes launch_platform as pub(crate) so the automate loop can launch apps mid-flow. Slice 2/7 of tinyhumansai#3307 (accessibility/automate engine).
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR adds a Rust-driven perceive→decide→act→settle→verify automation loop ( ChangesAutomate Rust UI Automation System
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
…accessibility # Conflicts: # src/openhuman/tools/impl/browser/screenshot.rs # src/openhuman/tools/impl/computer/main_thread.rs
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (3)
src/openhuman/accessibility/automate.rs (1)
313-318: 💤 Low valueConsider downgrading per-step logging to debug level.
Line 313 logs every iteration's action, app, label, and filter at
info!level. Since the loop can run up to 12 steps by default, this produces substantial log volume. As per coding guidelines, step-by-step trace logging for development diagnostics should usedebug!ortrace!levels.♻️ Suggested adjustment
- log::info!( + log::debug!( "{LOG_PREFIX} step={step} action={:?} app={target_app:?} label={:?} filter={:?}", action.action, action.label, action.filter );Based on learnings: "Use
log/tracingatdebug/tracelevels for diagnostic logging in Rust" and "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/accessibility/automate.rs` around lines 313 - 318, The per-step logging currently using log::info! with the LOG_PREFIX and fields action.action, action.label, and action.filter should be downgraded to debug-level diagnostics; replace the log::info! invocation that prints "{LOG_PREFIX} step={step} action={:?} app={target_app:?} label={:?} filter={:?}" (and passes action.action, action.label, action.filter) with log::debug! (or tracing::debug!) so step-by-step iterations are only emitted in debug mode while keeping the same message format and fields for grep-friendly tracing.src/openhuman/accessibility/app_fastpaths/mod.rs (1)
21-29: ⚡ Quick winAdd branch-level debug tracing in fast-path dispatch.
try_fastpathcurrently has no debug/trace logs for match vs fallthrough, which makes dispatch diagnosis harder in automation runs.♻️ Proposed change
pub async fn try_fastpath( app: &str, goal: &str, backend: &dyn AutomateBackend, ) -> Option<AutomateOutcome> { + log::debug!("[automate::fastpath] dispatch app={app:?}"); if music::matches(app, goal) { + log::debug!("[automate::fastpath] matched=music app={app:?}"); return Some(music::run(goal, backend).await); } + log::debug!("[automate::fastpath] no-match app={app:?}"); None }As per coding guidelines, "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes" and "use
log/tracingatdebug/tracelevels for diagnostic logging in Rust".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/accessibility/app_fastpaths/mod.rs` around lines 21 - 29, Add branch-level diagnostic logging to try_fastpath: emit a trace/debug entry when the function is entered (including app and goal), log a branch decision when music::matches(app, goal) is true or false with a stable grep-friendly prefix (e.g., "fastpath:match=true"/"fastpath:match=false"), and log before/after the external call to music::run(goal, backend) (including success/returned outcome). Update try_fastpath to use the tracing/log macros (trace! or debug!) consistently and include the function name and relevant variables to aid filtering.src/openhuman/accessibility/app_fastpaths/music.rs (1)
327-327: ⚡ Quick winUse backend wait abstraction for retry delay consistency.
This direct
tokio::time::sleepbypasses backend-injected timing behavior; usingbackend.wait(...)keeps timing deterministic in tests and alternate backends.♻️ Proposed fix
- tokio::time::sleep(std::time::Duration::from_millis(700)).await; + backend.wait(700).await;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/accessibility/app_fastpaths/music.rs` at line 327, Replace the direct tokio::time::sleep call with the backend wait abstraction to preserve backend-injected timing behavior: find the tokio::time::sleep(std::time::Duration::from_millis(700)).await usage and change it to call backend.wait(std::time::Duration::from_millis(700)).await (preserving the same duration and await semantics) so tests and alternate backends see deterministic timing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/voice-automate-plan.md`:
- Line 20: Add missing language identifiers to the fenced code blocks in the
docs/voice-automate-plan.md file: locate the fenced block containing the diagram
that begins with "orchestrator (chat LLM)" (the AutomateTool/automate{ app, goal
} diagram) and the other fenced block noted in the review, and change the
opening backticks from ``` to ```text so both code fences include a language
identifier (e.g., ```text).
In `@src/openhuman/accessibility/app_fastpaths/music.rs`:
- Around line 56-69: The current logic finds the last "play" via lower.rfind and
only verifies the left boundary (before_ok), which lets tokens like "playback"
slip through; update the parsing in this block (variables lower, idx, before_ok,
after, q) to also verify the right boundary by checking the character
immediately after the matched "play" (i.e., the first char of after) is either
absent or not alphabetic, and only proceed to trim/assign q when both left and
right boundaries confirm "play" is a full word.
- Around line 150-163: The panic comes from slicing the Unicode-folded string
`hl = haystack.to_lowercase()` with byte offsets `i` derived from the original
`haystack`; switch to ASCII-only lowercasing to keep byte lengths stable: change
`hl` and `nl` to use `to_ascii_lowercase()` (for the ASCII needle `" by "` this
preserves byte counts), continue advancing `i` by `needle.len()` and by
`ch.len_utf8()` as before, and perform the `hl[i..].starts_with(&nl)` check
against the ASCII-lowercased `hl`/`nl` within the existing `replace_ci` function
so slicing no longer panics.
---
Nitpick comments:
In `@src/openhuman/accessibility/app_fastpaths/mod.rs`:
- Around line 21-29: Add branch-level diagnostic logging to try_fastpath: emit a
trace/debug entry when the function is entered (including app and goal), log a
branch decision when music::matches(app, goal) is true or false with a stable
grep-friendly prefix (e.g., "fastpath:match=true"/"fastpath:match=false"), and
log before/after the external call to music::run(goal, backend) (including
success/returned outcome). Update try_fastpath to use the tracing/log macros
(trace! or debug!) consistently and include the function name and relevant
variables to aid filtering.
In `@src/openhuman/accessibility/app_fastpaths/music.rs`:
- Line 327: Replace the direct tokio::time::sleep call with the backend wait
abstraction to preserve backend-injected timing behavior: find the
tokio::time::sleep(std::time::Duration::from_millis(700)).await usage and change
it to call backend.wait(std::time::Duration::from_millis(700)).await (preserving
the same duration and await semantics) so tests and alternate backends see
deterministic timing.
In `@src/openhuman/accessibility/automate.rs`:
- Around line 313-318: The per-step logging currently using log::info! with the
LOG_PREFIX and fields action.action, action.label, and action.filter should be
downgraded to debug-level diagnostics; replace the log::info! invocation that
prints "{LOG_PREFIX} step={step} action={:?} app={target_app:?} label={:?}
filter={:?}" (and passes action.action, action.label, action.filter) with
log::debug! (or tracing::debug!) so step-by-step iterations are only emitted in
debug mode while keeping the same message format and fields for grep-friendly
tracing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 29c1b63f-deed-491e-8d93-2a45a1b0b493
📒 Files selected for processing (13)
docs/voice-automate-plan.mdsrc/openhuman/accessibility/app_fastpaths/fastpaths_tests.rssrc/openhuman/accessibility/app_fastpaths/mod.rssrc/openhuman/accessibility/app_fastpaths/music.rssrc/openhuman/accessibility/automate.rssrc/openhuman/accessibility/automate_tests.rssrc/openhuman/accessibility/ax_interact.rssrc/openhuman/accessibility/ax_interact_tests.rssrc/openhuman/accessibility/helper.rssrc/openhuman/accessibility/mod.rssrc/openhuman/accessibility/uia_interact.rssrc/openhuman/tools/impl/system/launch_app.rssrc/openhuman/tools/impl/system/mod.rs
…e replace - Music fast-path now requires 'play' as a whole word on BOTH sides so 'playback …' no longer parses as a play intent. - replace_ci no longer indexes the lowercased copy with original byte offsets (to_lowercase can change byte lengths → mid-codepoint slice panic on Unicode); compares on the original slice instead. - Tests for both; tag the architecture doc's fenced diagram + drop a stray trailing code fence (MD040).
Independent review (beyond the CodeRabbit pass)Reviewed the full accessibility slice — the Findings & resolutions
Reviewed clean
No further correctness issues. LGTM once CI is green. |
Summary
Slice 2/8 of #3307 — the accessibility perception +
automateengine.accessibility/automate.rs) with poll-until-stable settle and playback verification.ax_interactgains theAXEnableddiagnostics field +counts_settled/ax_wait_settledsettle primitives; Music fast-path; Windows UIA superset.launch_platformaspub(crate)so the automate loop can launch apps mid-flow.Builds on the merged Phase 1 (#3168) — these files evolve the already-merged
ax_interact/launch_app.Files (13)
accessibility/{automate,automate_tests,ax_interact,ax_interact_tests,helper,mod,uia_interact}.rs,accessibility/app_fastpaths/{mod,music,fastpaths_tests}.rs,tools/impl/system/{launch_app,mod}.rs,docs/voice-automate-plan.md.Summary by CodeRabbit