This framework tests desktop applications across multiple operating systems
using Proxmox VMs, optional GPU passthrough, VNC-backed input, AT-SPI semantic
automation on Linux, and VLM-based screen understanding. The current
implementation is Python-first (`automation/`), with TOML app/VM configuration
and YAML functional scenarios.
```
┌─────────────────────────────────────────────────────────────┐
│ Host Machine                                                │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ mOSdat Python CLI                                     │  │
│  │ ┌──────────┐   ┌──────────┐   ┌────────────────────┐  │  │
│  │ │  config  │   │  runner  │   │   live dashboard   │  │  │
│  │ └────┬─────┘   └────┬─────┘   └─────────┬──────────┘  │  │
│  │      │              │                   │             │  │
│  │      ▼              ▼                   ▼             │  │
│  │ ┌───────────────────────────────────────────────────┐ │  │
│  │ │                Proxmox API (REST)                 │ │  │
│  │ └───────────────────────────────────────────────────┘ │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
└──────────────────────────────┼──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ Proxmox VE Server                                           │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────┐  │
│  │ VM 100       │  │ VM 101       │  │ VM 102-105        │  │
│  │ Fedora 42    │  │ Ubuntu 22.04 │  │ (Other VMs)       │  │
│  │ + GPU?       │  │              │  │                   │  │
│  └──────────────┘  └──────────────┘  └───────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ NVIDIA RTX 3060 (VFIO)                                │  │
│  │ Can be attached to any VM                             │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```
Each app/VM configuration is a TOML file with this layout:

```
examples/*.toml
├── [app]      source/build/package metadata
├── [proxmox]  API connection
├── [vlm]      model endpoint and expected model
└── [[vm]]     VM name, VMID, desktop, packages, SSH user
```
The main command surface is:
| Command | Purpose |
|---|---|
| `mosdat run` | Build/deploy/test matrix |
| `mosdat test` | Test pre-built package(s) |
| `mosdat functional` | Run VLM functional UI scenarios |
| `mosdat preflight` | Validate TOML schema, VM deps, binary symbols, and setup-shell dry-run |
| `mosdat build` | Clone PR head, build `.deb`, deploy to VM, and verify asar symbols |
| `mosdat replay` | Re-ask the VLM against cached screenshots from a prior result dir |
| `mosdat vlm-cache` | Inspect, prune, or clear the VLM response cache |
| `mosdat doctor` | Per-VM connectivity and dependency checklist |
| `mosdat live` | Serve the live triage dashboard and Author Workbench |
| `mosdat author` | Agent CLI for the authoring API |
| `mosdat dashboard` | Generate the static historical dashboard |
| `mosdat visual` | Capture/check visual references |
| `mosdat confirm` | Confirm or verify-fix GitHub issues |
`mosdat live --config <config.toml>` also exposes an authoring API for agents
and the browser workbench. The intended flow is:

1. `GET /api/author/vms` to choose a running VM.
2. `POST /api/author/session` with `{"vm": "ubuntu2404"}`.
3. `POST /api/author/capture` with `{"session_id": "..."}` to refresh the VNC capture.
4. `POST /api/author/vlm/localize` or `/api/author/vlm/verify` to inspect the screen.
5. `POST /api/author/action` with `confirm: true` to run `hover`, `click`, `type`, `key`, `shell`, `wait`, or `launch`.
6. `POST /api/author/step` to append one step or replace the draft `steps` array.
7. `POST /api/author/validate` to check that the draft flow is runnable.
8. `GET /api/author/export?session=...&name=...` to retrieve the scenario YAML.
9. `POST /api/author/close` to release the VNC session.
Agents should prefer the CLI wrapper because it prints compact JSON:
```bash
python -m automation.main author --url http://127.0.0.1:8082 vms
python -m automation.main author --url http://127.0.0.1:8082 doctor
python -m automation.main author --url http://127.0.0.1:8082 start --vm ubuntu2404
python -m automation.main author --url http://127.0.0.1:8082 capture --session SESSION --output /tmp/screen.bmp
python -m automation.main author --url http://127.0.0.1:8082 localize --session SESSION --prompt "help tooltip"
python -m automation.main author --url http://127.0.0.1:8082 describe --session SESSION --x 120 --y 240
python -m automation.main author --url http://127.0.0.1:8082 click --session SESSION --x 5 --y 6 --prompt "help tooltip"
python -m automation.main author --url http://127.0.0.1:8082 prompt-click --session SESSION --prompt "help tooltip"
python -m automation.main author --url http://127.0.0.1:8082 prompt-hover --session SESSION --prompt "help tooltip"
python -m automation.main author --url http://127.0.0.1:8082 prompt-type --session SESSION --prompt "message box" --text "hello"
python -m automation.main author --url http://127.0.0.1:8082 type --session SESSION --text "hello"
python -m automation.main author --url http://127.0.0.1:8082 key --session SESSION --key enter
python -m automation.main author --url http://127.0.0.1:8082 validate --session SESSION
python -m automation.main author --url http://127.0.0.1:8082 export --session SESSION --name tooltip-flow
python -m automation.main author --url http://127.0.0.1:8082 export --session SESSION --name tooltip-flow --output shared/scenarios/functional/tooltip-flow.yaml
python -m automation.main author --url http://127.0.0.1:8082 step --session SESSION --json '{"key":"escape"}'
python -m automation.main author --url http://127.0.0.1:8082 close --session SESSION
```
```
YAML scenario -> FunctionalRunner
  |
  ├── Proxmox VNC capture/input
  ├── Linux AT-SPI role/name targeting when available
  ├── VLM localize/verify fallback
  ├── SSH only for shell/launch/focus helpers
  └── events.jsonl + screenshots in results/functional/<run>/<vm>/
```
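The `events.jsonl` schema is not documented here, so this sketch only assumes one JSON object per line with an optional `status` field; the field name and `summarize_events` are hypothetical. It shows how a run directory can be triaged without the dashboard.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_events(path: Path) -> Counter:
    """Count events by their "status" field (field name assumed)."""
    counts: Counter = Counter()
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            event = json.loads(line)
            counts[event.get("status", "unknown")] += 1
    return counts
```

Usage across all runs and VMs: `for p in Path("results/functional").glob("*/*/events.jsonl"): print(p, dict(summarize_events(p)))`.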
VNC input is display-server agnostic and is the preferred path for clicks, hover, typing, and key presses. This avoids X11/Wayland focus and xauth problems.
The `--inject-config`, `--inject-servers`, and `--inject-app-name` flags run an
SSH pre-stage phase before the VNC loop to write Electron `userData` declaratively.
Setting `x11 = "auto"` on a VM config injects `DISPLAY`, `XAUTHORITY`, and the
ozone preamble into shell steps automatically (default off).
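A rough sketch of what prefixing a shell step with an X11 preamble can look like. The default `DISPLAY`, the GDM-style `XAUTHORITY` path, and the use of `ELECTRON_OZONE_PLATFORM_HINT` as a stand-in for the "ozone preamble" are all assumptions; the actual values mOSdat injects may differ per VM and desktop.

```python
import shlex

# Hypothetical defaults: the real display number and Xauthority path depend on
# the display manager and the logged-in user (this GDM path is only an example).
DEFAULT_DISPLAY = ":0"
DEFAULT_XAUTHORITY = "/run/user/1000/gdm/Xauthority"


def with_x11_preamble(command: str, display: str = DEFAULT_DISPLAY,
                      xauthority: str = DEFAULT_XAUTHORITY,
                      ozone_platform: str = "x11") -> str:
    """Prefix a shell step with X11-related exports (sketch of x11 = "auto")."""
    exports = {
        "DISPLAY": display,
        "XAUTHORITY": xauthority,
        # Assumed stand-in for the ozone preamble: ask Electron for the X11 backend.
        "ELECTRON_OZONE_PLATFORM_HINT": ozone_platform,
    }
    # shlex.quote keeps values with spaces or shell metacharacters intact.
    preamble = "; ".join(f"export {k}={shlex.quote(v)}" for k, v in exports.items())
    return f"{preamble}; {command}"
```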
Build phase:

1. Git checkout of a specific version
2. `yarn build` (TypeScript → JavaScript)
3. `electron-builder` (→ RPM/DEB/AppImage/EXE)
4. Package stored in `dist/`
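The build steps can be expressed as a command plan that is printed or run in order. This is a sketch: invoking electron-builder through `npx` and the `TARGETS` mapping (including `nsis` for EXE) are assumptions, not necessarily how mOSdat calls it.

```python
import subprocess

# Assumed mapping from package type to electron-builder CLI target arguments.
TARGETS = {
    "rpm": ["--linux", "rpm"],
    "deb": ["--linux", "deb"],
    "appimage": ["--linux", "AppImage"],
    "exe": ["--win", "nsis"],
}


def build_plan(ref: str, package: str) -> list[list[str]]:
    """Return the command sequence for one build without running it."""
    return [
        ["git", "checkout", ref],                                # 1. pin the version
        ["yarn", "build"],                                       # 2. TypeScript -> JavaScript
        ["npx", "electron-builder", *TARGETS[package.lower()]],  # 3. package it
    ]


def run_build(src_dir: str, ref: str, package: str) -> None:
    """Run the plan in src_dir; electron-builder writes to dist/ by default."""
    for cmd in build_plan(ref, package):
        subprocess.run(cmd, cwd=src_dir, check=True)  # stop on the first failure
```

Keeping the plan separate from execution makes a dry run trivial: print `build_plan(...)` instead of calling `run_build`.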
Deploy phase:

1. Get the VM IP via the Proxmox guest agent API
2. SCP the package to `/tmp/` on the VM
3. SSH: `sudo dnf install` or `sudo apt install` the package
4. Verify the installation
Test phase:

1. SSH into the VM
2. Set environment variables (`WAYLAND_DISPLAY`, etc.)
3. Run `rocketchat-desktop` with a timeout
4. Capture the exit code and output
5. Parse results (PASS/FAIL/SEGFAULT)
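Steps 1 to 4 of the test phase fit in one SSH invocation. This sketch assumes GNU `env` and coreutils `timeout` on the guest; `launch_argv` and `run_launch` are hypothetical helpers. Note that `ssh` itself exits with 255 when the connection fails, which is distinct from any code the app can return.

```python
import shlex
import subprocess


def launch_argv(user: str, host: str, env: dict[str, str],
                binary: str = "rocketchat-desktop", timeout_s: int = 30) -> list[str]:
    """Build an ssh argv that runs the app under `timeout` with env vars set."""
    parts = ["env"]
    parts += [f"{k}={shlex.quote(v)}" for k, v in env.items()]
    parts += ["timeout", str(int(timeout_s)), shlex.quote(binary)]
    return ["ssh", f"{user}@{host}", " ".join(parts)]


def run_launch(argv: list[str]) -> tuple[int, str]:
    """Run the command and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return proc.returncode, proc.stdout
```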
Each launch runs in one of these environments:

| Test | Environment | Expected behavior (fixed build) |
|---|---|---|
| `wayland-real` | Valid Wayland socket | Native Wayland |
| `wayland-fake` | Non-existent socket | X11 fallback (no crash) |
| `wayland-nodisp` | No `WAYLAND_DISPLAY` | X11 fallback |
| `x11` | X11 session | X11 |
Exit codes map to results as follows:

| Exit code | Meaning | Result |
|---|---|---|
| 0 | Clean exit | PASS |
| 124 | Timeout (app ran N seconds) | PASS |
| 139 | SIGSEGV (segfault) | FAIL |
| 134 | SIGABRT | FAIL |
| 6 | SIGABRT (alternate) | FAIL |
| Other | Unknown | UNKNOWN |
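The exit-code table translates directly into a lookup. 124 is the status GNU `timeout` returns when it has to stop the command, and 139 and 134 follow the shell convention of 128 plus the signal number (SIGSEGV is 11, SIGABRT is 6). The negative-code normalization is an addition for the case where the app is launched through Python's `subprocess` without a shell, which reports a signal death as `-N`.

```python
# Exit-code table as data: code -> (meaning, result).
EXIT_CODES: dict[int, tuple[str, str]] = {
    0: ("Clean exit", "PASS"),
    124: ("Timeout (app ran N seconds)", "PASS"),
    139: ("SIGSEGV (segfault)", "FAIL"),
    134: ("SIGABRT", "FAIL"),
    6: ("SIGABRT (alternate)", "FAIL"),
}


def classify(code: int) -> tuple[str, str]:
    """Map an exit code to (meaning, result); unlisted codes are UNKNOWN."""
    if code < 0:
        # subprocess reports "killed by signal N" as -N; a shell reports 128 + N.
        code = 128 - code
    return EXIT_CODES.get(code, ("Unknown", "UNKNOWN"))
```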
Compute-only passthrough:

```
hostpci0=0000:01:00,pcie=1
```

- GPU available for compute/rendering
- VNC console still works
- Recommended for testing

Primary-display passthrough:

```
hostpci0=0000:01:00,pcie=1,x-vga=1
```

- GPU is the primary display
- VNC console is blank
- Requires a physical monitor
- Not recommended for automated testing
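Because `x-vga=1` blanks the VNC console that every capture and input step depends on, a runner can reject it up front. This is a sketch with hypothetical helper names; note that Proxmox only accepts `pcie=1` on VMs using the q35 machine type.

```python
def hostpci_value(pci_addr: str, pcie: bool = True, primary_vga: bool = False) -> str:
    """Build a hostpciN value such as "0000:01:00,pcie=1"."""
    parts = [pci_addr]
    if pcie:
        parts.append("pcie=1")  # requires the q35 machine type
    if primary_vga:
        parts.append("x-vga=1")  # GPU becomes the primary display
    return ",".join(parts)


def check_automation_safe(value: str) -> None:
    """Refuse x-vga=1 for automated runs, since VNC capture would be blank."""
    options = value.split(",")[1:]  # skip the PCI address itself
    if "x-vga=1" in options:
        raise ValueError("x-vga=1 makes the GPU the primary display; VNC capture will be blank")
```

Applying the recommended value to VM 100 would be `qm set 100 --hostpci0 0000:01:00,pcie=1`.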
```
mOSdat/
├── automation/          # CLI, config, Proxmox, VLM, runners, dashboards
├── examples/            # App/VM TOML configs
├── shared/scenarios/    # Functional YAML scenarios
├── docs/                # Architecture and runbooks
├── tests/               # Pytest coverage and CLI help snapshots
└── results/             # Generated run artifacts (mostly gitignored)
```
Older shell helpers under `shared/` are retained for compatibility and low-level
reference, but the Python CLI is the canonical workflow.
Testing one app across operating systems is hard because:

- Package formats differ: RPM, DEB, AppImage, Snap, Flatpak, MSI.
- Package managers differ: dnf, apt, pacman, zypper, snap, flatpak.
- Desktop behavior differs: GNOME, KDE, X11, Wayland, Windows.
- Launch and cleanup paths differ: app names, temp dirs, package paths.
- Reproducibility matters: TOML config makes VM/app/package assumptions explicit.