Skip to content

Latest commit

 

History

History
476 lines (390 loc) · 22.6 KB

File metadata and controls

476 lines (390 loc) · 22.6 KB
name aiui
description Render native desktop dialogs on the user's machine via aiui's MCP server — `confirm` before destructive actions (delete, drop, force-push, deploy), `ask` for pick-one-of-N where context per option matters, `form` for multi-input requests, secrets, dates, sliders, sortable lists, or image confirmation.

aiui — Dialog design for Claude agents

aiui exposes three MCP tools that render native dialogs on the user's machine:

  • confirm — irreversible yes/no
  • ask — single- or multi-choice with descriptions and optional free-text fallback
  • form — composite window with typed fields and multiple action buttons

Default to a dialog, not to chat

The user installed aiui because they want the agent to use it. If you catch yourself about to write any of these in chat, stop and use aiui instead:

  • "Would you like me to …?", "Should I proceed?", "Are you sure?" → confirm
  • "Do you want option A or B?", numbered lists for the user to pick from → ask
  • "Please tell me the …", "What's the …?" with more than one ask → form
  • Any step that is destructive or hard to undo (delete, drop, force-push, rollback, prod deploy) → confirm with destructive: true, even if the user already gave loose approval. The dialog makes the consequence explicit and ships the structured answer back, no chat parsing.
  • Any step that needs a secret for a moment (token, password) → form with a password field, never paste in chat.
  • Any step that is a choice with consequences worth seeing side-by-side ("which deploy strategy?", "which migration path?") → ask with per-option description.
  • Any step that wants the user to rank or sort items → form with a sortable list field.
  • Any step that wants a date, datetime, range, color, or numeric value in a bounded intervalform with the matching field.
  • Any step where you'd sketch a flow, sequence, state, hierarchy or schedule in ASCII ("Step A → Step B → ...") → form with a mermaid field. ASCII boxes-and-arrows look terrible in any proportional-font surface; the mermaid field renders to clean SVG. See the dedicated section below.
  • Any step that asks "is this generated image OK?"confirm with image: {src}. Don't fall back to a form-with-image-and-two- buttons when the question is a plain yes/no.
  • Any step that asks "which of these images?" with 2–6 candidates → ask with thumbnail per option. Use form + image_grid only when there are many candidates (≥ 7) or the picker needs multi-select.

When chat actually wins

Skip the dialog for content the user reads, doesn't answer:

  • Status reports, summaries, code snippets, logs, error traces — render in chat.
  • Single free-text answers where the user would type the same thing into a dialog box anyway — just ask in chat.
  • Anything where the answer is "go on", and the user is paying attention.

Tool choice

Intent Tool
Yes/no, especially destructive confirm
Yes/no on a generated image ("is this OK?") confirm with image: {src}
2–6 options, possibly with per-option context ask
Pick one of N images ("A or B or C") ask with thumbnail per option
Multi-field input, multi-action footer form
Pick one of many images (e.g. 12 logo variants) form with image_grid
Per-item verdict on a batch of images/videos ("approve/revise/skip each") gallery
Single free-text answer just ask in chat
More than 8 fields split into multiple form calls; do not cram one dialog

Writing labels and copy

  • Imperative or noun, ≤ 6 words per label, no punctuation, no emoji.
  • Parallel grammar within a dialog. Mixing styles ("Name" / "Please enter your age" / "What's your role?") reads as AI slop.
  • Defaults a real user would actually pick, not "enter value here".
  • description/static_text only when the label alone is ambiguous — avoid redundancy.

Action buttons (form only)

  • Verb-based, concrete. "Create report" beats "OK".

  • Styling (pick one per button):

    • primary: true → blue, the main action.
    • success: true → green, positive-outcome verbs ("Approve", "Publish").
    • destructive: true → red, irreversible verbs ("Delete", "Rollback").
    • none → neutral outlined button.

    Never style a save button red; never style a delete button green.

  • Offer an escape hatch (skip_validation: true) so required-field validation never traps the user.

  • ≤ 3 actions. If you're tempted to add a fourth, rethink the flow.

The list field — one widget, four modes

selectable multi_select sortable Mode
Static info list
Single-choice (radio)
Multi-choice (checkboxes)
Ordering via drag handles
Pick-and-order

Result is always {selected: [values], order: [values]}order reflects drag changes, selected reflects checkbox state. Items can carry a thumbnail — see Image sources below for the accepted URL formats. Perfect for shotlists, mood boards, carousel slides where the visual anchor matters more than the label.

The table field — column-aware row triage

When you'd otherwise dump 30 branches / 50 search results / 20 stale files into chat, hand it as a table instead. Columns carry the context (date, size, owner) that list can't, rows are clickable for selection, and the agent gets back the picked rows by their value.

columns: [{key, label, align?: "left"|"right"|"center"}]
rows:    [{value, values: {<key>: <string|number|null>}}]
multi_select?: true     # checkbox-per-row
sortable_by_column?: true   # click headers to sort

Result: {selected: [values], order: [values], sort: {column, dir}}. The order field reflects user-driven sorts so you can preserve their view if you reopen the form.

Schematic diagrams: mermaid

When you'd otherwise reach for ASCII boxes-and-arrows, draw flowcharts in +--+-style art, or sketch a sequence diagram with --> and |, stop. Use the mermaid field in a form instead.

Spec: {kind: "mermaid", source: "<DSL>", label?: string, max_height?: number}.

The source is a Mermaid-DSL string. aiui pipes it through mermaid.render(), DOMPurify-sanitises the resulting SVG, and embeds it inline. Covers flowcharts, sequence diagrams, state diagrams, class diagrams, gantt, ER, mind-maps, and pie charts — pick the one that fits the situation.

{
  "kind": "mermaid",
  "source": "graph TD; Start --> Probe; Probe -- ok --> Render; Probe -- fail --> Retry; Retry --> Probe"
}

Read-only — like markdown and image, it sits between input fields to give context, not to ask anything. Result-handling unchanged.

Anti-patterns:

  • ASCII / box-drawing art for any flow, sequence, state, or hierarchy — that's exactly the slop this field replaces.
  • Trying to render a picture as Mermaid — Mermaid is structured diagrams (nodes, edges, swimlanes), not free drawing. For arbitrary images use image with a real source.
  • Embedding HTML in node labels — Mermaid's securityLevel: strict rejects it (which we want). Keep labels plain text.
  • UI-layout mockups (dashboard tiles, hardware panels, screen layouts) — Mermaid is graph-DSL, not layout-DSL. Use wireframe for that, see below.

UI-layout mockups: wireframe

When you'd otherwise reach for ASCII boxes-and-pipes to mock a UI layout — dashboard tiles, hardware-UI panels, a login-screen sketch, a hand-drawn-feel app surface — stop. Use the wireframe field in a form instead. mermaid covers diagrams (graphs, flows, state); wireframe covers fixed-position panel grids.

Spec: {kind: "wireframe", panels: [{title?, content?, col_span?, row_span?, tone?}], columns?, gap?, label?, max_height?}.

panels is the only required field. Each panel has:

  • title — uppercase header, optional
  • content — multi-line monospace body, escape \n for line-breaks
  • col_span / row_span — default 1
  • tone"default" (neutral), "muted" (de-emphasised), "highlight" (accent border + tinted background)

aiui renders real CSS-Grid panels with proper borders, monospace content, and theme-matched colours — the layout actually looks like the layout, instead of approximating it with +s and |s.

{
  "kind": "wireframe",
  "label": "U-Boot-Funkbude",
  "columns": 3,
  "panels": [
    {"title": "EMPFANG", "col_span": 2, "content": "14:32:07 [SCHWACH] …\n14:32:11 [STARK]   WX MIDWAY"},
    {"title": "STATUS",  "col_span": 1, "content": "Tiefe: 18 m\nKurs:  270°\nSpeed: 8 kn"},
    {"title": "SIGNAL",  "col_span": 1, "content": "Atmo:    █▒▒\nWetter:  ▓▒▒", "tone": "muted"},
    {"title": "AKTION",  "col_span": 2, "content": "[T]auchen [A]uftauchen [K]urs", "tone": "highlight"}
  ]
}

Read-only — sits between input fields to give layout context, like markdown / image / mermaid. Result-handling unchanged.

Anti-patterns:

  • ASCII boxes-and-pipes for anything layout-shaped — that's the slop this field replaces.
  • Using wireframe for graphs / flows / sequences — those are mermaid-shaped, the panels here have no edges.
  • Putting deeply nested layouts inside one panel's content — the field is for the outer layout. If you need nested boxes, split into more panels with col_span / row_span.
  • Free-form HTML / markdown inside content — content is plain text, rendered monospace; everything else is intentionally ignored.

Inline-context fields: markdown, image, static_text

These don't ask anything — they sit between input fields to give context for the inputs that follow.

  • markdown — rendered Markdown block (lists, code, links, tables). Use for "here's the diff I generated, now decide" patterns. Not a standalone display tool — if you'd be tempted to open a window just to show the user a markdown blob, render it in chat instead.
  • image — read-only single image preview. src accepts a data: URL or any http(s):// URL — see Image sources below. Optional label, alt, max_height. Use when the agent generated a chart, screenshot, or diagram and needs visual sign-off before the next decision.
  • static_text — plain styled note with tone: "info"|"warn"|"muted". Lighter weight than markdown when no formatting is needed.

Visual pickers: image_grid

For "pick one (or more) of these N generated images" — logo variants, thumbnail candidates, asset triage. Spec: images: [{value, src, label?}], multi_select?, columns? (default 3). Result: {selected: [values]}. Each src follows the same rules as image — see below.

image_grid is a picker — one (or N) selected out of many. When you instead need a separate verdict per item — approve this, revise that, skip the third, with an optional note each — use the gallery tool below.

Batch review: gallery

A standalone tool (not a form field), for reviewing a batch of images and/or videos and collecting one decision per item in a single window — instead of firing confirm once per asset.

Spec: items: [{value, src?, label?, detail?, max_height?}], actions? (per-item buttons, default Approve / Revise / Skip), comment? (free-text field per item), columns? (default responsive). Each item's value must be non-empty and unique — it keys the result. src follows the same resolution rules as image; videos (a data:video/ URL, an http(s):// URL, or a local .mp4/.mov/.m4v/ .webm path) render with native <video controls>. Local video files of any size work: the bridge pushes them to aiui's media cache on the Mac and the dialog streams them back (range-seekable), so a remote agent's clip plays without you hosting it anywhere. http(s):// video URLs stream directly.

Result: {cancelled, decisions: {"<item value>": {decision, comment?}}}. Only items the user actually touched appear in decisions — an untouched item means "no verdict", not a default.

Use gallery for "review these 6 hero renders", "triage this screenshot batch". Use confirm+image for a single yes/no sign-off, and ask+thumbnail / image_grid when the task is picking among candidates rather than judging each one.

Starting window size: size / width / height

form and gallery accept an optional size hint — "s", "m", or "l" — and aiui picks good local defaults for each, clamped to the user's screen. (Power users can pass explicit width / height in logical px, which override size; rarely needed.)

The hint is a floor, not a cap: the window opens at max(content-estimate, hint). So a content-heavy dialog never opens smaller than it needs (you can't cram a 12-image gallery with size:"s"), but a sparse dialog you know will feel cramped at the default can be told to start roomy. Windows are always resizable regardless — but many users don't realise that, so a dialog that opens at a comfortable size is the difference between "looks polished" and "looks broken". Reach for "m" or "l" when a form carries images, tables, wireframes, or many fields, or a gallery has a large batch / tall thumbnails. Leave it unset for ordinary short forms — the auto-estimate already fits those.

Image sources (src / thumbnail)

aiui takes an image source in five places:

  • confirmimage: {src, alt?, max_height?} — visual yes/no
  • askoptions[].thumbnail — visual pick-one-of-N
  • formimage field → src
  • formimage_gridimages[].src
  • formlistitems[].thumbnail

In all of them the same three input formats render correctly:

  • Local filesystem path (/Users/me/foo.png, ~/Pictures/x.jpg) — the natural choice when the file is already on disk. The aiui bridge running on your host reads the file and inlines it as a data: URL before the dialog spec leaves your host. Important: the path must exist on the host you, the agent, are running on — for an SSH-tunneled session that's the remote, not the user's Mac. Absolute or ~/-rooted paths only — relative paths are not resolved (no stable cwd contract on MCP bridges). 10 MB cap.
  • http(s):// URL — aiui fetches it on the user's Mac and inlines it. 5-second timeout, 10 MB cap, parallel fetch for grids. Use when the image already lives on a reachable web server. The Mac contacts the URL, not aiui's infrastructure (aiui itself never phones home).
  • data: URLdata:image/png;base64,…. The fallback when neither path nor URL works (e.g. you generated bytes in-memory and don't want to write a tempfile). Embed the encoded bytes directly in the tool-call's src value — never roundtrip through a shell pipeline (see anti-pattern below). Watch the size — over ~2 MB it starts to feel laggy in the MCP transport.

Pick the simplest one that works: path first if the file's on disk, then URL if it's reachable, data: only as last resort.

What does not work — known footguns:

  • Relative paths (./foo.png, foo.png, ../assets/x.png). Resolved against an undefined cwd. Use absolute or ~/ paths.
  • Cross-host paths. A path that exists on the user's Mac but not on the remote where the agent runs (or vice versa) won't resolve — the bridge that does the reading is on the agent's host. If you need to render a Mac-side file from a remote agent, use http(s):// or pass the bytes inline as data:.
  • Bare URLs in markdown field text. Markdown's ![alt](url) follows the same CSP — the URL has to resolve to data: somehow. The resolver only walks src / thumbnail properties, not the bodies of markdown blocks.
  • Linking out with <a href="https://..."> from markdown — works as a click target, but opens in the user's default browser (we explicitly intercept it). It's not an image-rendering question.

If you tried a path or URL and the user reports a broken image, ask them once whether anything appeared at all — a missing file, a CSP block, and a 404 all look identical to the user. The companion logs the failure (imageresolve: …) but agents can't read those logs.

Anti-pattern: shell-encoding data: URLs

Don't write the encoded bytes to a tempfile, then cat or printf them back through bash to construct the JSON tool call. Two failure modes seen in the wild:

  1. The terminal recognises the data:image/... prefix in stdout and tries to render it inline — eats the rest of the pipeline.
  2. The encoded payload spans multiple shell-line buffers and gets word-split or quoting-mangled.

The fix is structural: the tool call is JSON, not shell. Either build the spec dict in your runtime and pass src=f"data:image/png;base64,{b64}" straight into the tool call, or hand aiui the path and let the bridge do the encoding for you.

datetime field

Lückenfüller between date and date_range. Cron, scheduling, reminders — one field instead of splitting into two text fields with manual validation. Native <input type="datetime-local">, returns ISO YYYY-MM-DDTHH:MM.

Tabs — long forms without scroll fatigue

Drop fields=… and pass tabs=[{label, fields: [...]}, ...] instead. One submit covers all tabs; validation jumps to the first invalid tab automatically. Tabs are display structure, not a wizard — no per-tab confirmation, no per-tab actions, all values land in one response.

Use when a single dialog naturally falls into 2-4 distinct topical groups (e.g. "Identity / Permissions / Notifications" on a user-create form). Don't reach for tabs to cram a 30-field form into 5 tabs — split into multiple form calls instead.

Password fields

For short-lived secrets (one-off API tokens, test passwords), prefer form with a password field over asking in chat: the value is masked on screen while the user types, so it doesn't appear in screen recordings or to a shoulder-surfer.

Be honest with the user, though — the value still returns to you as plaintext in the tool response. For long-lived or high-value secrets, use the secret field with a target instead (below) so the value never enters the conversation.

Secrets & file-write: the secret field + target (#135)

When a value must NOT pass through this conversation — a credential the user pastes that should land in a file, not your transcript — use a secret field with a target. Any input field may carry target; for a secret field the value is write-only: aiui writes it to the file and returns only {written, target, bytes}, never the value.

{ "kind": "secret", "name": "pat", "label": "GitHub PAT für byte5ai",
  "target": { "mode": "create", "path": "~/.github_tokens/byte5ai",
              "perm": "0600", "overwrite": true } }
  • mode: "create" — write the raw value. Needs overwrite: true to replace an existing file (a path typo otherwise fails loudly rather than clobbering).
  • mode: "substitute" — replace a placeholder that occurs exactly once in an existing file (format-agnostic: YAML/TOML/INI/env). 0 or >1 matches → error, never a partial or wrong write. Pick a distinctive sentinel that cannot collide with real file content — e.g. __AIUI_SECRET_GITHUB_PAT__, never a common word like TOKEN or X. The exactly-once rule is the safety net (a colliding placeholder errors instead of being misapplied), but a distinctive sentinel makes the match unambiguous in the first place.
  • Destination is always your own host — an aiui module already runs there (the native app for a local session, the bridge on a remote SSH session), and it performs the write as a plain local file operation. So create and substitute behave identically local and remote (the entered value reaches that module over aiui's own channel, never via the agent). You cannot target a foreign host; the user sees the resolved path and approves it by submitting.
  • Errors come back as {written:false, error} — no silent success.

Why it exists: it replaces the fragile "guess a shell one-liner to stash a token" pattern with a native dialog + a correct, atomic write whose target the user sees first. It's a QoL + confused-deputy guard, not a hard guarantee the agent can't read the value some other way — for that, the user still types it themselves outside any agent path.

Anti-patterns (slop vs. clean)

Slop Clean
confirm(title="Are you sure?") confirm(title="Drop table 'orders'?", destructive=True, message="18,432 rows will be removed.")
ask(question="Choose one", options=[{"label": "Option 1"}, …]) ask(question="Which migration strategy?", options=[{"label":"In-place","description":"Fast, no rollback."}, …])
form with 15 text fields Split into logical steps, or push back to chat entirely
Button labels "OK" / "Cancel" "Deploy" / "Discard" — name what happens
static_text echoing the title static_text adds context the labels can't carry alone
image(src="./shot.png") (relative path — undefined cwd) image(src="/Users/me/shot.png") — absolute, the bridge reads it locally
Writing base64 to a tempfile, cat-ing it through bash to build the tool call Pass the path as src and let the bridge encode, or build the data: URL directly in your runtime — never via shell pipes

Quick-reference example

aiui.form(
    title="New feature draft",
    header="Discovery",
    fields=[
        {"kind": "text", "name": "job", "label": "User job",
         "multiline": True, "required": True},
        {"kind": "select", "name": "scope", "label": "Scope",
         "options": [{"label": "Quick win", "value": "qw"},
                     {"label": "Feature", "value": "f"},
                     {"label": "Epic", "value": "e"}],
         "default": "f"},
        {"kind": "list", "name": "stakeholders", "label": "Stakeholders",
         "items": [{"label": "Product", "value": "prod"},
                   {"label": "Design", "value": "design"},
                   {"label": "Engineering", "value": "eng"}],
         "selectable": True, "multi_select": True,
         "default_selected": ["prod", "eng"]},
        {"kind": "date", "name": "deadline", "label": "Target date"},
    ],
    actions=[
        {"label": "Cancel", "value": "cancel", "skip_validation": True},
        {"label": "Save draft", "value": "draft", "skip_validation": True},
        {"label": "Create", "value": "commit", "primary": True},
    ],
)

Response: {cancelled: false, action: "commit", values: {job: "…", scope: "f", stakeholders: {selected: [...], order: [...]}, deadline: "…"}}.