Streamline TUI project opening#1
Closed
Zorlin wants to merge 2797 commits into
## Summary
I frequently want to paste into the searchable menu. The most common use case is specifying an upstream for a `/review`, where I copy the upstream from an open terminal.
## Why
`codex app [PATH]` is the documented CLI entry point for opening Codex Desktop on a workspace. Recent desktop builds can focus the app while failing to honor paths passed as macOS document-open arguments via `open -a Codex.app <workspace>`, which broke `codex app .` for users. See openai#25333; related report: openai#25166.
The desktop app still supports the explicit `codex://threads/new?path=...` route, so the CLI should use that app-owned launch surface instead of depending on folder-open event delivery.
## What Changed
- Build a `codex://threads/new?path=<workspace>` URL in the macOS app launcher.
- Pass that URL to `open -a <Codex.app>` instead of passing the workspace path as a document argument.
- Add coverage that workspace paths needing escaping round-trip through URL query encoding.
## Verification
- `just test -p codex-cli codex_new_thread_url_encodes_workspace_path`
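To illustrate the escaping concern, here is a minimal std-only sketch of building the deep link. The helper names `percent_encode` and `new_thread_url` are hypothetical, and the actual launcher may well rely on a URL library instead; the sketch conservatively encodes every byte outside the RFC 3986 unreserved set.

```rust
/// Percent-encode every byte that is not an RFC 3986 unreserved character.
/// (Assumption: a conservative encoding; the real launcher may encode less.)
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char);
            }
            // Everything else, including '/', ' ' and '&', becomes %XX.
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Hypothetical helper: build the new-thread URL for a workspace path.
fn new_thread_url(workspace: &str) -> String {
    format!("codex://threads/new?path={}", percent_encode(workspace))
}
```

A path such as `/tmp/my project&x` would then travel as a single `path` query value rather than being split at the space or the `&`.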
## Summary
Fixes openai#25295.
The slash-command popup reused its previous `ScrollState` when the composer filter token changed. After scrolling the full `/` command list, typing a narrower filter such as `/st` could clamp the stale selection into the filtered results and highlight the wrong command.
This resets the popup selection and viewport only when the parsed filter token changes, so normal arrow navigation is preserved while new filters start at the first match.
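The reset-on-change rule can be sketched roughly as follows. `ScrollState` is the name used above, but its fields here, along with `CommandPopup`, `set_filter`, and `move_down`, are hypothetical stand-ins rather than the TUI's real types.

```rust
/// Hypothetical stand-in for the popup's scroll state.
#[derive(Debug, Default, PartialEq)]
struct ScrollState {
    selected: usize,
    scroll_top: usize,
}

/// Hypothetical popup holding the last parsed filter token.
#[derive(Debug, Default)]
struct CommandPopup {
    filter: String,
    scroll: ScrollState,
}

impl CommandPopup {
    /// Reset selection and viewport only when the filter token changes.
    fn set_filter(&mut self, token: &str) {
        if self.filter != token {
            self.filter = token.to_string();
            self.scroll = ScrollState::default();
        }
    }

    /// Arrow-down navigation within the currently visible matches.
    fn move_down(&mut self, visible_len: usize) {
        if self.scroll.selected + 1 < visible_len {
            self.scroll.selected += 1;
        }
    }
}
```

Calling `set_filter` with an unchanged token is a no-op, which is what keeps ordinary arrow navigation from being reset on every redraw.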
Closes openai#24886.
## Why
Users can configure the TUI status line and terminal title with `model-with-reasoning`, but issue openai#24886 asks for a compact reasoning-only item. That lets a setup show just `default`, `low`, `medium`, `high`, or `xhigh` without repeating the model name.
## What changed
- Added a `reasoning` item for `/statusline` and `/title` setup flows.
- Rendered the item from the effective reasoning effort, including collaboration-mode overrides.
- Registered `reasoning` with `codex doctor` so Codex-generated terminal-title config is not reported as invalid.
- Updated TUI setup snapshots so the picker previews include the new item.
## Summary
- preserve existing explicit SQLite thread titles during rollout reconciliation/backfill when the incoming rollout title is only first-message-derived
- keep stale inferred-title repair behavior while avoiding session-index scans during startup backfill
- add a regression test for renamed titles surviving reconcile
## Testing
- just fmt
- just test -p codex-rollout
- just test -p codex-state
## Rollout compression stack
This stack splits openai#24941 into reviewable steps for local rollout compression. The design is intentionally staged:
1. Teach readers, listing, search, and lookup to understand compressed rollouts.
2. Make append and resume paths materialize compressed rollouts back to plain JSONL before writing.
3. Add a disabled-by-default worker that can compress cold archived rollouts behind `local_thread_store_compression`.
The key invariant is that writers append to plain `.jsonl`. A `.jsonl.zst` file is a cold/read representation; if a write is needed, the compressed file is materialized back to plain JSONL first. Readers prefer plain `.jsonl` when both forms exist and can fall back to the compressed sibling during transitions.
The worker is deliberately the last PR and remains behind an under-development feature flag. It currently scans only `archived_sessions`, not active `sessions`, because active sessions have the highest resume/append race risk. That means this stack does not yet compress most unarchived local history.
## Known race / follow-up
The remaining unresolved design question is writer/compressor coordination. Even for archived rollouts, a resume or metadata update can append while the worker is replacing the plain file with `.jsonl.zst`; the current double-stat checks narrow but do not fully eliminate the window where a writer has opened the plain file before unlink.
Do not treat the worker PR as production-ready until we either:
- prevent append/resume paths from racing archived compression, or
- introduce a shared representation/append lock or equivalent coordination.
The first two PRs are useful independently: they make compressed rollouts readable and make append paths safely recover back to plain JSONL. The third PR isolates the worker behavior so that coordination issue is reviewable separately.
## Validation
Focused local validation for the stack includes:
- `just test -p codex-rollout`
- `just test -p codex-thread-store` where thread-store paths were touched
- `just test -p codex-features` for the feature flag slice
- `just bazel-lock-check` after dependency graph changes
- scoped `just fix -p ...` passes for changed crates
CI is still the source of truth for the full platform matrix.
## This PR in the stack
This is PR 3/3, based on openai#25088. It adds the under-development feature flag and starts the best-effort background worker when enabled. The worker currently compresses only cold archived rollouts, skips active sessions, verifies compressed output, preserves mtime and permissions, keeps a store-level lock heartbeat, and cleans stale temp files.
Stack order:
1. openai#25087: read compressed local rollouts.
2. openai#25088: materialize compressed rollouts before append.
3. This PR: add the disabled local compression worker.
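The reader-side invariant from this stack (prefer plain `.jsonl`, fall back to the `.jsonl.zst` sibling) can be sketched as below. `resolve_rollout` is a hypothetical name, and the existence check is passed in as a closure purely so the sketch can be exercised without touching the filesystem; it is not the stack's actual lookup code.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical lookup: given the plain `.jsonl` path, return the file a
/// reader should open. Plain JSONL wins when both forms exist; otherwise
/// fall back to the compressed sibling, if any.
fn resolve_rollout(plain: &Path, exists: impl Fn(&Path) -> bool) -> Option<PathBuf> {
    if exists(plain) {
        return Some(plain.to_path_buf());
    }
    // Build "<plain>.zst" by appending to the full path string.
    let mut raw = plain.as_os_str().to_os_string();
    raw.push(".zst");
    let compressed = PathBuf::from(raw);
    if exists(&compressed) {
        Some(compressed)
    } else {
        None
    }
}
```

In real use the closure would be something like `|p| p.exists()`; writers would still materialize the compressed file back to plain JSONL before appending.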
## Why
Codex 0.135.0 started shipping bundled SQLite 3.51.x via SQLx 0.9.0 to avoid the older WAL corruption bug fixed by openai#24728. On Windows x64, openai#25367 reports an immediate `STATUS_ILLEGAL_INSTRUCTION` crash on a Haswell CPU when starting normal Codex paths.
Rather than downgrading SQLite, this keeps the newer bundled SQLite source and removes SQLite compiler-intrinsic code paths from the Windows x64 release build.
## What changed
For `x86_64-pc-windows-msvc` release builds, export `LIBSQLITE3_FLAGS=SQLITE_DISABLE_INTRINSIC` before `cargo build` in:
- `.github/workflows/rust-release.yml`
- `.github/workflows/rust-release-windows.yml`
Other targets keep their current SQLite build flags.
## Verification
- `git diff --check`
## Summary
- Configure the rust-release build job with `CARGO_NET_GIT_FETCH_WITH_CLI=true`
- Document the macOS SecureTransport/libgit2 failure mode that hit the `libwebrtc`/`libyuv` git submodule fetch
## Root cause
The release run at https://github.com/openai/codex/actions/runs/26717498860/job/78745156683 repeatedly failed before compilation because Cargo's libgit2 fetch path could not clone the nested `yuv-sys/libyuv` submodule from `chromium.googlesource.com`, ending with `SecureTransport error: connection closed via error`.
## Validation
- `git diff --check`
This is a workflow-only change, so I did not run Rust package tests.
## Why
[openai#25089](openai#25089) added the background worker for compressing cold archived rollouts, but the worker still processed files effectively one at a time: each compression job was sent to `spawn_blocking` and then awaited before the next file started. On machines with a backlog of archived rollouts, that makes catch-up slower than it needs to be even though the actual compression work already runs off the async runtime.
## What Changed
- Queue rollout compression work in a `JoinSet` while directory traversal continues.
- Cap the worker at two in-flight compression jobs so it can overlap compression without turning the background task into unbounded blocking work.
- Drain pending jobs before returning, including the `read_dir.next_entry()` error path, so every launched job still contributes to the final `compressed`, `skipped`, and `failed` stats.
- Treat task join failures the same way as compression failures in the worker's warning and failure accounting.
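The bounded-concurrency shape described above can be approximated with plain std threads. This is only a sketch: it swaps `std::thread` handles in for the `JoinSet`, waits on the oldest job rather than whichever finishes first, and uses hypothetical names (`Stats`, `Outcome`, `run_jobs`, `MAX_IN_FLIGHT`).

```rust
use std::collections::VecDeque;
use std::thread::{self, JoinHandle};

/// Cap on concurrently running compression jobs.
const MAX_IN_FLIGHT: usize = 2;

#[derive(Debug, Default, PartialEq)]
struct Stats {
    compressed: usize,
    skipped: usize,
    failed: usize,
}

enum Outcome {
    Compressed,
    Skipped,
    Failed,
}

/// Fold one finished job into the stats; a join failure (panic) counts as failed.
fn record(stats: &mut Stats, handle: JoinHandle<Outcome>) {
    match handle.join() {
        Ok(Outcome::Compressed) => stats.compressed += 1,
        Ok(Outcome::Skipped) => stats.skipped += 1,
        Ok(Outcome::Failed) | Err(_) => stats.failed += 1,
    }
}

/// Launch one job per file, never exceeding MAX_IN_FLIGHT, then drain.
fn run_jobs(files: Vec<String>, compress: fn(&str) -> Outcome) -> Stats {
    let mut stats = Stats::default();
    let mut in_flight: VecDeque<JoinHandle<Outcome>> = VecDeque::new();
    for file in files {
        if in_flight.len() == MAX_IN_FLIGHT {
            // Simplification: wait for the oldest job, not the first to finish.
            if let Some(oldest) = in_flight.pop_front() {
                record(&mut stats, oldest);
            }
        }
        in_flight.push_back(thread::spawn(move || compress(&file)));
    }
    // Drain whatever is still running so every launched job is counted.
    while let Some(handle) = in_flight.pop_front() {
        record(&mut stats, handle);
    }
    stats
}
```

The final drain loop is the part that corresponds to "every launched job still contributes to the final stats"; an early return on a traversal error would need to run the same drain first.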
## Summary
- add public `codex_exec_server::EnvironmentPathRef`
- bind an absolute path to its owning executor filesystem
- keep path operations in the next review slice
## Stack
- 1/5 in the skills path authority stack extracted from openai#25098
## Validation
- `cd /Users/starr/code/codex-worktrees/pr-25098-restack4/codex-rs && just fmt`
- GitHub CI pending on rewritten head
## Summary
Renames the MultiAgentV2 turn-triggering tool from `assign_task` to `followup_task` so the exposed tool name better describes sending an additional task to an existing agent.
This updates the tool spec, handler/module names, registry wiring, default multi-agent v2 usage hints, and tests. Rollout trace classification keeps accepting legacy `assign_task` events so older traces still reduce correctly, while docs show the new tool name.
## Test plan
- `just test -p codex-core followup_task`
- `just test -p codex-core -E 'test(multi_agent_feature_selects_one_agent_tool_family) | test(multi_agent_v2_can_use_configured_tool_namespace) | test(code_mode_only_can_expose_namespaced_multi_agent_v2_as_normal_tools)'`
- `just test -p codex-rollout-trace`
- `just fix -p codex-core`
- `just fix -p codex-rollout-trace`
Notes: `just fmt` ran `cargo fmt` but failed in the Python ruff phase because the local environment could not resolve `hatchling>=1.27.0` from the configured internal registry. A full `just test -p codex-core` also hit unrelated environment-sensitive integration failures involving missing spawned test binaries/sandbox behavior; the changed multi-agent spec/handler tests passed in the filtered runs above.
## Summary
- Preserve app declaration order when loading plugin .app.json files.
- Keep plugin connector summaries in plugin app order after connector metadata is merged and filtered.
- Add regression coverage for .app.json order and connector summary order.
## Validation
- just fmt
- just test -p codex-chatgpt connectors_for_plugin_apps_returns_only_requested_plugin_apps
- just test -p codex-core-plugins effective_apps_preserves_app_config_order
- just fix -p codex-core-plugins (passes with existing clippy large_enum_variant warning in core-plugins/src/manifest.rs)
- just fix -p codex-chatgpt
- just bazel-lock-update
- just bazel-lock-check
## Summary
Make the root `justfile` usable from Windows without maintaining a
separate Windows copy of most recipes.
The repo recipes previously assumed POSIX shell behavior for things like
variadic argument forwarding (`"$@"`) and stderr redirection
(`2>/dev/null`). That made common workflows such as `just fmt`, `just
test`, and `just log` unreliable from Windows. This PR introduces a
small cross-platform shell adapter so recipes can stay mostly unified
while still expanding the few shell-specific constructs correctly on
macOS/Linux and Windows.
## What Changed
- Add `scripts/just-shell.py` as the configured `just` shell adapter.
- On Unix it invokes `sh -cu`.
- On Windows it invokes `pwsh -CommandWithArgs` so arguments containing
spaces are preserved.
- Add portable recipe placeholders:
- `{args}` expands to `"$@"` on Unix and the equivalent PowerShell
forwarded-args expression on Windows.
- `{stderr-null}` expands to the platform-specific stderr suppression
used by `fmt`.
- Convert most variadic one-line recipes to the unified `{args}` form,
including `codex`, `exec`, `file-search`, `app-server-test-client`,
`fix`, `clippy`, `bench`, `mcp-server-run`, `write-app-server-schema`,
and `argument-comment-lint-from-source`.
- Keep genuinely shell-specific recipes split or Unix-only for now,
including recipes backed by `.sh` scripts or recipes whose bodies are
more than simple command forwarding.
- Add a Windows `just install` path that installs PowerShell via
`winget` when `pwsh` is not available, then runs the same basic Rust
setup steps.
- Update the SDK test that validates the root `fmt` recipe so it
recognizes the new portable stderr placeholder.
## Validation
- `just --summary`
- `just --dry-run fmt`
- `just --dry-run bench-smoke`
- `just --dry-run codex foo "bar binky" baz`
- `just --dry-run write-hooks-schema`
- `just --dry-run bazel-lock-update`
- `just --dry-run argument-comment-lint-from-source -- "foo bar"`
- `git diff --check -- justfile scripts/just-shell.py
sdk/python/tests/test_artifact_workflow_and_binaries.py`
- Verified Windows argv preservation through `scripts/just-shell.py`
with arguments containing spaces.
- `uv run --frozen --project sdk/python --extra dev pytest
sdk/python/tests/test_artifact_workflow_and_binaries.py::test_root_fmt_recipe_formats_rust_and_python_sdk`
## Why
`codex_core` is consistently a bottleneck for incremental builds during iteration. The simplest fix is to make the crate smaller.
## Summary
`codex-core` owns several reusable prompt renderers and static prompt assets, which makes the crate harder to split apart.
Rename `codex-review-prompts` to `codex-prompts` and move shared review, goal, permissions, compaction, realtime, hierarchical AGENTS.md, and `apply_patch` prompts into it. Move prompt-only tests and update consumers and `CODEOWNERS`.
## Validation
- `just test -p codex-prompts -p codex-apply-patch`
- `just test -p codex-core prompt_caching`
- Bazel builds for the affected crates
## Why
[openai#25089](openai#25089) introduced the background worker that compresses cold archived rollouts, and [openai#25654](openai#25654) made that pass faster once it starts. But the worker still deleted `rollout-compression.lock` on successful exit, so the existing six-hour staleness window only helped with overlapping or crashed workers. Each new local thread-store initialization could immediately rescan archived rollouts even if a full pass had just finished.
This change keeps the existing marker around long enough to throttle redundant reruns. The worker is still best-effort, but it no longer does repeated startup scans when nothing new is eligible for compression.
## What Changed
- Replace the drop-scoped `CompressionLock` with a `CompressionRunMarker` that claims the existing `.tmp/rollout-compression.lock` path and leaves it in place after success.
- Reuse the existing six-hour staleness window to block both overlapping starts and immediate reruns, while still letting a stale marker be reclaimed.
- Update the worker docs and debug logging to describe the new "already running or recently ran" behavior.
- Extend the rollout compression tests to assert that a successful run leaves the marker behind and that a fresh marker suppresses a new run.
## Validation
- `just test -p codex-rollout`
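The throttling decision reduces to a single age check against the six-hour window. A minimal sketch, assuming the marker's age has already been computed from its mtime; `may_start_run` and `STALE_AFTER` are hypothetical names, and whether the boundary is inclusive is an assumption of this sketch.

```rust
use std::time::Duration;

/// The six-hour staleness window mentioned above.
const STALE_AFTER: Duration = Duration::from_secs(6 * 60 * 60);

/// `marker_age` is `None` when no marker file exists.
/// A fresh marker means "already running or recently ran", so skip the run;
/// a stale marker may be reclaimed.
fn may_start_run(marker_age: Option<Duration>) -> bool {
    match marker_age {
        None => true,
        // Assumption: a marker exactly at the window edge counts as stale.
        Some(age) => age >= STALE_AFTER,
    }
}
```

Because the marker now survives a successful run, the same check covers overlapping workers, crashed workers, and immediate reruns.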
## Why
Python files under `scripts/` were not covered by the repository formatting recipe or the CI formatting job, so formatting drift could merge unnoticed.
## What
- Add a dedicated `scripts/pyproject.toml` and `scripts/uv.lock` so root-script formatting uses a locked Ruff version.
- Extend `just fmt` to format root Python scripts and add `fmt-scripts-check` for CI.
- Run `just fmt-scripts-check` from `.github/workflows/ci.yml`, installing `uv` through SHA-pinned `astral-sh/setup-uv` while retaining the `uv` `0.11.3` pin.
- Apply Ruff formatting to the root Python scripts, including `scripts/just-shell.py`, and extend `sdk/python/tests/test_artifact_workflow_and_binaries.py` to cover the root formatting recipe.
- Update `AGENTS.md` so agents run `just fmt` after code changes anywhere in the repository.
## Validation
- Extended the existing Python SDK workflow test to assert that `just fmt` includes root Python scripts.
## Why
Guardian auto-review normally uses the provider-preferred review model when one is available. Some parent models need model-catalog metadata to select a different review model while keeping older `/models` payloads compatible when that metadata is absent.
## What changed
- Added optional `ModelInfo::auto_review_model_override` metadata to the public model payload as a review-model slug.
- Updated Guardian review model selection to prefer the catalog override when present, while preserving the existing provider preferred-model path and parent-model fallback when it is omitted.
- Added focused Guardian coverage for override and no-override model selection.
- Added an `auto_review` core integration suite test that loads override metadata from a remote model catalog path and asserts the strict auto-review `/responses` request uses the catalog-selected review model.
- Updated existing `ModelInfo` fixtures and local catalog constructors for the new optional field.
## Validation
- `cargo test -p codex-protocol model_info_defaults_availability_nux_to_none_when_omitted`
- `cargo test -p codex-core guardian_review_uses_`
- `cargo test -p codex-core remote_model_override_uses_catalog_model_for_strict_auto_review --test all`
- `just fix -p codex-protocol`
- `just fix -p codex-core`
- `just fmt`
- `git diff --check`
## Summary
- add executor filesystem canonicalization as a bound-path operation
- route remote canonicalization through the exec-server filesystem RPC surface
- keep path normalization attached to the filesystem that owns the path
## Stack
- 2/5 in the skills path authority stack extracted from openai#25098
- follows merged openai#25121
## Validation
- `cd /Users/starr/code/codex-worktrees/pr-25098-restack-review-pr1b/codex-rs && just fmt`
- Not run: tests/checks (not requested)
- GitHub CI pending on rewritten head
Fixes this flake: https://github.com/openai/codex/actions/runs/26773809591/job/78919970410?pr=25659 This test is about zsh-fork subcommand approval behavior, not workspace sandboxing, so it now runs with `DangerFullAccess` to avoid macOS sandbox setup failures before the second subcommand approval.
## Why
`shell_zsh_fork` and unified exec need to remain independently controllable for enterprise rollouts, but we also need a third mode that composes them. That composed mode is intended to preserve unified exec command lifecycle support while letting the zsh fork provide more accurate `execv(2)` interception.
Enabling `unified_exec_zsh_fork` by itself is intentionally not sufficient. It is a composition gate, not a dependency-enabling shortcut:
- `unified_exec` selects the PTY-backed unified exec tool.
- `shell_zsh_fork` opts into the zsh fork backend.
- `unified_exec_zsh_fork` only allows those two already-enabled modes to be composed so local zsh unified exec commands can launch through the zsh fork.
This separation is deliberate. Enterprises and staged rollouts must be able to enable or disable unified exec and zsh-fork independently. If `unified_exec_zsh_fork` implied either dependency, then enabling one under-development composition flag would silently activate a shell backend that the configured feature set left disabled.
This PR introduces only the configuration and planning gate for that composition. Existing `shell_zsh_fork` behavior continues to use the standalone shell tool unless the new composition feature is explicitly enabled alongside both dependencies.
## What Changed
- Added the under-development feature flag `unified_exec_zsh_fork`.
- Added `UnifiedExecFeatureMode` so the three input feature flags collapse into `Disabled`, `Direct`, or `ZshFork` mode before tool planning.
- Updated tool selection so zsh-fork composition requires `unified_exec`, `shell_zsh_fork`, and `unified_exec_zsh_fork`.
- Kept the existing standalone zsh-fork shell tool behavior when only `shell_zsh_fork` is enabled.
- Updated config schema output for the new feature flag.
## Verification
- Added feature and tool-config coverage for the new gate.
- Added planner coverage proving `shell_zsh_fork` remains standalone until composition is explicitly enabled.
- Ran focused tests for `codex-features`, `codex-tools`, and the affected `codex-core` planner case.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24979).
* openai#24982
* openai#24981
* openai#24980
* __->__ openai#24979
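As a rough illustration of how the three flags in this PR could collapse into `UnifiedExecFeatureMode`, consider the sketch below. The enum and variant names come from the description; the `resolve_mode` function is hypothetical, and its handling of the case where `unified_exec` and `shell_zsh_fork` are both on without the composition gate (returning `Direct`) is an assumption, since the description only fixes the all-three requirement for `ZshFork`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnifiedExecFeatureMode {
    Disabled,
    Direct,
    ZshFork,
}

/// Hypothetical collapse of the three input flags into one planning mode.
fn resolve_mode(
    unified_exec: bool,
    shell_zsh_fork: bool,
    unified_exec_zsh_fork: bool,
) -> UnifiedExecFeatureMode {
    if !unified_exec {
        // The composition gate never enables unified exec on its own.
        return UnifiedExecFeatureMode::Disabled;
    }
    if shell_zsh_fork && unified_exec_zsh_fork {
        // Composition requires all three flags.
        UnifiedExecFeatureMode::ZshFork
    } else {
        // Assumption: without the full set, unified exec stays direct.
        UnifiedExecFeatureMode::Direct
    }
}
```

The important property is that `unified_exec_zsh_fork` alone maps to `Disabled`, so turning on the gate cannot silently activate either dependency.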
## Why
When unified exec is configured to launch through the zsh fork, local commands should not let the model override the shell binary with the `shell` parameter. The configured zsh fork is the mechanism that makes `execv(2)` interception reliable, so exposing `shell` for local zsh-fork execution would create a confusing API surface and undermine the composition.
Remote environments are different: zsh-fork interception is local-only, so remote unified-exec calls must keep direct unified-exec behavior and still expose `shell` when a remote environment can be selected.
## What Changed
- Taught the `exec_command` schema builder to omit the `shell` parameter when requested.
- Hid `shell` from the unified-exec tool schema only when zsh-fork unified exec applies to all selectable environments.
- Kept `shell` visible when any remote environment can be targeted, because those calls run through direct unified exec.
- Made unified exec choose the effective shell mode per selected environment: local environments keep zsh-fork mode, remote environments use direct mode.
- Left direct unified-exec behavior unchanged, including support for model-specified shells there.
## Verification
- Added schema coverage showing `exec_command` can hide `shell`.
- Added planner coverage showing zsh-fork unified exec hides `shell` for local-only execution while direct unified exec still exposes it.
- Added planner coverage showing `shell` remains visible when a remote environment is available.
- Added handler coverage showing remote environments use direct unified-exec shell mode instead of zsh-fork mode.
- Ran the focused `codex-core` shell-parameter and zsh-fork tests.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24980).
* openai#24982
* openai#24981
* __->__ openai#24980
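The two rules in this PR (schema visibility of `shell`, and per-environment shell mode) can be sketched as two small functions. All names here (`Environment`, `ShellMode`, `expose_shell_param`, `effective_shell_mode`) are hypothetical and only mirror the behavior described above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Environment {
    Local,
    Remote,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ShellMode {
    Direct,
    ZshFork,
}

/// `shell` is hidden only when zsh-fork unified exec applies to every
/// selectable environment, i.e. it is enabled and nothing remote is selectable.
fn expose_shell_param(zsh_fork_enabled: bool, selectable: &[Environment]) -> bool {
    !zsh_fork_enabled || selectable.iter().any(|env| *env == Environment::Remote)
}

/// Per-call mode: zsh-fork interception is local-only, so remote
/// environments always fall back to direct unified exec.
fn effective_shell_mode(zsh_fork_enabled: bool, env: Environment) -> ShellMode {
    match (zsh_fork_enabled, env) {
        (true, Environment::Local) => ShellMode::ZshFork,
        _ => ShellMode::Direct,
    }
}
```

Keeping the visibility check over the whole selectable set, and the mode check per call, is what lets a mixed local/remote setup expose `shell` while still routing local commands through the zsh fork.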
## Summary
Add counter telemetry for the local rollout compression worker so we can see when it runs, why it skips, and how individual file/materialization paths resolve.
## Changes
- Emit `codex.rollout_compression.run` with statuses for start, completion, failure, duplicate-run skip, and missing runtime skip.
- Emit `codex.rollout_compression.file` outcomes for scanned, compressed, skipped, and failed compression candidates.
- Emit `codex.rollout_compression.temp_cleanup` and `codex.rollout_compression.materialize` counters for cleanup and decompression paths.
## Validation
- `just fmt`
- `just test -p codex-rollout`
- `just fix -p codex-rollout`
## Why
New unit test modules should follow one consistent layout so
implementation files stay focused and test suites remain easy to locate,
without creating cleanup churn in existing inline test modules.
## What changed
- Added `AGENTS.md` guidance requiring new test modules to use separate
sibling `*_tests.rs` files with an explicit `#[path = "..._tests.rs"]`
attribute.
- Clarified that existing inline `#[cfg(test)] mod tests { ... }`
modules should not be moved solely to follow the new convention.
## Validation
- Ran `git diff --check`.
## Summary
Stacked on openai#25679.
Add histogram telemetry for rollout compression runtime, per-file compression time, byte sizes, and compression ratio.
## Changes
- Emit `codex.rollout_compression.run.duration_ms` tagged by final run status.
- Emit `codex.rollout_compression.file.duration_ms` tagged by file outcome.
- Emit source and compressed byte histograms for compression candidates/results.
- Emit `codex.rollout_compression.file.compression_ratio` for successful compressions, recorded as integer basis points.
## Validation
- `just fmt`
- `just test -p codex-rollout`
- `just fix -p codex-rollout`
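For readers unfamiliar with basis points, one basis point is 1/10,000, so a ratio can be recorded as an integer without floats. The sketch below assumes the ratio is compressed size over source size; the PR text does not state the direction, and `compression_ratio_bps` is a hypothetical name.

```rust
/// Compression ratio as integer basis points (10_000 = same size as source).
/// Assumption: ratio = compressed / source; returns None for an empty source.
fn compression_ratio_bps(source_bytes: u64, compressed_bytes: u64) -> Option<u64> {
    if source_bytes == 0 {
        return None;
    }
    // Widen to u128 so the multiplication cannot overflow.
    let scaled = (compressed_bytes as u128) * 10_000 / (source_bytes as u128);
    // Saturate in the (unrealistic) case where the result exceeds u64.
    Some(scaled.min(u64::MAX as u128) as u64)
}
```

Under that assumption, a 1,000-byte rollout compressed to 250 bytes would be recorded as 2,500 basis points, i.e. 25% of the original size.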
## Summary
- describe omitted code-mode tools as deferred nested tools instead of MCP/app tools
- update the prompt-description assertion to match
## Why
Deferred dynamic tools are also callable through `tools` and discoverable in `ALL_TOOLS`, so the previous MCP/app-specific wording was too narrow.
## Validation
- `just fmt`
- `just test -p codex-code-mode`
- `git diff --check`
## Why
Python contributions in this repository should target the declared Python 3 runtime instead of carrying Python 2 compatibility patterns forward. When compatibility across Python 3 point releases matters, contributors need a consistent source of truth for the minimum supported version.
## What changed
- Added Python development guidance to `AGENTS.md` stating that the repository uses Python 3+ and should not use the `__future__` module.
- Documented that contributors should check the nearest `pyproject.toml` `requires-python` field when evaluating Python 3 point-release compatibility.
## Testing
Not run (guidance-only change).
…5681)
## Summary
- Deduplicate installed `openai-curated` and `openai-curated-remote` plugin conflicts by feature flag.
- Prefer remote when remote plugins are enabled; otherwise prefer local, while preserving one-sided installs.
## Testing
- `just fmt`
- `git diff --check`
- Targeted `just test` was blocked locally because `cargo-nextest` is not installed.
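The preference rule above is small enough to write out as a truth table. A minimal sketch, with hypothetical names (`CuratedSource`, `pick_curated`) standing in for whatever the plugin loader actually uses:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CuratedSource {
    Local,  // installed from `openai-curated`
    Remote, // installed from `openai-curated-remote`
}

/// Pick which installed copy of a curated plugin to keep.
fn pick_curated(
    local_installed: bool,
    remote_installed: bool,
    remote_plugins_enabled: bool,
) -> Option<CuratedSource> {
    match (local_installed, remote_installed) {
        (false, false) => None,
        // One-sided installs are preserved regardless of the flag.
        (true, false) => Some(CuratedSource::Local),
        (false, true) => Some(CuratedSource::Remote),
        // Only a real conflict consults the feature flag.
        (true, true) => {
            if remote_plugins_enabled {
                Some(CuratedSource::Remote)
            } else {
                Some(CuratedSource::Local)
            }
        }
    }
}
```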
This PR brought to you via VS Code rather than Codex...
- opened `codex-rs/app-server/tests/common/mcp_process.rs`
- put the cursor on `McpServer`
- hit `F2` and renamed the symbol to `TestAppServer`
- went to the file tree
- hit enter and renamed `mcp_process.rs` to `test_app_server.rs`
- ran **Save All Files** from the Command Palette
- ran `just fmt`
The End
(Admittedly, most of the local variables for `TestAppServer` are still named `mcp`, though.)
## Summary
- tighten the default multi-agent v2 root and subagent usage hints to bias toward local work
- add a pre-call gate to the v2 spawn_agent description for independent, bounded, parallelizable subtasks
## Validation
- just fmt
- started just test -p codex-core, but it was interrupted before completion per follow-up request to commit and push immediately
## Why
Thread cwd and environment selections are a single logical setting in core: updating one without the other can silently desynchronize the next-turn execution context. This change makes that relationship explicit in the internal thread settings flow while preserving the existing app-server public API shape.
## What changed
- Moved the cwd/environment pair through internal `ThreadSettingsOverrides.environment_settings` instead of a top-level internal `cwd` field.
- Kept `thread/settings/update` public params unchanged, with app-server translating top-level `cwd` into the paired internal settings shape.
- Moved `Op::UserInput` environment overrides into thread settings so user turns and settings updates use the same core path.
- Updated core, app-server, MCP, memories, sample, and test callsites to construct the paired settings shape.
## Verification
- `git diff --check`
- Local test run starting after PR creation.
## Why
`codex sandbox --permissions-profile` is useful when running commands under a named permissions profile, but the long option is cumbersome for a debugging-oriented command. `-p` is already used for the config profile selector, so `-P` gives the permissions profile selector a compact, non-conflicting alias.
## What Changed
- Added `short = 'P'` to the `permissions_profile` option for the macOS, Linux, and Windows sandbox command structs in [`codex-rs/cli/src/lib.rs`](https://github.com/openai/codex/blob/6d9f9c5cdcaa0a156aa2dabbde259ae5e9e8bc0b/codex-rs/cli/src/lib.rs#L29-L112).
- Added parser coverage for `codex sandbox -P :workspace -- echo` in [`codex-rs/cli/src/main.rs`](https://github.com/openai/codex/blob/6d9f9c5cdcaa0a156aa2dabbde259ae5e9e8bc0b/codex-rs/cli/src/main.rs#L2883-L2896).
## Verification
- `just test -p codex-cli` passed, including the new `sandbox_parses_permissions_profile_short_alias` parser test.
## Why
`codex sandbox` can start a network proxy from a configured permission profile. Previously, sandbox-level containment was tied to managed network requirements rather than whether a proxy was actually active. This meant config-driven proxy policies were not consistently enforced as the sandbox's only network path.
## What changed
- Enable proxy-only network containment whenever `codex sandbox` starts a network proxy.
- Apply the same active-proxy check to the macOS and Linux sandbox paths.
- Add a Linux regression test that verifies a sandboxed command cannot establish a direct connection while the configured proxy is active.
## Test plan
- `just test -p codex-cli debug_sandbox::tests`
- `sandbox_with_network_proxy_blocks_direct_loopback_access` runs on Linux to cover the config-driven proxy path end to end.
## Why
Image edits should use the exact images selected by the model instead of inferring edit inputs from conversation history.
## What changed
- Replaced the image tool's `action` argument with optional `referenced_image_paths`.
- Treats omitted or empty references as generation and populated references as editing.
- Reads referenced absolute image paths and packages them as image data URLs for the edit request.
- Removed the previous history-selection and image-count heuristics.
- Updated direct and code-mode tool instructions and calls.
- Added an app-server integration test covering an attached image routed to the image edit endpoint.
## Validation
- Tested end-to-end on local `just codex` with a copy-pasted image, an attached image, etc.
- `just test -p codex-image-generation-extension`
- `just test -p codex-app-server standalone_image_edit_uses_attached_model_visible_image`
- `just fix -p codex-image-generation-extension`
- `just bazel-lock-check`
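The generation-versus-edit rule driven by `referenced_image_paths` can be sketched as a single match. `ImageRequest` and `classify_image_request` are hypothetical names for illustration; only the argument name comes from the description above.

```rust
#[derive(Debug, PartialEq)]
enum ImageRequest {
    /// No references: plain image generation.
    Generate,
    /// One or more absolute image paths selected by the model for editing.
    Edit(Vec<String>),
}

/// Omitted or empty references mean generation; populated references mean editing.
fn classify_image_request(referenced_image_paths: Option<Vec<String>>) -> ImageRequest {
    match referenced_image_paths {
        Some(paths) if !paths.is_empty() => ImageRequest::Edit(paths),
        _ => ImageRequest::Generate,
    }
}
```

Treating `None` and an empty list identically avoids a separate "explicitly no images" state that the tool would otherwise have to document.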
## Summary
- stop emitting `codex_error_subreason` on `codex_turn_event`
- remove the transient analytics fact plumbing that copied `CodexErr::InvalidRequest(String)` into the event
- update analytics serialization coverage accordingly
## Why
`codex_error_subreason` is a free-form copy of `InvalidRequest(String)`, including raw provider 400 bodies in some paths. That makes it unsafe as an analytics field because it can carry user-derived or sensitive text.
## Validation
- `just fmt`
- `just test -p codex-analytics`
## Summary
- require the main agent to read selected `SKILL.md` files completely, continuing truncated or paginated reads through EOF
- require the main agent to personally read task-required instruction references instead of delegating their interpretation
- clarify that progressive disclosure selects relevant files without permitting partial reads
- preserve subagent use for task work when the selected skill allows it
- cover both absolute-path and aliased-root prompt variants
## Why
Partial reads can skip routing and verification requirements later in skill instructions. Delegated summaries can also omit constraints the main agent needs to follow. The existing "Read only enough" wording made both behaviors appear acceptable.
## Impact
Agents should follow complete selected skill instructions while continuing to avoid unrelated references, scripts, and assets. Subagents remain available for task execution where permitted.
## Test plan
- `just test -p codex-core-skills` (101 passed)
- `just fmt`
- `git diff --check`
## Why
Some connector golden schemas use JSON Schema composition keywords
beyond `anyOf`, specifically top-level or nested `oneOf` and `allOf`.
Codex currently needs to preserve those shapes when parsing MCP tool
input schemas so connector tools do not lose valid schema structure
during normalization.
To prevent an increased Responses API error rate, this PR will be merged
after the Responses API supports top-level `oneOf`/`allOf`.
## What Changed
- Adds `oneOf` and `allOf` support to `JsonSchema`, matching the
existing `anyOf` handling.
- Traverses `oneOf` and `allOf` anywhere schema children are visited,
including sanitization, definition reachability, description stripping,
and deep schema compaction.
- Adds a final large-schema compaction pass that prunes schema objects
containing `anyOf`, `oneOf`, or `allOf` to `{}` if earlier compaction
passes still leave the schema over budget.
## Validation
Golden schema token validation ran over `2,025` schemas under
`golden_schemas`, and all of them parsed successfully. Token counts use
`o200k_base` over the compact JSON produced by `parse_tool_input_schema`.
| Percentile | Before PR | After oneOf/allOf | After pruning |
|---|---:|---:|---:|
| p0 | 9 | 9 | 9 |
| p10 | 63 | 64 | 64 |
| p25 | 86 | 87 | 87 |
| p50 | 125 | 128 | 128 |
| p75 | 203 | 206 | 206 |
| p90 | 327 | 333 | 333 |
| p95 | 460 | 473 | 473 |
| p99 | 763 | 779 | 779 |
| max | 891 | 955 | 955 |
Totals:
| Parser state | Total tokens |
|---|---:|
| Before PR | 345,713 |
| After oneOf/allOf | 352,686 |
| After pruning | 352,686 |
The pruning column matches the oneOf/allOf column for this corpus
because no parsed compact golden schema remains over the `4,000`
compact-byte budget after the earlier compaction passes.
Curated plugin startup refresh now removes cached plugins whose names no longer appear in the raw openai-curated marketplace. This prevents users with the old standalone Google Sheets plugin selected locally from continuing to load its stale cache after the curated repo drops it. Existing config is left untouched, and plugins still present in the marketplace continue to refresh from local curated sources. Validation: - `just fmt` - `just test -p codex-core-plugins` - `git diff --check`
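As a rough illustration of the pruning rule above, the stale-cache selection can be expressed as a set difference on plugin names. The function name and the plain string representation are assumptions made for this sketch; the real refresh code works on cache directories and is not shown here.

```rust
use std::collections::HashSet;

/// Hypothetical helper: return the cached plugin names that no longer appear
/// in the raw curated marketplace and should therefore be removed.
pub fn stale_cached_plugins(cached: &[String], marketplace: &HashSet<String>) -> Vec<String> {
    cached
        .iter()
        .filter(|name| !marketplace.contains(*name))
        .cloned()
        .collect()
}
```

Only the cache is pruned in this model; the user's config is never touched, matching the description above.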
## Summary This changes the default remote plugin marketplace listing to use the cached global remote catalog when it is already present on disk. The foreground `plugin/list` response can then return from the local catalog cache instead of waiting on `/ps/plugins/list`. When a cached global catalog was present at the start of the request, `plugin/list` still schedules a background refresh through the existing plugin-list background task path so the disk cache is updated for future requests. Cache misses keep the existing synchronous remote fetch path and write the cache, and they do not schedule an extra duplicate background `/ps/plugins/list` refresh. Installed/enabled state continues to come from the existing remote installed overlay path. This change only affects the global remote catalog directory data used by `plugin/list`. ## Testing - `just fmt` - `just test -p codex-app-server plugin_list_uses_cached_global_remote_catalog_and_refreshes_it` - `just test -p codex-core-plugins` - `git diff --check`
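The cache-hit and cache-miss behavior described above might look like the following sketch. `CatalogLookup` and `list_global_catalog` are hypothetical names, the catalog is reduced to a list of strings, and the disk cache is modelled as an in-memory `Option`; the real code path is asynchronous and is not shown in this description.

```rust
/// Outcome of a hypothetical `plugin/list` catalog lookup.
#[derive(Debug, PartialEq)]
pub struct CatalogLookup {
    pub catalog: Vec<String>,
    /// True only when the response was served from a cache that already existed.
    pub schedule_background_refresh: bool,
}

/// Serve from the cache when present and ask for a background refresh;
/// otherwise fetch synchronously, write the cache, and skip the extra refresh.
pub fn list_global_catalog(
    cache: &mut Option<Vec<String>>,
    fetch_remote: impl FnOnce() -> Vec<String>,
) -> CatalogLookup {
    if let Some(cached) = cache.as_ref() {
        return CatalogLookup {
            catalog: cached.clone(),
            schedule_background_refresh: true,
        };
    }
    let fetched = fetch_remote();
    *cache = Some(fetched.clone());
    CatalogLookup {
        catalog: fetched,
        schedule_background_refresh: false,
    }
}
```

Deciding on the refresh from the cache state at the start of the request is what prevents a cache miss from triggering a second, duplicate remote call.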
## Why Metric descriptions should be declared with reusable OTEL instruments instead of being coupled to individual consumers. Counter descriptions are the smallest API primitive needed by the exec-server observability work. ## What changed - Adds `counter_with_description` while preserving the existing counter API. - Caches counters by name and description so instrument metadata remains part of the declaration identity. - Covers the exported description together with the existing value and attribute contract. This PR only adds counter descriptions. It does not add gauges, second-based durations, or exec-server adoption. ## Stack 1. **openai#26091: counter descriptions** 2. openai#27057: gauge instruments 3. openai#27058: second-based duration histograms Related independent coverage: openai#27059 tests OTLP HTTP log and trace event export. The `codex-exec-server` bounded service tag now stays with the exec-server adoption change instead of this reusable infrastructure stack. ## Validation - `just test -p codex-otel` - `just fix -p codex-otel` - `just fmt`
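To illustrate why the cache key includes the description, here is a minimal sketch of a registry keyed by name and description. `CounterRegistry`, `Counter`, and `instrument_count` are hypothetical stand-ins; only the method name `counter_with_description` comes from the text above, and its real signature in `codex-otel` may differ.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an OTEL counter instrument.
#[derive(Debug, Default)]
pub struct Counter {
    pub value: u64,
}

/// Hypothetical registry that caches counters by (name, description).
#[derive(Debug, Default)]
pub struct CounterRegistry {
    counters: HashMap<(String, Option<String>), Counter>,
}

impl CounterRegistry {
    /// Existing-style API: a counter with no description.
    pub fn counter(&mut self, name: &str) -> &mut Counter {
        self.get_or_create(name, None)
    }

    /// New-style API: the description is part of the instrument identity.
    pub fn counter_with_description(&mut self, name: &str, description: &str) -> &mut Counter {
        self.get_or_create(name, Some(description))
    }

    fn get_or_create(&mut self, name: &str, description: Option<&str>) -> &mut Counter {
        let key = (name.to_string(), description.map(str::to_string));
        self.counters.entry(key).or_default()
    }

    /// Number of distinct cached instruments.
    pub fn instrument_count(&self) -> usize {
        self.counters.len()
    }
}
```

With this keying, two declarations that share a name but differ in description do not silently reuse each other's metadata.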
## Background This was prompted by [openai#26858](openai#26858), where the attached doctor report did not include the editor selection and I had to [ask which editor was in use](openai#26858 (comment)) before investigating the external-editor newline issue. Capturing these variables in doctor makes that context available up front in future reports. `codex doctor` is intended to capture enough local context to diagnose startup and terminal behavior, but it did not report the environment variables that select an external editor or configure command pagers. The TUI [prefers `VISUAL` over `EDITOR`](https://github.com/openai/codex/blob/56554904babcaacf4444a2cc90716880837dff7c/codex-rs/tui/src/external_editor.rs#L31-L38), so missing or unexpected values can explain why the external-editor shortcut fails or launches the wrong command. Pager values are also useful inherited-shell context even though [unified exec normalizes its effective pager variables to `cat`](https://github.com/openai/codex/blob/56554904babcaacf4444a2cc90716880837dff7c/codex-rs/core/src/unified_exec/process_manager.rs#L60-L70). These variables can contain arbitrary command arguments or inline environment assignments. The human report is local, but `codex doctor --json` may be attached to feedback, so the machine-readable report should not include their raw contents. ## What Changed - Report `VISUAL` and `EDITOR` in the system environment details, using `not set` when either variable is absent. - Report inherited `PAGER`, `GIT_PAGER`, `GH_PAGER`, and `LESS` values when present. - Preserve full values in local human output while reducing these fields to `set` or `not set` in redacted JSON output. - Add structured check, JSON-redaction, rendered-output, and snapshot coverage. ## How to Test 1. 
From `codex-rs`, run Codex with explicit editor and pager variables:

```sh
env VISUAL='code --wait' EDITOR=vim PAGER='less -R' GIT_PAGER=delta GH_PAGER=less LESS=-FRX \
  cargo run -p codex-cli --bin codex -- doctor --no-color
```

2. Confirm the `system` details show the full values for all six variables.
3. Unset the pager variables and rerun the command. Confirm pager rows are omitted while missing editor variables are shown as `not set`.
4. Run the same configured environment with `doctor --json`. Confirm each configured editor or pager field is reported as `set` and none of the raw commands or arguments appear in the JSON.

Targeted tests:

- `just test -p codex-cli` (279 tests passed)
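The reporting and redaction rules in this description can be sketched as below. The function names and the row representation are assumptions for illustration only; the six variable names and the `set` / `not set` strings come from the text above.

```rust
pub const EDITOR_VARS: [&str; 2] = ["VISUAL", "EDITOR"];
pub const PAGER_VARS: [&str; 4] = ["PAGER", "GIT_PAGER", "GH_PAGER", "LESS"];

/// Hypothetical helper: build (name, value) rows for the doctor report.
/// `lookup` abstracts the environment; `redact` selects the JSON-style output.
pub fn report_rows(lookup: impl Fn(&str) -> Option<String>, redact: bool) -> Vec<(String, String)> {
    let mut rows = Vec::new();
    // Editor variables are always reported, even when absent.
    for name in EDITOR_VARS {
        rows.push((name.to_string(), render(lookup(name), redact)));
    }
    // Pager variables are reported only when present.
    for name in PAGER_VARS {
        if let Some(value) = lookup(name) {
            rows.push((name.to_string(), render(Some(value), redact)));
        }
    }
    rows
}

/// Full value for local human output; only `set` / `not set` when redacted.
fn render(value: Option<String>, redact: bool) -> String {
    match value {
        None => "not set".to_string(),
        Some(_) if redact => "set".to_string(),
        Some(raw) => raw,
    }
}
```

Passing `|name| std::env::var(name).ok()` as `lookup` would read the real process environment; injecting the lookup keeps the sketch testable without mutating it.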
…penai#27084)

## Summary

Some customer MCP tools expose large input schemas that exceed Codex's compact schema budget even after description stripping. Today, the final compaction pass collapses complex schemas starting at depth 2, which can erase important shallow call structure such as small `anyOf` branches, required fields, and help-mode entry points. In one reported case, this degraded a tool schema into `query: any | any`, leaving the model without enough structure to discover the required help call.

This change raises the deep-schema collapse boundary from depth 2 to depth 3. That preserves one additional layer of the tool contract while still collapsing deeper expensive subtrees to `{}` when a schema remains over budget.

## What Changed

- Increased `MAX_COMPACT_TOOL_SCHEMA_DEPTH` from `2` to `3`.
- Updated the schema compaction traversal test to assert the new collapse boundary.
- The resulting compacted shape keeps useful shallow structure, for example:
  - top-level argument names
  - shallow `anyOf` branches
  - required object fields
  - nested property names one level deeper than before

## Validation

- Ran `just test -p codex-tools`: 81 tests passed.
- Ran a golden schema corpus comparison over 214 discovered tool input schemas under `golden_schemas/*/mcp_tools/*/input_schema.json`.
- Depth 2 and depth 3 had identical percentile token counts across the corpus.
- Both ended with `0 / 214` schemas over 1k tokens.
- Both ended with `0 / 214` schemas over the 4,000-byte compact JSON budget.
- Only one golden schema changed, increasing from 49 to 56 tokens, so this does not appear to introduce a meaningful corpus-wide regression.

Corpus percentile results:

| Percentile | Depth 2 | Depth 3 |
|---|---:|---:|
| p0 | 9 | 9 |
| p10 | 31 | 31 |
| p25 | 54 | 54 |
| p50 | 81 | 81 |
| p75 | 143 | 143 |
| p90 | 290 | 290 |
| p95 | 431 | 431 |
| p99 | 600 | 600 |
| max | 832 | 832 |
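The effect of moving the collapse boundary can be sketched with a toy schema tree. Everything here except the constant name `MAX_COMPACT_TOOL_SCHEMA_DEPTH` is hypothetical, and the depth convention (root at depth 0, objects at or beyond the limit collapse to `{}`) is an assumption; the real traversal may count depth differently.

```rust
pub const MAX_COMPACT_TOOL_SCHEMA_DEPTH: usize = 3;

/// Hypothetical toy schema node.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    /// The collapsed form, `{}`.
    Any,
    /// A simple typed leaf such as `string`.
    Leaf(String),
    /// An object with named properties.
    Object(Vec<(String, Node)>),
}

/// Collapse every object at `depth >= max_depth` to `{}`, keeping shallower structure.
pub fn collapse_below(node: &mut Node, depth: usize, max_depth: usize) {
    match node {
        Node::Object(_) if depth >= max_depth => *node = Node::Any,
        Node::Object(fields) => {
            for (_, child) in fields.iter_mut() {
                collapse_below(child, depth + 1, max_depth);
            }
        }
        Node::Any | Node::Leaf(_) => {}
    }
}
```

Under this convention, raising the limit from 2 to 3 keeps the property names of objects at depth 2, which corresponds to the "one additional layer" the summary describes.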
## Why Codex needs stable `file:` URI identifiers that can cross process and operating-system boundaries without eagerly interpreting them as native paths. Existing fields also need to keep accepting absolute path strings during migration. ## What changed - Add `codex-utils-path-uri` with a validated, immutable `PathUri` wrapper that currently accepts only `file:` URLs. - Expose URI-level `basename`, `parent`, and `join` operations that preserve authorities and percent encoding without guessing the source operating system. - Keep native conversion explicit through `AbsolutePathBuf` and the current host rules. - Serialize as canonical URI text while accepting both URI text and legacy absolute native paths during deserialization. - Add adversarial coverage for Windows-looking and POSIX paths, UNC authorities, encoded metadata characters, non-UTF-8 POSIX paths, URI hierarchy operations, and legacy serde round trips.
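A rough sketch of URI-level `basename` and `parent` that never decodes or reinterprets the path is shown below. `FileUri` is a hypothetical stand-in for `PathUri`, and the sketch is deliberately narrower than the crate described above: it assumes the `file://` form with a lowercase scheme and performs no validation beyond locating the path, so the real type's rules may differ.

```rust
/// Hypothetical, simplified stand-in for a validated `file:` URI wrapper.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileUri {
    text: String,
    /// Byte offset where the path begins (after `file://` and any authority).
    path_start: usize,
}

impl FileUri {
    /// Accept only `file://[authority]/path` text; anything else is rejected.
    pub fn parse(text: &str) -> Option<Self> {
        let rest = text.strip_prefix("file://")?;
        let slash = rest.find('/')?;
        Some(Self {
            text: text.to_string(),
            path_start: "file://".len() + slash,
        })
    }

    pub fn as_str(&self) -> &str {
        &self.text
    }

    /// Last path segment, still percent-encoded; `None` for the root path.
    pub fn basename(&self) -> Option<&str> {
        let path = self.text[self.path_start..].trim_end_matches('/');
        let name = path.rsplit('/').next()?;
        if name.is_empty() { None } else { Some(name) }
    }

    /// Parent URI with the same authority and encoding; `None` at the root.
    pub fn parent(&self) -> Option<FileUri> {
        let path = self.text[self.path_start..].trim_end_matches('/');
        let idx = path.rfind('/')?;
        let parent_path = if idx == 0 { "/" } else { &path[..idx] };
        Some(FileUri {
            text: format!("{}{}", &self.text[..self.path_start], parent_path),
            path_start: self.path_start,
        })
    }
}
```

Because the operations slice the original text, a UNC-style authority and sequences such as `%20` survive untouched, regardless of which operating system produced the URI.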
## Background

Bare URLs containing `~` in their path are currently only clickable up to the tilde in the interactive TUI. For example, Codex renders the visible text for:

`https://www.cs.tufts.edu/~nr/cs257/archive/olin-shivers/dissertation.pdf`

but the OSC 8 destination stops at `https://www.cs.tufts.edu/`. This makes Cmd-click open the wrong location even though the terminal recognizes the complete URL outside Codex.

Fixes openai#26774.

## Root Cause

The URL scanner already accepts `~`. The truncation happens earlier: with strikethrough parsing enabled, `pulldown-cmark` splits this URL into adjacent decoded `Event::Text` values around the tilde. The Markdown renderer annotated each text event independently, so only the first event still looked like a complete URL with a supported scheme.

The renderer now merges adjacent decoded text events before URL annotation. It preserves the combined source range while retaining parser-decoded contents, which avoids regressing entities such as `&amp;`.

## Changes

- Add a small iterator that merges adjacent decoded Markdown text events and their source ranges.
- Apply it at the Markdown renderer boundary before hyperlink detection.
- Add regression coverage for the reported URL in prose, wrapped table output, and entity-decoded URLs.

## How to Test

1. Run Codex with `just c`.
2. Ask the assistant to output this exact bare URL with no Markdown link syntax: `https://www.cs.tufts.edu/~nr/cs257/archive/olin-shivers/dissertation.pdf`
3. Hold Cmd and hover or click the URL.
4. Confirm the complete URL, including the suffix after `~`, is one destination.
5. Repeat with the URL inside a Markdown table and confirm wrapped portions retain the same complete destination.

Targeted tests:

- `just test -p codex-tui url_with_tilde`
- `just test -p codex-tui merged_text_events_preserve_entity_decoding`

The full `codex-tui` test run was also executed. Its only failures were the two existing Guardian feature-flag tests:

- `app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
- `app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`
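The merge step from the root-cause analysis can be sketched as follows. This is not the renderer's code: the `Event` enum is a hypothetical stand-in for the `pulldown-cmark` event type, and the sketch collects into a `Vec` rather than implementing the lazy iterator the change describes.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a Markdown parser event.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    /// Already-decoded text (entities resolved by the parser).
    Text(String),
    /// Any non-text event, identified by a label for this sketch.
    Other(&'static str),
}

/// Merge runs of adjacent `Text` events, concatenating the decoded contents
/// and extending the source range to cover the whole run.
pub fn merge_adjacent_text(events: Vec<(Event, Range<usize>)>) -> Vec<(Event, Range<usize>)> {
    let mut merged: Vec<(Event, Range<usize>)> = Vec::new();
    for (event, range) in events {
        if let Event::Text(text) = &event {
            if let Some((Event::Text(prev), prev_range)) = merged.last_mut() {
                prev.push_str(text);
                prev_range.end = range.end;
                continue;
            }
        }
        merged.push((event, range));
    }
    merged
}
```

Concatenating the decoded strings, instead of re-slicing the source by the combined range, is what keeps entity decoding intact in this model.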
## Summary - Render `/debug-config`'s `allowed_sandbox_modes` from the finalized permission constraints instead of the raw requirements list. - Add regression coverage for configured full-access and external sandbox modes being omitted when effective permissions reject them. ## Details `allowed_sandbox_modes` comes from managed requirements, but the final permissions can be further constrained by derived validation rules. For example, `permissions.filesystem.deny_read` requires sandbox enforcement, so modes that disable or externalize Codex's sandbox are not actually usable even if they were present in the raw requirements TOML. The debug renderer now enumerates the configured sandbox-mode labels and keeps only those accepted by `Config.permissions`. That makes `/debug-config` reflect the same effective permission-profile constraint path used by runtime config validation, while preserving the existing source/provenance display. ## Validation - Added a regression test for effective sandbox-mode filtering in `/debug-config`.
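The filtering described in the details can be sketched as below. The enum variants, `EffectivePermissions`, and its single flag are hypothetical simplifications; the real check goes through `Config.permissions` and its validation rules, which are not reproduced here.

```rust
/// Hypothetical sandbox-mode labels for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxMode {
    ReadOnly,
    WorkspaceWrite,
    FullAccess,
    ExternalSandbox,
}

/// Hypothetical stand-in for the finalized permission constraints.
pub struct EffectivePermissions {
    /// True when a rule such as `permissions.filesystem.deny_read` needs Codex's own sandbox.
    pub requires_sandbox_enforcement: bool,
}

impl EffectivePermissions {
    /// Modes that disable or externalize the sandbox are rejected when enforcement is required.
    pub fn accepts(&self, mode: SandboxMode) -> bool {
        match mode {
            SandboxMode::FullAccess | SandboxMode::ExternalSandbox => {
                !self.requires_sandbox_enforcement
            }
            SandboxMode::ReadOnly | SandboxMode::WorkspaceWrite => true,
        }
    }
}

/// Keep only the configured modes that the effective permissions accept.
pub fn effective_allowed_modes(
    configured: &[SandboxMode],
    permissions: &EffectivePermissions,
) -> Vec<SandboxMode> {
    configured
        .iter()
        .copied()
        .filter(|mode| permissions.accepts(*mode))
        .collect()
}
```

Filtering the configured list, rather than enumerating accepted modes from scratch, preserves the configured order that the debug output displays.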
## Summary - Update the web search tool prompt to require Markdown links for cited sources. - Explicitly tell the model not to use `turnX`-style citations in responses. ## Context https://openai.slack.com/archives/C0AU83S0ZQU/p1780964147777649?thread_ts=1780352049.512299&cid=C0AU83S0ZQU ## Test plan - `git diff --check` - `python3 scripts/format.py --check` (fails only on Rust formatter setup: rustup cannot create temp files under `/home/dev-user/.rustup`; Just and Python formatter checks pass when using temp cache dirs)
Fixes a TUI regression where thread transitions such as `/new` and `/clear` could rebuild config without the cloud requirements loader, allowing users to fall back to non-cloud-managed settings. The config refresh path now preserves cloud requirements during thread reinitialization, and config loading is moved off the deep TUI event stack to avoid stack-overflow crashes during those reloads.

- Passes the cloud requirements loader through TUI config rebuild paths.
- Keeps cloud requirements applied for `/new`, `/clear`, `/fork`, side conversations, and session picker transitions.
- Runs config building on a Tokio task so reloads do not occur on the deep TUI caller stack.
- Adds regression coverage that cloud requirements survive thread-transition config refreshes.

## Test/Repro

- Start Codex with a cloud requirement applied.
- Use `/new` or `/clear`.
- The refreshed/fresh-session config should still include the cloud requirements.

This can be tested with any config item. For OpenAI staff, the easiest item to test at the moment is the `mentions_v2` feature, which is currently enabled in cloud requirements but not enabled by default. As a result, before these changes the feature is disabled after `/new` or `/clear`. Repeating the same steps with a binary from this branch should not drop the feature enablement.
## Why `log_remote_compact_failure` was the only consumer of the compact-request logging payload and most of the token-usage breakdown fields. Once that failure log is removed, keeping the surrounding carrier types leaves dead plumbing in the compaction path and context manager. ## What changed - Remove `log_remote_compact_failure`, `CompactRequestLogData`, and the v2 wrapper that only fed that log. - Let both remote compaction implementations return the original compaction error directly. - Replace `TotalTokenUsageBreakdown` with a narrow helper that returns only the remaining value needed by compaction analytics. - Keep `estimate_response_item_model_visible_bytes` private to the context manager implementation. ## Validation - `cargo check -p codex-core`
- Code mode can now call standalone web search directly, including from nested JavaScript tool calls, and receive plaintext search results. (openai#26719)
- Tool and connector input schemas now preserve `oneOf` and `allOf`, and large schemas keep more shallow structure when compacted, improving compatibility with richer MCP tools. (openai#24118, openai#27084)
- `codex doctor` now includes editor and pager environment details in the local report while redacting raw values in JSON output. (openai#27081)
- Plugin marketplace automation is more informative and responsive: `codex plugin marketplace list --json` now includes each marketplace source, and plugin lists can return from the cached remote catalog before refreshing in the background. (openai#27009, openai#26932)

## Bug Fixes

- `codex resume --last "..."` and `codex fork --last "..."` now treat the trailing argument as the initial prompt instead of misreading it as a session ID. (openai#26818)
- MCP startup warnings from subagents now stay in the thread that owns them, avoiding duplicate parent-thread alerts and stuck startup spinners in the TUI. (openai#26639)
- Image edits now use the exact referenced image file paths instead of guessing from conversation history, so attached-image edits land on the intended input. (openai#26486)
- Bare URLs with `~` in the path are now linkified end to end in the TUI instead of being truncated before the tilde. (openai#27088)
- Thread resets such as `/new`, `/clear`, and `/fork` no longer drop cloud-managed requirements or feature flags during TUI config reloads. (openai#25177)
- Sandbox execution now preserves approved escalation decisions and enforces configured proxy-only networking more consistently. (openai#24981, openai#27035)

## Chores

- Release builds once again publish separate symbol archives with line tables, improving post-release crash symbolication without bringing back the earlier full-debug build slowdown. (openai#26202)
- The embedded V8 toolchain was updated to `rusty_v8` 149.2.0. (openai#26464)

## Changelog

Full Changelog: openai/codex@rust-v0.138.0...rust-v0.139.0

- openai#26741 fix(remote-control): preserve enrollment on generic websocket 404s @apanasenko-oai
- openai#26804 fix(core-plugins): send Codex product SKU to plugin-service @ericning-o
- openai#26464 build(v8): update rusty_v8 to 149.2.0 @cconger
- openai#26895 ci: use bazel environment for BuildBuddy secret @bolinfest
- openai#24981 fix: preserve approval sandbox decisions in unified exec @bolinfest
- openai#26818 fix(tui): accept prompts with resume and fork @fcoury-oai
- openai#24820 deps: update starlark to 0.14.2 @bolinfest
- openai#26639 fix(tui): scope MCP startup status by thread @fcoury-oai
- openai#26719 [codex] Enable standalone web search in code mode @rka-oai
- openai#26632 feat: add v2 agent residency lru @jif-oai
- openai#26974 Ignore proc-macro-error2 advisory @jif-oai
- openai#26969 feat: count V2 concurrency by active execution @jif-oai
- openai#26994 Rename multi-agent v2 close_agent to interrupt_agent @jif-oai
- openai#26997 Avoid reopening v2 descendants on resume @jif-oai
- openai#26821 [codex] Exclude external tool output from memories @rka-oai
- openai#26202 [codex] Restore release symbol artifacts with line tables @nornagon-openai
- openai#26852 fix(app-server): avoid blocking connection cleanup @apanasenko-oai
- openai#26923 Add HTTP window ID to Responses client metadata @ningyi-oai
- openai#26680 [codex-analytics] report compaction analytics details @rhan-oai
- openai#26637 [codex] Speed up external agent session imports @stefanstokic-oai
- openai#27009 [plugins] Expose marketplace source in marketplace list JSON @mpc-oai
- openai#27024 ci: template custom runner names by repo @bolinfest
- openai#26230 fix: preserve auto review across config and delegation @viyatb-oai
- openai#27038 [codex] Clarify PR babysitter state mutations @anp-oai
- openai#27037 [codex] Calm multi-agent v2 usage prompts @jif-oai
- openai#26687 Pair thread environment settings @pakrym-oai
- openai#27054 cli: add -P sandbox permissions profile alias @bolinfest
- openai#27035 Enforce configured network proxy in codex sandbox @viyatb-oai
- openai#26486 Route image edits through referenced file paths @won-openai
- openai#27060 [codex-analytics] stop sending codex error subreason @rhan-oai
- openai#27044 [codex] Require complete main-agent skill reads @fchen-oai
- openai#24118 feat: support oneOf and allOf in tool input schemas @celia-oai
- openai#26934 [codex] Prune stale curated plugin caches @xl-openai
- openai#26932 Use cached remote plugin catalog for plugin list @xl-openai
- openai#26091 [codex] Add OTEL counter descriptions @richardopenai
- openai#27081 feat(doctor): report editor and pager environment @fcoury-oai
- openai#27084 chore: preserve one more schema layer during large tool compaction @celia-oai
- openai#26840 Add typed file URIs @anp-oai
- openai#27088 fix(tui): linkify complete bare URLs with tildes @fcoury-oai
- openai#27068 Show effective sandbox modes in /debug-config @canvrno-oai
- openai#27092 Add extra config to StoredThread, leave empty for now @kumquatexpress
- openai#27096 Update web search citation prompt @yuning-oai
- openai#25177 Preserve cloud requirements across TUI thread resets @canvrno-oai
- openai#27106 [codex] Remove remote compaction failure log @pakrym-oai
Landed into main via merge commit 78b166c after renaming the branch to tui-project-opening. GitHub refused the PR merge action after the branch rename, so main was updated directly.
Zorlin added a commit that referenced this pull request Jul 2, 2026