Streamline TUI project opening#1

Closed
Zorlin wants to merge 2797 commits into main from pal-commit

Conversation

Zorlin commented Jul 1, 2026

Collaborator

No description provided.

charliemarsh-oai and others added 30 commits June 1, 2026 18:01
## Summary

I frequently want to be able to paste into the searchable menu -- the
most common use-case here is when specifying an upstream for a
`/review`, where I copy the upstream from an open terminal.
## Why

`codex app [PATH]` is the documented CLI entry point for opening Codex
Desktop on a workspace. Recent desktop builds can focus the app while
failing to honor paths passed as macOS document-open arguments via `open
-a Codex.app <workspace>`, which broke `codex app .` for users. See
openai#25333; related report: openai#25166.

The desktop app still supports the explicit
`codex://threads/new?path=...` route, so the CLI should use that
app-owned launch surface instead of depending on folder-open event
delivery.

## What Changed

- Build a `codex://threads/new?path=<workspace>` URL in the macOS app
launcher.
- Pass that URL to `open -a <Codex.app>` instead of passing the
workspace path as a document argument.
- Add coverage that workspace paths needing escaping round-trip through
URL query encoding.
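
As a rough illustration of the encoding round-trip described above, here is a minimal Python sketch (the actual launcher is not shown here, and the function name `new_thread_url` and the sample path are assumptions made for this example). It percent-encodes the workspace path into the `path` query parameter and decodes it again:

```python
from urllib.parse import parse_qs, quote, urlsplit


def new_thread_url(workspace: str) -> str:
    """Build a codex://threads/new URL with the workspace path percent-encoded."""
    # safe="" also encodes "/" so the whole path travels as one query value.
    return "codex://threads/new?path=" + quote(workspace, safe="")


def workspace_from_url(url: str) -> str:
    """Recover the workspace path from the `path` query parameter."""
    return parse_qs(urlsplit(url).query)["path"][0]


# Hypothetical workspace containing characters that need escaping.
url = new_thread_url("/Users/me/my project/a&b")
round_tripped = workspace_from_url(url)
```

Encoding `&` and spaces matters here because an unescaped `&` would otherwise split the query into two parameters.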

## Verification

- `just test -p codex-cli codex_new_thread_url_encodes_workspace_path`
## Summary

Fixes openai#25295.

The slash-command popup reused its previous `ScrollState` when the
composer filter token changed. After scrolling the full `/` command
list, typing a narrower filter such as `/st` could clamp the stale
selection into the filtered results and highlight the wrong command.

This resets the popup selection and viewport only when the parsed filter
token changes, so normal arrow navigation is preserved while new filters
start at the first match.
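
The reset rule can be sketched roughly as follows. This is a simplified Python model, not the TUI's `ScrollState`; the class and field names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class PopupState:
    """Hypothetical stand-in for the popup's filter and scroll state."""

    filter_token: str = ""
    selected: int = 0
    scroll_top: int = 0

    def set_filter(self, token: str) -> None:
        # Reset only when the parsed token actually changes, so arrow-key
        # navigation under an unchanged filter is preserved.
        if token != self.filter_token:
            self.filter_token = token
            self.selected = 0
            self.scroll_top = 0

    def move_down(self, item_count: int) -> None:
        if item_count:
            self.selected = min(self.selected + 1, item_count - 1)
```

Re-applying the same token is a no-op, while a new token such as `st` starts the selection at the first match.
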
Closes openai#24886.

## Why
Users can configure the TUI status line and terminal title with
`model-with-reasoning`, but issue openai#24886 asks for a compact
reasoning-only item. That lets a setup show just `default`, `low`,
`medium`, `high`, or `xhigh` without repeating the model name.

## What changed
- Added a `reasoning` item for `/statusline` and `/title` setup flows.
- Rendered the item from the effective reasoning effort, including
collaboration-mode overrides.
- Registered `reasoning` with `codex doctor` so Codex-generated
terminal-title config is not reported as invalid.
- Updated TUI setup snapshots so the picker previews include the new
item.
## Summary
- preserve existing explicit SQLite thread titles during rollout
reconciliation/backfill when the incoming rollout title is only
first-message-derived
- keep stale inferred-title repair behavior while avoiding session-index
scans during startup backfill
- add a regression test for renamed titles surviving reconcile

## Testing
- just fmt
- just test -p codex-rollout
- just test -p codex-state
## Rollout compression stack

This stack splits openai#24941 into reviewable steps for local rollout
compression. The design is intentionally staged:

1. Teach readers, listing, search, and lookup to understand compressed
rollouts.
2. Make append and resume paths materialize compressed rollouts back to
plain JSONL before writing.
3. Add a disabled-by-default worker that can compress cold archived
rollouts behind `local_thread_store_compression`.

The key invariant is that writers append to plain `.jsonl`. A
`.jsonl.zst` file is a cold/read representation; if a write is needed,
the compressed file is materialized back to plain JSONL first. Readers
prefer plain `.jsonl` when both forms exist and can fall back to the
compressed sibling during transitions.
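
A minimal sketch of the reader-side preference, written in Python for illustration only (the function name `resolve_rollout` is an assumption, and the real lookup lives in the Rust crates):

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_rollout(plain: Path) -> Optional[Path]:
    """Return the file a reader should open for a rollout, or None if absent."""
    compressed = plain.with_name(plain.name + ".zst")
    if plain.exists():
        return plain  # plain JSONL wins when both forms exist
    if compressed.exists():
        return compressed  # cold/read representation
    return None


with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "rollout.jsonl"
    compressed = Path(tmp) / "rollout.jsonl.zst"
    missing = resolve_rollout(plain)  # neither file exists yet
    compressed.write_bytes(b"")
    only_compressed = resolve_rollout(plain)  # falls back to the sibling
    plain.write_text("{}\n")
    both = resolve_rollout(plain)  # plain form is preferred
    picked = (missing, only_compressed.name, both.name)
```

The sketch covers reads only; the write-side rule (materialize to plain JSONL before appending) is not modeled.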

The worker is deliberately the last PR and remains behind an
under-development feature flag. It currently scans only
`archived_sessions`, not active `sessions`, because active sessions have
the highest resume/append race risk. That means this stack does not yet
compress most unarchived local history.

## Known race / follow-up

The remaining unresolved design question is writer/compressor
coordination. Even for archived rollouts, a resume or metadata update
can append while the worker is replacing the plain file with
`.jsonl.zst`; the current double-stat checks narrow but do not fully
eliminate the window where a writer has opened the plain file before
unlink. Do not treat the worker PR as production-ready until we either:

- prevent append/resume paths from racing archived compression, or
- introduce a shared representation/append lock or equivalent
coordination.

The first two PRs are useful independently: they make compressed
rollouts readable and make append paths safely recover back to plain
JSONL. The third PR isolates the worker behavior so that coordination
issue is reviewable separately.

## Validation

Focused local validation for the stack includes:

- `just test -p codex-rollout`
- `just test -p codex-thread-store` where thread-store paths were
touched
- `just test -p codex-features` for the feature flag slice
- `just bazel-lock-check` after dependency graph changes
- scoped `just fix -p ...` passes for changed crates

CI is still the source of truth for the full platform matrix.

## This PR in the stack

This is PR 3/3, based on openai#25088. It adds the under-development feature
flag and starts the best-effort background worker when enabled. The
worker currently compresses only cold archived rollouts, skips active
sessions, verifies compressed output, preserves mtime and permissions,
keeps a store-level lock heartbeat, and cleans stale temp files.

Stack order:

1. openai#25087: read compressed local rollouts.
2. openai#25088: materialize compressed rollouts before append.
3. This PR: add the disabled local compression worker.
## Why

Codex 0.135.0 started shipping bundled SQLite 3.51.x via SQLx 0.9.0 to
avoid the older WAL corruption bug fixed by openai#24728. On Windows x64,
openai#25367 reports an immediate `STATUS_ILLEGAL_INSTRUCTION` crash on a
Haswell CPU when starting normal Codex paths.

Rather than downgrading SQLite, this keeps the newer bundled SQLite
source and removes SQLite compiler-intrinsic code paths from the Windows
x64 release build.

## What changed

For `x86_64-pc-windows-msvc` release builds, export
`LIBSQLITE3_FLAGS=SQLITE_DISABLE_INTRINSIC` before `cargo build` in:

- `.github/workflows/rust-release.yml`
- `.github/workflows/rust-release-windows.yml`

Other targets keep their current SQLite build flags.

## Verification

- `git diff --check`
## Summary
- Configure the rust-release build job with
`CARGO_NET_GIT_FETCH_WITH_CLI=true`
- Document the macOS SecureTransport/libgit2 failure mode that hit the
`libwebrtc`/`libyuv` git submodule fetch

## Root cause
The release run at
https://github.com/openai/codex/actions/runs/26717498860/job/78745156683
repeatedly failed before compilation because Cargo's libgit2 fetch path
could not clone the nested `yuv-sys/libyuv` submodule from
`chromium.googlesource.com`, ending with `SecureTransport error:
connection closed via error`.

## Validation
- `git diff --check`

This is a workflow-only change, so I did not run Rust package tests.
## Why

[openai#25089](openai#25089) added the
background worker for compressing cold archived rollouts, but the worker
still processed files effectively one at a time: each compression job
was sent to `spawn_blocking` and then awaited before the next file
started. On machines with a backlog of archived rollouts, that makes
catch-up slower than it needs to be even though the actual compression
work already runs off the async runtime.

## What Changed

- Queue rollout compression work in a `JoinSet` while directory
traversal continues.
- Cap the worker at two in-flight compression jobs so it can overlap
compression without turning the background task into unbounded blocking
work.
- Drain pending jobs before returning, including the
`read_dir.next_entry()` error path, so every launched job still
contributes to the final `compressed`, `skipped`, and `failed` stats.
- Treat task join failures the same way as compression failures in the
worker's warning and failure accounting.
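
The bounded-overlap pattern can be approximated outside Rust as well. The sketch below uses a Python thread pool in place of tokio's `JoinSet`, so it illustrates the shape of the loop rather than the worker's code; `compress_one` is a hypothetical callable that returns `"compressed"` or `"skipped"` and raises on failure:

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

MAX_IN_FLIGHT = 2  # cap taken from the description above


def compress_all(paths, compress_one):
    """Run compress_one over paths with at most MAX_IN_FLIGHT jobs pending."""
    stats = {"compressed": 0, "skipped": 0, "failed": 0}

    def record(done):
        for future in done:
            try:
                stats[future.result()] += 1
            except Exception:
                # Job errors (and unknown outcomes) count as failures.
                stats["failed"] += 1

    pending = set()
    with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
        for path in paths:  # traversal continues while jobs run
            if len(pending) >= MAX_IN_FLIGHT:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                record(done)
            pending.add(pool.submit(compress_one, path))
        # Drain everything that was launched before reporting stats.
        done, _ = wait(pending)
        record(done)
    return stats
```

Draining after the loop is what keeps every launched job in the final counts, including when traversal stops early.
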
## Summary
- add public `codex_exec_server::EnvironmentPathRef`
- bind an absolute path to its owning executor filesystem
- keep path operations in the next review slice

## Stack
- 1/5 in the skills path authority stack extracted from
openai#25098

## Validation
- `cd /Users/starr/code/codex-worktrees/pr-25098-restack4/codex-rs &&
just fmt`
- GitHub CI pending on rewritten head

## Summary

Renames the MultiAgentV2 turn-triggering tool from `assign_task` to
`followup_task` so the exposed tool name better describes sending an
additional task to an existing agent.

This updates the tool spec, handler/module names, registry wiring,
default multi-agent v2 usage hints, and tests. Rollout trace
classification keeps accepting legacy `assign_task` events so older
traces still reduce correctly, while docs show the new tool name.

## Test plan

- `just test -p codex-core followup_task`
- `just test -p codex-core -E
'test(multi_agent_feature_selects_one_agent_tool_family) |
test(multi_agent_v2_can_use_configured_tool_namespace) |
test(code_mode_only_can_expose_namespaced_multi_agent_v2_as_normal_tools)'`
- `just test -p codex-rollout-trace`
- `just fix -p codex-core`
- `just fix -p codex-rollout-trace`

Notes: `just fmt` ran `cargo fmt` but failed in the Python ruff phase
because the local environment could not resolve `hatchling>=1.27.0` from
the configured internal registry. A full `just test -p codex-core` also
hit unrelated environment-sensitive integration failures involving
missing spawned test binaries/sandbox behavior; the changed multi-agent
spec/handler tests passed in the filtered runs above.
## Summary
- Preserve app declaration order when loading plugin .app.json files.
- Keep plugin connector summaries in plugin app order after connector
metadata is merged and filtered.
- Add regression coverage for .app.json order and connector summary
order.

## Validation
- just fmt
- just test -p codex-chatgpt
connectors_for_plugin_apps_returns_only_requested_plugin_apps
- just test -p codex-core-plugins
effective_apps_preserves_app_config_order
- just fix -p codex-core-plugins (passes with existing clippy
large_enum_variant warning in core-plugins/src/manifest.rs)
- just fix -p codex-chatgpt
- just bazel-lock-update
- just bazel-lock-check
## Summary

Make the root `justfile` usable from Windows without maintaining a
separate Windows copy of most recipes.

The repo recipes previously assumed POSIX shell behavior for things like
variadic argument forwarding (`"$@"`) and stderr redirection
(`2>/dev/null`). That made common workflows such as `just fmt`, `just
test`, and `just log` unreliable from Windows. This PR introduces a
small cross-platform shell adapter so recipes can stay mostly unified
while still expanding the few shell-specific constructs correctly on
macOS/Linux and Windows.

## What Changed

- Add `scripts/just-shell.py` as the configured `just` shell adapter.
  - On Unix it invokes `sh -cu`.
  - On Windows it invokes `pwsh -CommandWithArgs` so arguments containing
    spaces are preserved.
- Add portable recipe placeholders:
  - `{args}` expands to `"$@"` on Unix and the equivalent PowerShell
    forwarded-args expression on Windows.
  - `{stderr-null}` expands to the platform-specific stderr suppression
    used by `fmt`.
- Convert most variadic one-line recipes to the unified `{args}` form,
including `codex`, `exec`, `file-search`, `app-server-test-client`,
`fix`, `clippy`, `bench`, `mcp-server-run`, `write-app-server-schema`,
and `argument-comment-lint-from-source`.
- Keep genuinely shell-specific recipes split or Unix-only for now,
including recipes backed by `.sh` scripts or recipes whose bodies are
more than simple command forwarding.
- Add a Windows `just install` path that installs PowerShell via
`winget` when `pwsh` is not available, then runs the same basic Rust
setup steps.
- Update the SDK test that validates the root `fmt` recipe so it
recognizes the new portable stderr placeholder.
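
To make the placeholder idea concrete, here is a small Python sketch of how such an expansion could work. It is not the contents of `scripts/just-shell.py`: the Unix values come from the description above, while the PowerShell values (`@args`, `2>$null`) are assumptions about what the "equivalent" expressions might be.

```python
# Unix expansions as described above.
UNIX = {"{args}": '"$@"', "{stderr-null}": "2>/dev/null"}
# Assumed PowerShell equivalents; the adapter's real expansions are not shown.
WINDOWS = {"{args}": "@args", "{stderr-null}": "2>$null"}


def expand(recipe_line: str, windows: bool) -> str:
    """Replace portable placeholders with platform-specific shell syntax."""
    table = WINDOWS if windows else UNIX
    for placeholder, replacement in table.items():
        recipe_line = recipe_line.replace(placeholder, replacement)
    return recipe_line
```

Keeping the table per platform means a recipe body is written once and only the few shell-specific tokens differ.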

## Validation

- `just --summary`
- `just --dry-run fmt`
- `just --dry-run bench-smoke`
- `just --dry-run codex foo "bar binky" baz`
- `just --dry-run write-hooks-schema`
- `just --dry-run bazel-lock-update`
- `just --dry-run argument-comment-lint-from-source -- "foo bar"`
- `git diff --check -- justfile scripts/just-shell.py
sdk/python/tests/test_artifact_workflow_and_binaries.py`
- Verified Windows argv preservation through `scripts/just-shell.py`
with arguments containing spaces.
- `uv run --frozen --project sdk/python --extra dev pytest
sdk/python/tests/test_artifact_workflow_and_binaries.py::test_root_fmt_recipe_formats_rust_and_python_sdk`
## Why

`codex_core` is consistently a bottleneck for incremental builds during
iteration. The simplest fix is to make the crate smaller.

## Summary

`codex-core` owns several reusable prompt renderers and static prompt
assets, which makes the crate harder to split apart.

Rename `codex-review-prompts` to `codex-prompts` and move shared review,
goal, permissions, compaction, realtime, hierarchical AGENTS.md, and
`apply_patch` prompts into it. Move prompt-only tests and update
consumers and `CODEOWNERS`.

## Validation

- `just test -p codex-prompts -p codex-apply-patch`
- `just test -p codex-core prompt_caching`
- Bazel builds for the affected crates
## Why

[openai#25089](openai#25089) introduced the
background worker that compresses cold archived rollouts, and
[openai#25654](openai#25654) made that pass
faster once it starts. But the worker still deleted
`rollout-compression.lock` on successful exit, so the existing six-hour
staleness window only helped with overlapping or crashed workers. Each
new local thread-store initialization could immediately rescan archived
rollouts even if a full pass had just finished.

This change keeps the existing marker around long enough to throttle
redundant reruns. The worker is still best-effort, but it no longer does
repeated startup scans when nothing new is eligible for compression.

## What Changed

- Replace the drop-scoped `CompressionLock` with a
`CompressionRunMarker` that claims the existing
`.tmp/rollout-compression.lock` path and leaves it in place after
success.
- Reuse the existing six-hour staleness window to block both overlapping
starts and immediate reruns, while still letting a stale marker be
reclaimed.
- Update the worker docs and debug logging to describe the new "already
running or recently ran" behavior.
- Extend the rollout compression tests to assert that a successful run
leaves the marker behind and that a fresh marker suppresses a new run.
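
The marker semantics can be sketched as below. This is an illustrative Python model under simplifying assumptions: the function name `try_claim` is hypothetical, staleness is judged by file mtime, and the claim is not atomic the way a real implementation would need to be.

```python
import os
import tempfile
import time

STALE_AFTER_SECS = 6 * 60 * 60  # six-hour window from the description


def try_claim(marker_path, now=None):
    """Return True and refresh the marker if a run may start now."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(marker_path)
    except FileNotFoundError:
        age = None
    if age is not None and age < STALE_AFTER_SECS:
        return False  # already running or recently ran
    # Missing or stale marker: (re)claim it and leave it in place afterwards.
    with open(marker_path, "w") as marker:
        marker.write(str(now))
    os.utime(marker_path, (now, now))
    return True


with tempfile.TemporaryDirectory() as tmp:
    marker = os.path.join(tmp, "rollout-compression.lock")
    first = try_claim(marker, now=1_000_000.0)  # no marker yet
    second = try_claim(marker, now=1_000_000.0 + 60)  # fresh marker blocks
    third = try_claim(marker, now=1_000_000.0 + STALE_AFTER_SECS)  # stale
```

Because the marker is never deleted on success, the same check throttles both overlapping starts and immediate reruns.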

## Validation

- `just test -p codex-rollout`
## Why

Python files under `scripts/` were not covered by the repository
formatting recipe or the CI formatting job, so formatting drift could
merge unnoticed.

## What

- Add a dedicated `scripts/pyproject.toml` and `scripts/uv.lock` so
root-script formatting uses a locked Ruff version.
- Extend `just fmt` to format root Python scripts and add
`fmt-scripts-check` for CI.
- Run `just fmt-scripts-check` from `.github/workflows/ci.yml`,
installing `uv` through SHA-pinned `astral-sh/setup-uv` while retaining
the `uv` `0.11.3` pin.
- Apply Ruff formatting to the root Python scripts, including
`scripts/just-shell.py`, and extend
`sdk/python/tests/test_artifact_workflow_and_binaries.py` to cover the
root formatting recipe.
- Update `AGENTS.md` so agents run `just fmt` after code changes
anywhere in the repository.

## Validation

- Extended the existing Python SDK workflow test to assert that `just
fmt` includes root Python scripts.
## Why

Guardian auto-review normally uses the provider-preferred review model
when one is available. Some parent models need model-catalog metadata to
select a different review model while keeping older `/models` payloads
compatible when that metadata is absent.

## What changed

- Added optional `ModelInfo::auto_review_model_override` metadata to the
public model payload as a review-model slug.
- Updated Guardian review model selection to prefer the catalog override
when present, while preserving the existing provider preferred-model
path and parent-model fallback when it is omitted.
- Added focused Guardian coverage for override and no-override model
selection.
- Added an `auto_review` core integration suite test that loads override
metadata from a remote model catalog path and asserts the strict
auto-review `/responses` request uses the catalog-selected review model.
- Updated existing `ModelInfo` fixtures and local catalog constructors
for the new optional field.
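
The selection order described above reduces to a short fallback chain. The Python sketch below is an assumption-laden paraphrase: the function and parameter names are hypothetical, and only the precedence (catalog override, then provider-preferred model, then parent model) is taken from the description.

```python
from typing import Optional


def select_review_model(
    parent_model: str,
    catalog_override: Optional[str] = None,
    provider_preferred: Optional[str] = None,
) -> str:
    """Pick the review model slug for an auto-review request."""
    # Order follows the description: catalog override, then the provider's
    # preferred review model, then the parent model as a last resort.
    return catalog_override or provider_preferred or parent_model
```

Treating the override as optional is what keeps older `/models` payloads, which omit the field, on the existing path.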

## Validation

- `cargo test -p codex-protocol
model_info_defaults_availability_nux_to_none_when_omitted`
- `cargo test -p codex-core guardian_review_uses_`
- `cargo test -p codex-core
remote_model_override_uses_catalog_model_for_strict_auto_review --test
all`
- `just fix -p codex-protocol`
- `just fix -p codex-core`
- `just fmt`
- `git diff --check`
## Summary
- add executor filesystem canonicalization as a bound-path operation
- route remote canonicalization through the exec-server filesystem RPC
surface
- keep path normalization attached to the filesystem that owns the path

## Stack
- 2/5 in the skills path authority stack extracted from
openai#25098
- follows merged openai#25121

## Validation
- `cd
/Users/starr/code/codex-worktrees/pr-25098-restack-review-pr1b/codex-rs
&& just fmt`
- Not run: tests/checks (not requested)
- GitHub CI pending on rewritten head
Fixes this flake:
https://github.com/openai/codex/actions/runs/26773809591/job/78919970410?pr=25659

This test is about zsh-fork subcommand approval behavior, not workspace
sandboxing, so it now runs with `DangerFullAccess` to avoid macOS
sandbox setup failures before the second subcommand approval.
## Why

`shell_zsh_fork` and unified exec need to remain independently
controllable for enterprise rollouts, but we also need a third mode that
composes them. That composed mode is intended to preserve unified exec
command lifecycle support while letting the zsh fork provide more
accurate `execv(2)` interception.

Enabling `unified_exec_zsh_fork` by itself is intentionally not
sufficient. It is a composition gate, not a dependency-enabling
shortcut:

- `unified_exec` selects the PTY-backed unified exec tool.
- `shell_zsh_fork` opts into the zsh fork backend.
- `unified_exec_zsh_fork` only allows those two already-enabled modes to
be composed so local zsh unified exec commands can launch through the
zsh fork.

This separation is deliberate. Enterprises and staged rollouts must be
able to enable or disable unified exec and zsh-fork independently. If
`unified_exec_zsh_fork` implied either dependency, then enabling one
under-development composition flag would silently activate a shell
backend that the configured feature set left disabled.

This PR introduces only the configuration and planning gate for that
composition. Existing `shell_zsh_fork` behavior continues to use the
standalone shell tool unless the new composition feature is explicitly
enabled alongside both dependencies.

## What Changed

- Added the under-development feature flag `unified_exec_zsh_fork`.
- Added `UnifiedExecFeatureMode` so the three input feature flags
collapse into `Disabled`, `Direct`, or `ZshFork` mode before tool
planning.
- Updated tool selection so zsh-fork composition requires
`unified_exec`, `shell_zsh_fork`, and `unified_exec_zsh_fork`.
- Kept the existing standalone zsh-fork shell tool behavior when only
`shell_zsh_fork` is enabled.
- Updated config schema output for the new feature flag.
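
One way to picture the collapse of the three flags is the Python sketch below. The enum mirrors the variant names given above, but the function `resolve_mode` and the exact handling of each flag combination are assumptions inferred from the description, not the Rust implementation.

```python
from enum import Enum


class UnifiedExecFeatureMode(Enum):
    DISABLED = "disabled"
    DIRECT = "direct"
    ZSH_FORK = "zsh_fork"


def resolve_mode(
    unified_exec: bool, shell_zsh_fork: bool, unified_exec_zsh_fork: bool
) -> UnifiedExecFeatureMode:
    """Collapse the three feature flags into one planning mode."""
    if not unified_exec:
        # The composition flag never enables unified exec by itself.
        return UnifiedExecFeatureMode.DISABLED
    if shell_zsh_fork and unified_exec_zsh_fork:
        return UnifiedExecFeatureMode.ZSH_FORK
    return UnifiedExecFeatureMode.DIRECT
```

In this model, enabling only `unified_exec_zsh_fork` leaves the mode at `DISABLED`, which matches the "composition gate, not a dependency-enabling shortcut" rule.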

## Verification

- Added feature and tool-config coverage for the new gate.
- Added planner coverage proving `shell_zsh_fork` remains standalone
until composition is explicitly enabled.
- Ran focused tests for `codex-features`, `codex-tools`, and the
affected `codex-core` planner case.





---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24979).
* openai#24982
* openai#24981
* openai#24980
* __->__ openai#24979
## Why

When unified exec is configured to launch through the zsh fork, local
commands should not let the model override the shell binary with the
`shell` parameter. The configured zsh fork is the mechanism that makes
`execv(2)` interception reliable, so exposing `shell` for local zsh-fork
execution would create a confusing API surface and undermine the
composition.

Remote environments are different: zsh-fork interception is local-only,
so remote unified-exec calls must keep direct unified-exec behavior and
still expose `shell` when a remote environment can be selected.

## What Changed

- Taught the `exec_command` schema builder to omit the `shell` parameter
when requested.
- Hid `shell` from the unified-exec tool schema only when zsh-fork
unified exec applies to all selectable environments.
- Kept `shell` visible when any remote environment can be targeted,
because those calls run through direct unified exec.
- Made unified exec choose the effective shell mode per selected
environment: local environments keep zsh-fork mode, remote environments
use direct mode.
- Left direct unified-exec behavior unchanged, including support for
model-specified shells there.
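
The visibility and per-environment rules above can be summarized in a short Python sketch. The `"local"`/`"remote"` labels and both function names are hypothetical; only the decision logic is taken from the description.

```python
def expose_shell_param(zsh_fork_mode: bool, environments: list) -> bool:
    """Decide whether the tool schema should include the `shell` parameter."""
    if not zsh_fork_mode:
        return True  # direct unified exec keeps model-specified shells
    # Hide `shell` only when every selectable environment is local.
    return any(env == "remote" for env in environments)


def effective_mode(zsh_fork_mode: bool, environment: str) -> str:
    """Pick the shell mode for the environment a call actually targets."""
    if zsh_fork_mode and environment == "local":
        return "zsh_fork"
    return "direct"  # zsh-fork interception is local-only
```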

## Verification

- Added schema coverage showing `exec_command` can hide `shell`.
- Added planner coverage showing zsh-fork unified exec hides `shell` for
local-only execution while direct unified exec still exposes it.
- Added planner coverage showing `shell` remains visible when a remote
environment is available.
- Added handler coverage showing remote environments use direct
unified-exec shell mode instead of zsh-fork mode.
- Ran the focused `codex-core` shell-parameter and zsh-fork tests.







---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/24980).
* openai#24982
* openai#24981
* __->__ openai#24980
## Summary

Add counter telemetry for the local rollout compression worker so we can
see when it runs, why it skips, and how individual file/materialization
paths resolve.

## Changes

- Emit `codex.rollout_compression.run` with statuses for start,
completion, failure, duplicate-run skip, and missing runtime skip.
- Emit `codex.rollout_compression.file` outcomes for scanned,
compressed, skipped, and failed compression candidates.
- Emit `codex.rollout_compression.temp_cleanup` and
`codex.rollout_compression.materialize` counters for cleanup and
decompression paths.

## Validation

- `just fmt`
- `just test -p codex-rollout`
- `just fix -p codex-rollout`
## Why

New unit test modules should follow one consistent layout so
implementation files stay focused and test suites remain easy to locate,
without creating cleanup churn in existing inline test modules.

## What changed

- Added `AGENTS.md` guidance requiring new test modules to use separate
sibling `*_tests.rs` files with an explicit `#[path = "..._tests.rs"]`
attribute.
- Clarified that existing inline `#[cfg(test)] mod tests { ... }`
modules should not be moved solely to follow the new convention.

## Validation

- Ran `git diff --check`.
## Summary

Stacked on openai#25679. Add histogram telemetry for rollout compression
runtime, per-file compression time, byte sizes, and compression ratio.

## Changes

- Emit `codex.rollout_compression.run.duration_ms` tagged by final run
status.
- Emit `codex.rollout_compression.file.duration_ms` tagged by file
outcome.
- Emit source and compressed byte histograms for compression
candidates/results.
- Emit `codex.rollout_compression.file.compression_ratio` for successful
compressions, recorded as integer basis points.
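
For readers unfamiliar with basis points, a small Python sketch of one plausible encoding follows. The direction of the ratio (compressed size over source size) and the floor rounding are assumptions; the description only states that the value is an integer number of basis points.

```python
def compression_ratio_bps(source_bytes: int, compressed_bytes: int) -> int:
    """Express compressed/source size as integer basis points (1% = 100)."""
    if source_bytes <= 0:
        raise ValueError("source_bytes must be positive")
    # Integer arithmetic avoids float rounding; the result is floored.
    return compressed_bytes * 10_000 // source_bytes
```

Under these assumptions, a file that shrinks to a quarter of its size would be recorded as 2,500.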

## Validation

- `just fmt`
- `just test -p codex-rollout`
- `just fix -p codex-rollout`
## Summary
- describe omitted code-mode tools as deferred nested tools instead of
MCP/app tools
- update the prompt-description assertion to match

## Why
Deferred dynamic tools are also callable through `tools` and
discoverable in `ALL_TOOLS`, so the previous MCP/app-specific wording
was too narrow.

## Validation
- `just fmt`
- `just test -p codex-code-mode`
- `git diff --check`
## Why

Python contributions in this repository should target the declared
Python 3 runtime instead of carrying Python 2 compatibility patterns
forward. When compatibility across Python 3 point releases matters,
contributors need a consistent source of truth for the minimum supported
version.

## What changed

- Added Python development guidance to `AGENTS.md` stating that the
repository uses Python 3+ and should not use the `__future__` module.
- Documented that contributors should check the nearest `pyproject.toml`
`requires-python` field when evaluating Python 3 point-release
compatibility.

## Testing

Not run (guidance-only change).
…5681)

## Summary
- Deduplicate installed `openai-curated` and `openai-curated-remote`
plugin conflicts by feature flag.
- Prefer remote when remote plugins are enabled; otherwise prefer local,
while preserving one-sided installs.
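
The conflict rule can be sketched as follows in Python. The data shape (a list of `(marketplace, plugin_name)` pairs) and the function name are assumptions made for illustration; only the preference rule comes from the summary.

```python
LOCAL = "openai-curated"
REMOTE = "openai-curated-remote"


def dedupe_curated(installed, remote_enabled: bool):
    """Drop the losing side when a plugin is installed from both marketplaces."""
    local_names = {name for market, name in installed if market == LOCAL}
    remote_names = {name for market, name in installed if market == REMOTE}
    conflicts = local_names & remote_names
    loser = LOCAL if remote_enabled else REMOTE
    # One-sided installs never appear in `conflicts`, so they are preserved.
    return [
        (market, name)
        for market, name in installed
        if not (market == loser and name in conflicts)
    ]
```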

## Testing
- `just fmt`
- `git diff --check`
- Targeted `just test` was blocked locally because `cargo-nextest` is
not installed.
This PR brought to you via VS Code rather than Codex...

- opened `codex-rs/app-server/tests/common/mcp_process.rs`
- put the cursor on `McpServer`
- hit `F2` and renamed the symbol to `TestAppServer`
- went to the file tree
- hit enter and renamed `mcp_process.rs` to `test_app_server.rs`
- ran **Save All Files** from the Command Palette
- ran `just fmt`

The End

(Admittedly, most of the local variables for `TestAppServer` are still
named `mcp`, though.)
jif-oai and others added 25 commits June 8, 2026 22:32
## Summary
- tighten the default multi-agent v2 root and subagent usage hints to
bias toward local work
- add a pre-call gate to the v2 spawn_agent description for independent,
bounded, parallelizable subtasks

## Validation
- just fmt
- started `just test -p codex-core`, but it was interrupted before
completion after a follow-up request to commit and push immediately
## Why

Thread cwd and environment selections are a single logical setting in
core: updating one without the other can silently desynchronize the
next-turn execution context. This change makes that relationship
explicit in the internal thread settings flow while preserving the
existing app-server public API shape.

## What changed

- Moved the cwd/environment pair through internal
`ThreadSettingsOverrides.environment_settings` instead of a top-level
internal `cwd` field.
- Kept `thread/settings/update` public params unchanged, with app-server
translating top-level `cwd` into the paired internal settings shape.
- Moved `Op::UserInput` environment overrides into thread settings so
user turns and settings updates use the same core path.
- Updated core, app-server, MCP, memories, sample, and test callsites to
construct the paired settings shape.

## Verification

- `git diff --check`
- Local test run starting after PR creation.
## Why

`codex sandbox --permissions-profile` is useful when running commands
under a named permissions profile, but the long option is cumbersome for
a debugging-oriented command. `-p` is already used for the config
profile selector, so `-P` gives the permissions profile selector a
compact, non-conflicting alias.

## What Changed

- Added `short = 'P'` to the `permissions_profile` option for the macOS,
Linux, and Windows sandbox command structs in
[`codex-rs/cli/src/lib.rs`](https://github.com/openai/codex/blob/6d9f9c5cdcaa0a156aa2dabbde259ae5e9e8bc0b/codex-rs/cli/src/lib.rs#L29-L112).
- Added parser coverage for `codex sandbox -P :workspace -- echo` in
[`codex-rs/cli/src/main.rs`](https://github.com/openai/codex/blob/6d9f9c5cdcaa0a156aa2dabbde259ae5e9e8bc0b/codex-rs/cli/src/main.rs#L2883-L2896).

## Verification

- `just test -p codex-cli` passed, including the new
`sandbox_parses_permissions_profile_short_alias` parser test.
## Why

`codex sandbox` can start a network proxy from a configured permission
profile. Previously, sandbox-level containment was tied to managed
network requirements rather than whether a proxy was actually active.
This meant config-driven proxy policies were not consistently enforced
as the sandbox's only network path.

## What changed

- Enable proxy-only network containment whenever `codex sandbox` starts
a network proxy.
- Apply the same active-proxy check to the macOS and Linux sandbox
paths.
- Add a Linux regression test that verifies a sandboxed command cannot
establish a direct connection while the configured proxy is active.

## Test plan

- `just test -p codex-cli debug_sandbox::tests`
- `sandbox_with_network_proxy_blocks_direct_loopback_access` runs on
Linux to cover the config-driven proxy path end to end.
## Why

Image edits should use the exact images selected by the model instead of
inferring edit inputs from conversation history.

## What changed

- Replaced the image tool's `action` argument with optional
`referenced_image_paths`.
- Treats omitted or empty references as generation and populated
references as editing.
- Reads referenced absolute image paths and packages them as image data
URLs for the edit request.
- Removed the previous history-selection and image-count heuristics.
- Updated direct and code-mode tool instructions and calls.
- Added an app-server integration test covering an attached image routed
to the image edit endpoint.
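
A rough Python sketch of the generate-versus-edit decision and the data-URL packaging is shown below. The request dictionary shape and the function names are hypothetical; the tool's real request format is not described here.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: Path) -> str:
    """Read an image file and package it as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path.name)
    data = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{data}"


def plan_image_request(prompt: str, referenced_image_paths=None) -> dict:
    """Omitted or empty references mean generation; otherwise editing."""
    paths = [Path(p) for p in referenced_image_paths or []]
    if not paths:
        return {"mode": "generate", "prompt": prompt}
    for path in paths:
        if not path.is_absolute():
            raise ValueError(f"expected an absolute path: {path}")
    images = [to_data_url(path) for path in paths]
    return {"mode": "edit", "prompt": prompt, "images": images}
```

Driving the mode from the presence of references removes the need for a separate `action` argument.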

## Validation
- Tested end-to-end on local `just codex` with a copy-pasted image, an
attached image, etc.
- `just test -p codex-image-generation-extension`
- `just test -p codex-app-server
standalone_image_edit_uses_attached_model_visible_image`
- `just fix -p codex-image-generation-extension`
- `just bazel-lock-check`
## Summary
- stop emitting `codex_error_subreason` on `codex_turn_event`
- remove the transient analytics fact plumbing that copied
`CodexErr::InvalidRequest(String)` into the event
- update analytics serialization coverage accordingly

## Why
`codex_error_subreason` is a free-form copy of `InvalidRequest(String)`,
including raw provider 400 bodies in some paths. That makes it unsafe as
an analytics field because it can carry user-derived or sensitive text.

## Validation
- `just fmt`
- `just test -p codex-analytics`
## Summary
- require the main agent to read selected `SKILL.md` files completely,
continuing truncated or paginated reads through EOF
- require the main agent to personally read task-required instruction
references instead of delegating their interpretation
- clarify that progressive disclosure selects relevant files without
permitting partial reads
- preserve subagent use for task work when the selected skill allows it
- cover both absolute-path and aliased-root prompt variants

## Why
Partial reads can skip routing and verification requirements later in
skill instructions. Delegated summaries can also omit constraints the
main agent needs to follow. The existing "Read only enough" wording made
both behaviors appear acceptable.

## Impact
Agents should follow complete selected skill instructions while
continuing to avoid unrelated references, scripts, and assets. Subagents
remain available for task execution where permitted.

## Test plan
- `just test -p codex-core-skills` (101 passed)
- `just fmt`
- `git diff --check`
## Why

Some connector golden schemas use JSON Schema composition keywords
beyond `anyOf`, specifically top-level or nested `oneOf` and `allOf`.
Codex currently needs to preserve those shapes when parsing MCP tool
input schemas so connector tools do not lose valid schema structure
during normalization.

To prevent an increased Responses API error rate, this PR will be merged
after the Responses API supports top-level `oneOf`/`allOf`.

## What Changed

- Adds `oneOf` and `allOf` support to `JsonSchema`, matching the
existing `anyOf` handling.
- Traverses `oneOf` and `allOf` anywhere schema children are visited,
including sanitization, definition reachability, description stripping,
and deep schema compaction.
- Adds a final large-schema compaction pass that prunes schema objects
containing `anyOf`, `oneOf`, or `allOf` to `{}` if earlier compaction
passes still leave the schema over budget.
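
A minimal sketch of the final pruning pass described above, written in Python for illustration only. The function names, the traversal over `properties` and `items`, and the byte-size measurement are assumptions and do not mirror the actual `JsonSchema` implementation.

```python
import json

COMPOSITION_KEYWORDS = ("anyOf", "oneOf", "allOf")


def compact_size(schema):
    """Size in bytes of the schema serialized as compact JSON."""
    return len(json.dumps(schema, separators=(",", ":")).encode("utf-8"))


def prune_compositions(schema):
    """Replace any schema object that directly contains a composition keyword with {}."""
    if not isinstance(schema, dict):
        return schema
    if any(keyword in schema for keyword in COMPOSITION_KEYWORDS):
        return {}
    pruned = dict(schema)
    # Only well-known child locations are visited in this sketch.
    if isinstance(schema.get("properties"), dict):
        pruned["properties"] = {
            name: prune_compositions(child)
            for name, child in schema["properties"].items()
        }
    if isinstance(schema.get("items"), dict):
        pruned["items"] = prune_compositions(schema["items"])
    return pruned


def compact_with_budget(schema, budget_bytes):
    """Prune only as a last resort, when the schema is still over budget."""
    if compact_size(schema) <= budget_bytes:
        return schema
    return prune_compositions(schema)
```

Schemas that already fit the budget pass through unchanged, which is consistent with the pruning column matching the `oneOf`/`allOf` column in the validation tables.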

## Validation
Golden schema token validation ran over `2,025` schemas under
`golden_schemas`; all parsed successfully. Token counts use the
`o200k_base` encoding over the compact JSON produced by
`parse_tool_input_schema`.

| Percentile | Before PR | After oneOf/allOf | After pruning |
|---|---:|---:|---:|
| p0 | 9 | 9 | 9 |
| p10 | 63 | 64 | 64 |
| p25 | 86 | 87 | 87 |
| p50 | 125 | 128 | 128 |
| p75 | 203 | 206 | 206 |
| p90 | 327 | 333 | 333 |
| p95 | 460 | 473 | 473 |
| p99 | 763 | 779 | 779 |
| max | 891 | 955 | 955 |

Totals:

| Parser state | Total tokens |
|---|---:|
| Before PR | 345,713 |
| After oneOf/allOf | 352,686 |
| After pruning | 352,686 |

The pruning column matches the oneOf/allOf column for this corpus
because no parsed compact golden schema remains over the `4,000`
compact-byte budget after the earlier compaction passes.
Curated plugin startup refresh now removes cached plugins whose names no
longer appear in the raw openai-curated marketplace. This prevents users
with the old standalone Google Sheets plugin selected locally from
continuing to load its stale cache after the curated repo drops it.

Existing config is left untouched, and plugins still present in the
marketplace continue to refresh from local curated sources.
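
As a rough illustration of the pruning rule, the sketch below assumes a hypothetical cache layout with one directory per plugin name. The function name and layout are illustrative, not the actual `codex-core-plugins` code.

```python
import shutil
from pathlib import Path


def prune_stale_plugin_caches(cache_root: Path, marketplace_names: set) -> list:
    """Delete cached plugin directories whose names are absent from the marketplace."""
    removed = []
    for entry in sorted(cache_root.iterdir()):
        if entry.is_dir() and entry.name not in marketplace_names:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

Only the cache directory is touched here, matching the note that existing config is left untouched.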

Validation:
- `just fmt`
- `just test -p codex-core-plugins`
- `git diff --check`
## Summary

This changes the default remote plugin marketplace listing to use the
cached global remote catalog when it is already present on disk. The
foreground `plugin/list` response can then return from the local catalog
cache instead of waiting on `/ps/plugins/list`.

When a cached global catalog was present at the start of the request,
`plugin/list` still schedules a background refresh through the existing
plugin-list background task path so the disk cache is updated for future
requests. Cache misses keep the existing synchronous remote fetch path
and write the cache, and they do not schedule an extra duplicate
background `/ps/plugins/list` refresh.

Installed/enabled state continues to come from the existing remote
installed overlay path. This change only affects the global remote
catalog directory data used by `plugin/list`.
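
The cache-hit and cache-miss behavior can be sketched as follows. This is a simplified Python model under stated assumptions: `CatalogCache`, `fetch_remote`, and the in-memory field standing in for the disk cache are all hypothetical names, and the real code uses the existing plugin-list background task path rather than a raw thread.

```python
import threading


class CatalogCache:
    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote  # callable that returns the remote catalog
        self._cached = None                # stands in for the on-disk catalog cache
        self._lock = threading.Lock()

    def _refresh(self):
        catalog = self._fetch_remote()
        with self._lock:
            self._cached = catalog

    def list_plugins(self):
        """Return (catalog, background_worker_or_None)."""
        with self._lock:
            cached = self._cached
        if cached is not None:
            # Cache hit: answer from the cache and refresh it in the background.
            worker = threading.Thread(target=self._refresh)
            worker.start()
            return cached, worker
        # Cache miss: fetch synchronously and write the cache.
        # No additional background refresh is scheduled.
        self._refresh()
        with self._lock:
            return self._cached, None
```

Returning the worker handle is only a convenience so a caller or test can wait for the refresh to finish.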

## Testing

- `just fmt`
- `just test -p codex-app-server plugin_list_uses_cached_global_remote_catalog_and_refreshes_it`
- `just test -p codex-core-plugins`
- `git diff --check`
## Why

Metric descriptions should be declared with reusable OTEL instruments
instead of being coupled to individual consumers. Counter descriptions
are the smallest API primitive needed by the exec-server observability
work.

## What changed

- Adds `counter_with_description` while preserving the existing counter
API.
- Caches counters by name and description so instrument metadata remains
part of the declaration identity.
- Covers the exported description together with the existing value and
attribute contract.

This PR only adds counter descriptions. It does not add gauges,
second-based durations, or exec-server adoption.
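
To illustrate why the description is part of the cache key, here is a small Python sketch. Only the name `counter_with_description` comes from this PR; the `Meter` and `Counter` classes, and the choice to map the existing counter API to an empty description, are assumptions.

```python
class Counter:
    def __init__(self, name, description):
        self.name = name
        self.description = description
        self.value = 0

    def add(self, amount=1):
        self.value += amount


class Meter:
    def __init__(self):
        self._counters = {}

    def counter(self, name):
        # Assumption: the existing API behaves like an empty description.
        return self.counter_with_description(name, "")

    def counter_with_description(self, name, description):
        # Name and description together identify the instrument declaration.
        key = (name, description)
        if key not in self._counters:
            self._counters[key] = Counter(name, description)
        return self._counters[key]
```

With this keying, two declarations that share a name but differ in description resolve to distinct instruments instead of one silently overwriting the other's metadata.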

## Stack

1. **openai#26091: counter descriptions**
2. openai#27057: gauge instruments
3. openai#27058: second-based duration histograms

Related independent coverage: openai#27059 tests OTLP HTTP log and trace event
export.

The `codex-exec-server` bounded service tag now stays with the
exec-server adoption change instead of this reusable infrastructure
stack.

## Validation

- `just test -p codex-otel`
- `just fix -p codex-otel`
- `just fmt`
## Background

This was prompted by
[openai#26858](openai#26858), where the
attached doctor report did not include the editor selection and I had to
[ask which editor was in
use](openai#26858 (comment))
before investigating the external-editor newline issue. Capturing these
variables in doctor makes that context available up front in future
reports.

`codex doctor` is intended to capture enough local context to diagnose
startup and terminal behavior, but it did not report the environment
variables that select an external editor or configure command pagers.

The TUI [prefers `VISUAL` over
`EDITOR`](https://github.com/openai/codex/blob/56554904babcaacf4444a2cc90716880837dff7c/codex-rs/tui/src/external_editor.rs#L31-L38),
so missing or unexpected values can explain why the external-editor
shortcut fails or launches the wrong command. Pager values are also
useful inherited-shell context even though [unified exec normalizes its
effective pager variables to
`cat`](https://github.com/openai/codex/blob/56554904babcaacf4444a2cc90716880837dff7c/codex-rs/core/src/unified_exec/process_manager.rs#L60-L70).

These variables can contain arbitrary command arguments or inline
environment assignments. The human report is local, but `codex doctor
--json` may be attached to feedback, so the machine-readable report
should not include their raw contents.

## What Changed

- Report `VISUAL` and `EDITOR` in the system environment details, using
`not set` when either variable is absent.
- Report inherited `PAGER`, `GIT_PAGER`, `GH_PAGER`, and `LESS` values
when present.
- Preserve full values in local human output while reducing these fields
to `set` or `not set` in redacted JSON output.
- Add structured check, JSON-redaction, rendered-output, and snapshot
coverage.
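
The reporting rules above can be summarized with a short Python sketch. The function name and the returned dictionary shape are hypothetical, and treating an empty string as "set" is an assumption.

```python
EDITOR_VARS = ("VISUAL", "EDITOR")
PAGER_VARS = ("PAGER", "GIT_PAGER", "GH_PAGER", "LESS")


def environment_details(env, redact):
    """Build editor/pager rows; redact=True models the JSON report."""
    details = {}
    for name in EDITOR_VARS:
        value = env.get(name)
        if value is None:
            # Editor variables are always reported, even when absent.
            details[name] = "not set"
        else:
            details[name] = "set" if redact else value
    for name in PAGER_VARS:
        value = env.get(name)
        if value is not None:
            # Pager variables are reported only when present.
            details[name] = "set" if redact else value
    return details
```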

## How to Test

1. From `codex-rs`, run Codex with explicit editor and pager variables:

   ```sh
   env VISUAL='code --wait' EDITOR=vim PAGER='less -R' GIT_PAGER=delta \
     GH_PAGER=less LESS=-FRX \
     cargo run -p codex-cli --bin codex -- doctor --no-color
   ```

2. Confirm the `system` details show the full values for all six
variables.
3. Unset the pager variables and rerun the command. Confirm pager rows
are omitted while missing editor variables are shown as `not set`.
4. Run the same configured environment with `doctor --json`. Confirm
each configured editor or pager field is reported as `set` and none of
the raw commands or arguments appear in the JSON.

Targeted tests:

- `just test -p codex-cli` (279 tests passed)
…penai#27084)

## Summary

Some customer MCP tools expose large input schemas that exceed Codex's
compact schema budget even after description stripping. Today, the final
compaction pass collapses complex schemas starting at depth 2, which can
erase important shallow call structure such as small `anyOf` branches,
required fields, and help-mode entry points. In one reported case, this
degraded a tool schema into `query: any | any`, leaving the model
without enough structure to discover the required help call.

This change raises the deep-schema collapse boundary from depth 2 to
depth 3. That preserves one additional layer of the tool contract while
still collapsing deeper expensive subtrees to `{}` when a schema remains
over budget.

## What Changed

- Increased `MAX_COMPACT_TOOL_SCHEMA_DEPTH` from `2` to `3`.
- Updated the schema compaction traversal test to assert the new
collapse boundary.
- The resulting compacted shape keeps useful shallow structure, for
example:
  - top-level argument names
  - shallow `anyOf` branches
  - required object fields
  - nested property names one level deeper than before
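
A minimal Python sketch of a depth-bounded collapse, to show what one extra layer preserves. How the real traversal counts depth is not stated here, so this sketch assumes the root schema is depth 0 and each nested schema adds one. Only the constant name `MAX_COMPACT_TOOL_SCHEMA_DEPTH` comes from this change.

```python
MAX_COMPACT_TOOL_SCHEMA_DEPTH = 3


def collapse_deep(schema, depth=0, max_depth=MAX_COMPACT_TOOL_SCHEMA_DEPTH):
    """Replace every schema at or beyond max_depth with {}."""
    if not isinstance(schema, dict):
        return schema
    if depth >= max_depth:
        return {}
    out = dict(schema)
    if isinstance(schema.get("properties"), dict):
        out["properties"] = {
            name: collapse_deep(child, depth + 1, max_depth)
            for name, child in schema["properties"].items()
        }
    if isinstance(schema.get("items"), dict):
        out["items"] = collapse_deep(schema["items"], depth + 1, max_depth)
    for keyword in ("anyOf", "oneOf", "allOf"):
        if isinstance(schema.get(keyword), list):
            out[keyword] = [
                collapse_deep(child, depth + 1, max_depth)
                for child in schema[keyword]
            ]
    return out
```

Under these assumptions, a boundary of 2 erases the branches of an `anyOf` nested under a top-level argument, while a boundary of 3 keeps their shallow structure and collapses only what lies beneath them.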

## Validation

- Ran `just test -p codex-tools`: 81 tests passed.
- Ran a golden schema corpus comparison over 214 discovered tool input
schemas under `golden_schemas/*/mcp_tools/*/input_schema.json`.
  - Depth 2 and depth 3 had identical percentile token counts across the
    corpus.
  - Both ended with `0 / 214` schemas over 1k tokens.
  - Both ended with `0 / 214` schemas over the 4,000-byte compact JSON
    budget.
  - Only one golden schema changed, increasing from 49 to 56 tokens, so
    this does not appear to introduce a meaningful corpus-wide regression.

Corpus percentile results:

| Percentile | Depth 2 | Depth 3 |
|---|---:|---:|
| p0 | 9 | 9 |
| p10 | 31 | 31 |
| p25 | 54 | 54 |
| p50 | 81 | 81 |
| p75 | 143 | 143 |
| p90 | 290 | 290 |
| p95 | 431 | 431 |
| p99 | 600 | 600 |
| max | 832 | 832 |
## Why

Codex needs stable `file:` URI identifiers that can cross process and
operating-system boundaries without eagerly interpreting them as native
paths. Existing fields also need to keep accepting absolute path strings
during migration.

## What changed

- Add `codex-utils-path-uri` with a validated, immutable `PathUri`
wrapper that currently accepts only `file:` URLs.
- Expose URI-level `basename`, `parent`, and `join` operations that
preserve authorities and percent encoding without guessing the source
operating system.
- Keep native conversion explicit through `AbsolutePathBuf` and the
current host rules.
- Serialize as canonical URI text while accepting both URI text and
legacy absolute native paths during deserialization.
- Add adversarial coverage for Windows-looking and POSIX paths, UNC
authorities, encoded metadata characters, non-UTF-8 POSIX paths, URI
hierarchy operations, and legacy serde round trips.
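
The URI-level operations can be illustrated with a small Python sketch built on `urllib.parse`. This is not the `codex-utils-path-uri` API: the class below only borrows the `PathUri` name, and percent-encoding the joined segment is an assumption.

```python
import posixpath
from urllib.parse import quote, urlsplit, urlunsplit


class PathUri:
    """Illustrative file: URI wrapper that never converts to a native path."""

    def __init__(self, text):
        parts = urlsplit(text)
        if parts.scheme != "file":
            raise ValueError("only file: URIs are accepted")
        self._parts = parts

    def __str__(self):
        return urlunsplit(self._parts)

    def basename(self):
        # The last path segment is returned with its percent encoding intact.
        return posixpath.basename(self._parts.path.rstrip("/"))

    def parent(self):
        path = posixpath.dirname(self._parts.path.rstrip("/")) or "/"
        return PathUri(urlunsplit(self._parts._replace(path=path)))

    def join(self, segment):
        # The authority is preserved; the new segment is percent-encoded.
        path = self._parts.path.rstrip("/") + "/" + quote(segment, safe="")
        return PathUri(urlunsplit(self._parts._replace(path=path)))
```

Because every operation works on the URI text, a UNC-style authority such as `file://server/share/...` survives `parent` and `join` without any guess about the source operating system.
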
## Background

Bare URLs containing `~` in their path are currently only clickable up
to the tilde in the interactive TUI. For example, Codex renders the
visible text for:


`https://www.cs.tufts.edu/~nr/cs257/archive/olin-shivers/dissertation.pdf`

but the OSC 8 destination stops at `https://www.cs.tufts.edu/`. This
makes Cmd-click open the wrong location even though the terminal
recognizes the complete URL outside Codex.

Fixes openai#26774.

## Root Cause

The URL scanner already accepts `~`. The truncation happens earlier:
with strikethrough parsing enabled, `pulldown-cmark` splits this URL
into adjacent decoded `Event::Text` values around the tilde. The
Markdown renderer annotated each text event independently, so only the
first event still looked like a complete URL with a supported scheme.

The renderer now merges adjacent decoded text events before URL
annotation. It preserves the combined source range while retaining
parser-decoded contents, which avoids regressing entities such as
`&amp;`.

## Changes

- Add a small iterator that merges adjacent decoded Markdown text events
and their source ranges.
- Apply it at the Markdown renderer boundary before hyperlink detection.
- Add regression coverage for the reported URL in prose, wrapped table
output, and entity-decoded URLs.
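
The merging step can be sketched in Python as below. The event representation, a `(kind, text, (start, end))` tuple, is a hypothetical stand-in for `pulldown-cmark` events paired with their source ranges.

```python
def merge_adjacent_text(events):
    """Merge runs of adjacent text events, combining their source ranges."""
    pending = None  # [text, start, end] for the run being accumulated
    for kind, text, (start, end) in events:
        if kind == "text":
            if pending is None:
                pending = [text, start, end]
            else:
                pending[0] += text  # keep the parser-decoded contents
                pending[2] = end    # extend the combined source range
            continue
        if pending is not None:
            yield ("text", pending[0], (pending[1], pending[2]))
            pending = None
        yield (kind, text, (start, end))
    if pending is not None:
        yield ("text", pending[0], (pending[1], pending[2]))
```

Concatenating the decoded text rather than re-slicing the source is what keeps an entity such as `&amp;` decoded in the merged event, while the range still spans the original source bytes.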

## How to Test

1. Run Codex with `just c`.
2. Ask the assistant to output this exact bare URL with no Markdown link
syntax:

`https://www.cs.tufts.edu/~nr/cs257/archive/olin-shivers/dissertation.pdf`
3. Hold Cmd and hover or click the URL.
4. Confirm the complete URL, including the suffix after `~`, is one
destination.
5. Repeat with the URL inside a Markdown table and confirm wrapped
portions retain the same complete destination.

Targeted tests:

- `just test -p codex-tui url_with_tilde`
- `just test -p codex-tui merged_text_events_preserve_entity_decoding`

The full `codex-tui` test run was also executed. Its only failures were
the two existing Guardian feature-flag tests:

- `app::tests::update_feature_flags_disabling_guardian_clears_review_policy_and_restores_default`
- `app::tests::update_feature_flags_disabling_guardian_clears_manual_review_policy_without_history`
## Summary
- Render `/debug-config`'s `allowed_sandbox_modes` from the finalized
permission constraints instead of the raw requirements list.
- Add regression coverage for configured full-access and external
sandbox modes being omitted when effective permissions reject them.

## Details
`allowed_sandbox_modes` comes from managed requirements, but the final
permissions can be further constrained by derived validation rules. For
example, `permissions.filesystem.deny_read` requires sandbox
enforcement, so modes that disable or externalize Codex's sandbox are
not actually usable even if they were present in the raw requirements
TOML.

The debug renderer now enumerates the configured sandbox-mode labels and
keeps only those accepted by `Config.permissions`. That makes
`/debug-config` reflect the same effective permission-profile constraint
path used by runtime config validation, while preserving the existing
source/provenance display.
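
A simplified Python sketch of the filtering idea follows. The mode labels and the single `deny_read` rule are illustrative assumptions; the real renderer asks `Config.permissions` whether each configured mode is accepted.

```python
# Hypothetical labels for modes in which Codex's own sandbox is enforced.
SANDBOXED_MODES = {"read-only", "workspace-write"}


def effective_sandbox_modes(configured_modes, deny_read_paths):
    """Keep only the configured modes that the effective permissions accept."""
    requires_sandbox = bool(deny_read_paths)
    return [
        mode
        for mode in configured_modes
        if not requires_sandbox or mode in SANDBOXED_MODES
    ]
```

Filtering the configured list, rather than building a new one, preserves the configured order of the modes that remain.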

## Validation
- Added a regression test for effective sandbox-mode filtering in
`/debug-config`.
## Summary

- Update the web search tool prompt to require Markdown links for cited
sources.
- Explicitly tell the model not to use `turnX`-style citations in
responses.

## Context


https://openai.slack.com/archives/C0AU83S0ZQU/p1780964147777649?thread_ts=1780352049.512299&cid=C0AU83S0ZQU

## Test plan

- `git diff --check`
- `python3 scripts/format.py --check` (fails only on Rust formatter
setup: rustup cannot create temp files under `/home/dev-user/.rustup`;
Just and Python formatter checks pass when using temp cache dirs)
Fixes a TUI regression where thread transitions such as `/new` and
`/clear` could rebuild config without the cloud requirements loader,
allowing users to fall back to non-cloud-managed settings. The config
refresh path now preserves cloud requirements during thread
reinitialization, and config loading is moved off the deep TUI event
stack to avoid stack-overflow crashes during those reloads.

- Passes the cloud requirements loader through TUI config rebuild paths.
- Keeps cloud requirements applied for `/new`, `/clear`, `/fork`, side
conversations, and session picker transitions.
- Runs config building on a Tokio task so reloads do not occur on the
deep TUI caller stack.
- Adds regression coverage that cloud requirements survive
thread-transition config refreshes.

## Test/Repro
- Start Codex with a cloud requirement applied.
- Use `/new` or `/clear`.
- The refreshed/fresh-session config should still include the cloud
requirements.

This can be tested with any config item. At the moment, the easiest item
for OpenAI staff to test is the `mentions_v2` feature, which is enabled in
cloud requirements but not enabled by default. As a result, before these
changes the feature was disabled after `/new` or `/clear`. Repeating the
same steps with a binary from this branch should not drop the feature
enablement.
## Why

`log_remote_compact_failure` was the only consumer of the
compact-request logging payload and most of the token-usage breakdown
fields. Once that failure log is removed, keeping the surrounding
carrier types leaves dead plumbing in the compaction path and context
manager.

## What changed

- Remove `log_remote_compact_failure`, `CompactRequestLogData`, and the
v2 wrapper that only fed that log.
- Let both remote compaction implementations return the original
compaction error directly.
- Replace `TotalTokenUsageBreakdown` with a narrow helper that returns
only the remaining value needed by compaction analytics.
- Keep `estimate_response_item_model_visible_bytes` private to the
context manager implementation.

## Validation

- `cargo check -p codex-core`
- Code mode can now call standalone web search directly, including from nested JavaScript tool calls, and receive plaintext search results. (openai#26719)
- Tool and connector input schemas now preserve `oneOf` and `allOf`, and large schemas keep more shallow structure when compacted, improving compatibility with richer MCP tools. (openai#24118, openai#27084)
- `codex doctor` now includes editor and pager environment details in the local report while redacting raw values in JSON output. (openai#27081)
- Plugin marketplace automation is more informative and responsive: `codex plugin marketplace list --json` now includes each marketplace source, and plugin lists can return from the cached remote catalog before refreshing in the background. (openai#27009, openai#26932)

## Bug Fixes
- `codex resume --last "..."` and `codex fork --last "..."` now treat the trailing argument as the initial prompt instead of misreading it as a session ID. (openai#26818)
- MCP startup warnings from subagents now stay in the thread that owns them, avoiding duplicate parent-thread alerts and stuck startup spinners in the TUI. (openai#26639)
- Image edits now use the exact referenced image file paths instead of guessing from conversation history, so attached-image edits land on the intended input. (openai#26486)
- Bare URLs with `~` in the path are now linkified end to end in the TUI instead of being truncated before the tilde. (openai#27088)
- Thread resets such as `/new`, `/clear`, and `/fork` no longer drop cloud-managed requirements or feature flags during TUI config reloads. (openai#25177)
- Sandbox execution now preserves approved escalation decisions and enforces configured proxy-only networking more consistently. (openai#24981, openai#27035)

## Chores
- Release builds once again publish separate symbol archives with line tables, improving post-release crash symbolication without bringing back the earlier full-debug build slowdown. (openai#26202)
- The embedded V8 toolchain was updated to `rusty_v8` 149.2.0. (openai#26464)

## Changelog

Full Changelog: openai/codex@rust-v0.138.0...rust-v0.139.0

- openai#26741 fix(remote-control): preserve enrollment on generic websocket 404s @apanasenko-oai
- openai#26804 fix(core-plugins): send Codex product SKU to plugin-service @ericning-o
- openai#26464 build(v8): update rusty_v8 to 149.2.0 @cconger
- openai#26895 ci: use bazel environment for BuildBuddy secret @bolinfest
- openai#24981 fix: preserve approval sandbox decisions in unified exec @bolinfest
- openai#26818 fix(tui): accept prompts with resume and fork @fcoury-oai
- openai#24820 deps: update starlark to 0.14.2 @bolinfest
- openai#26639 fix(tui): scope MCP startup status by thread @fcoury-oai
- openai#26719 [codex] Enable standalone web search in code mode @rka-oai
- openai#26632 feat: add v2 agent residency lru @jif-oai
- openai#26974 Ignore proc-macro-error2 advisory @jif-oai
- openai#26969 feat: count V2 concurrency by active execution @jif-oai
- openai#26994 Rename multi-agent v2 close_agent to interrupt_agent @jif-oai
- openai#26997 Avoid reopening v2 descendants on resume @jif-oai
- openai#26821 [codex] Exclude external tool output from memories @rka-oai
- openai#26202 [codex] Restore release symbol artifacts with line tables @nornagon-openai
- openai#26852 fix(app-server): avoid blocking connection cleanup @apanasenko-oai
- openai#26923 Add HTTP window ID to Responses client metadata @ningyi-oai
- openai#26680 [codex-analytics] report compaction analytics details @rhan-oai
- openai#26637 [codex] Speed up external agent session imports @stefanstokic-oai
- openai#27009 [plugins] Expose marketplace source in marketplace list JSON @mpc-oai
- openai#27024 ci: template custom runner names by repo @bolinfest
- openai#26230 fix: preserve auto review across config and delegation @viyatb-oai
- openai#27038 [codex] Clarify PR babysitter state mutations @anp-oai
- openai#27037 [codex] Calm multi-agent v2 usage prompts @jif-oai
- openai#26687 Pair thread environment settings @pakrym-oai
- openai#27054 cli: add -P sandbox permissions profile alias @bolinfest
- openai#27035 Enforce configured network proxy in codex sandbox @viyatb-oai
- openai#26486 Route image edits through referenced file paths @won-openai
- openai#27060 [codex-analytics] stop sending codex error subreason @rhan-oai
- openai#27044 [codex] Require complete main-agent skill reads @fchen-oai
- openai#24118 feat: support oneOf and allOf in tool input schemas @celia-oai
- openai#26934 [codex] Prune stale curated plugin caches @xl-openai
- openai#26932 Use cached remote plugin catalog for plugin list @xl-openai
- openai#26091 [codex] Add OTEL counter descriptions @richardopenai
- openai#27081 feat(doctor): report editor and pager environment @fcoury-oai
- openai#27084 chore: preserve one more schema layer during large tool compaction @celia-oai
- openai#26840 Add typed file URIs @anp-oai
- openai#27088 fix(tui): linkify complete bare URLs with tildes @fcoury-oai
- openai#27068 Show effective sandbox modes in /debug-config @canvrno-oai
- openai#27092 Add extra config to StoredThread, leave empty for now @kumquatexpress
- openai#27096 Update web search citation prompt @yuning-oai
- openai#25177 Preserve cloud requirements across TUI thread resets @canvrno-oai
- openai#27106 [codex] Remove remote compaction failure log @pakrym-oai
@Zorlin Zorlin closed this Jul 2, 2026
@Zorlin Zorlin deleted the pal-commit branch July 2, 2026 19:40
@Zorlin Zorlin changed the title from "Merge latest changes from upstream and synchronise in new Rolodex fixes" to "Streamline TUI project opening" Jul 2, 2026
@Zorlin

Zorlin commented Jul 2, 2026


Landed into main via merge commit 78b166c after renaming the branch to tui-project-opening. GitHub refused the PR merge action after the branch rename, so main was updated directly.

Zorlin added a commit that referenced this pull request Jul 2, 2026