
feat(windows): add temporary launch support for cc-switch start claude/codex #135

Open
AloneAtWar wants to merge 16 commits into SaladDay:main from AloneAtWar:feat/windows-start-support

AloneAtWar commented Apr 27, 2026

Closes #134

Summary

This PR adds full Windows support for cc-switch start claude and cc-switch start codex, implementing the complete lifecycle from process spawning through secure temp file creation to reliable child-process cleanup. It also hardens the cross-platform orphan scan so it can accurately distinguish living children from stale temp entries even under PID reuse.


What Changed

1. Core Windows Temp Launch Module (windows_temp_launch.rs)

Extracted duplicated Windows logic from both Claude and Codex temp launch paths into a shared module. Key responsibilities:

  • CreateProcessW suspended spawning: Spawns the target process in a suspended state so we can assign it to a Job Object before any code runs. Returns the process handle, thread handle, and PID.
  • Job Object lifecycle: Creates a Job Object with KILL_ON_JOB_CLOSE. If the child is successfully assigned, the OS guarantees termination when the launcher exits. If assignment fails with ERROR_ACCESS_DENIED (the parent is already inside a job that prohibits nesting), we degrade gracefully with a visible warning and continue, relying on the orphan scan for eventual cleanup. Any other assignment error terminates the child and fails the launch.
  • Exit-code propagation: Waits on the process handle and returns the exit code to the caller so cc-switch start behaves like a transparent wrapper.
  • Console Ctrl Handler scoping: Temporarily disables Ctrl+C handling in the parent for the duration of the launch, so that Ctrl+C is handled by the child instead of killing the launcher.
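
The assignment-failure policy described in the Job Object bullet can be sketched as a small pure function. This is an illustration only: `JobAssignOutcome` and `classify_job_assignment` are hypothetical names, not identifiers from this PR, and the real code calls the Win32 APIs directly. The only fact relied on is that ERROR_ACCESS_DENIED has the value 5.

```rust
/// Win32 error code for ERROR_ACCESS_DENIED.
const ERROR_ACCESS_DENIED: u32 = 5;

/// What the launcher does after trying to put the suspended child into the job.
/// (Hypothetical type, used only for this sketch.)
#[derive(Debug, PartialEq, Eq)]
enum JobAssignOutcome {
    /// The child is in the job, so KILL_ON_JOB_CLOSE covers it.
    Assigned,
    /// Nested-job case: continue, warn the user, rely on the orphan scan.
    DegradedWithWarning,
    /// Any other error: terminate the suspended child and fail the launch.
    HardFailure(u32),
}

/// `os_error` is `None` when assignment succeeded, otherwise the raw OS
/// error code reported for the failed assignment.
fn classify_job_assignment(os_error: Option<u32>) -> JobAssignOutcome {
    match os_error {
        None => JobAssignOutcome::Assigned,
        Some(ERROR_ACCESS_DENIED) => JobAssignOutcome::DegradedWithWarning,
        Some(code) => JobAssignOutcome::HardFailure(code),
    }
}
```

Keeping the decision separate from the Win32 calls makes the degrade-versus-fail rule testable on any platform.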

2. .cmd/.bat Shim Handling

npm-installed CLIs (e.g. claude, codex) are .cmd shims that must be launched via cmd.exe /c. This path is inherently more complex than direct binary execution:

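  • Application name selection: For .cmd/.bat targets, an early commit in this PR passed NULL as lpApplicationName; later commits pass the absolute path of the system cmd.exe instead (see cmd.exe resolution below) while keeping cmd.exe in the command-line string. For direct .exe targets, we pass the resolved binary path.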
  • cmd.exe resolution: To prevent hijacking, cmd.exe is resolved via GetSystemDirectoryW (trusted OS API) rather than which::which or ComSpec, both of which are influenceable by environment variables.
  • Command-line quoting: build_windows_command_line recognizes the cmd.exe /c prefix and applies cmd-specific quoting rules (e.g. doubling internal quotes). User native args are validated: an arg that ends in a backslash and also requires cmd quoting is rejected, because the trailing backslash would escape the closing quote; a plain trailing backslash (e.g. C:\work\) is allowed when no cmd quoting is required.
  • Dangerous character rejection: % and ! are rejected entirely because cmd.exe expands them as environment variables and delayed expansion tokens, creating command-injection paths.
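
A minimal sketch of the shim detection and argument validation described above. The helper names `is_cmd_shim`, `arg_requires_cmd_quote`, and `validate_cmd_arg`, and the variants `CmdArgError::Percent` and `CmdArgError::Exclamation`, appear in the commit messages. The other two variant names and the exact rule for when quoting is required are assumptions made for illustration, so the PR's real logic may differ.

```rust
use std::path::Path;

/// Why an argument cannot be passed safely through `cmd.exe /c`.
/// `Quote` and `TrailingBackslashWithQuoting` are hypothetical names.
#[derive(Debug, PartialEq, Eq)]
enum CmdArgError {
    Percent,
    Exclamation,
    Quote,
    TrailingBackslashWithQuoting,
}

/// True for `.cmd` / `.bat` targets, compared case-insensitively.
fn is_cmd_shim(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| ext.eq_ignore_ascii_case("cmd") || ext.eq_ignore_ascii_case("bat"))
        .unwrap_or(false)
}

/// Assumption: quoting is needed for empty args and for args containing
/// whitespace or a cmd metacharacter.
fn arg_requires_cmd_quote(arg: &str) -> bool {
    arg.is_empty() || arg.chars().any(|c| c.is_whitespace() || "&|<>^()".contains(c))
}

/// Rejects arguments that are unsafe on the `cmd.exe /c` shim path.
fn validate_cmd_arg(arg: &str) -> Result<(), CmdArgError> {
    if arg.contains('%') {
        return Err(CmdArgError::Percent);
    }
    if arg.contains('!') {
        return Err(CmdArgError::Exclamation);
    }
    if arg.contains('"') {
        return Err(CmdArgError::Quote);
    }
    // A trailing backslash is only hazardous when the arg will be quoted.
    if arg.ends_with('\\') && arg_requires_cmd_quote(arg) {
        return Err(CmdArgError::TrailingBackslashWithQuoting);
    }
    Ok(())
}
```

Under these assumptions `C:\work\` passes because it never gets quoted, while `C:\my work\` is rejected because the space forces quoting.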

3. Security Hardening

  • Owner-only ACLs: create_secret_temp_file and create_secret_dir_with_acl use CreateFileW/CreateDirectoryW with an explicit SECURITY_DESCRIPTOR that sets a DACL granting GENERIC_ALL only to the current user. SE_DACL_PROTECTED is also set to block inheritable ACEs from the parent directory, eliminating a TOCTOU window.
  • Validation coverage: Both user-provided native args and internally-constructed paths (settings file, codex_home) are validated before reaching cmd.exe.
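
For readers unfamiliar with protected DACLs, the owner-only descriptor can be written compactly in SDDL notation: `D:P` marks the DACL as protected (no inherited ACEs) and `(A;;GA;;;<SID>)` allows GENERIC_ALL for one SID. The sketch below only builds that string. `owner_only_sddl` is a hypothetical helper, and the PR may well construct its SECURITY_DESCRIPTOR through other Win32 calls rather than from SDDL.

```rust
/// Builds an SDDL string for a protected DACL that grants GENERIC_ALL to a
/// single user SID. Returns `None` if the input does not look like a SID,
/// so arbitrary text cannot smuggle extra ACEs into the descriptor.
/// (Hypothetical helper, for illustration only.)
fn owner_only_sddl(user_sid: &str) -> Option<String> {
    let rest = user_sid.strip_prefix("S-1-")?;
    let well_formed = !rest.is_empty()
        && rest
            .split('-')
            .all(|part| !part.is_empty() && part.chars().all(|c| c.is_ascii_digit()));
    if !well_formed {
        return None;
    }
    // D:P = protected DACL; A = access allowed; GA = GENERIC_ALL.
    Some(format!("D:P(A;;GA;;;{user_sid})"))
}
```

Because the descriptor is supplied at creation time, there is no interval in which the file or directory exists with inherited permissions.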

4. Orphan Scan Overhaul (orphan_scan.rs)

The orphan scan is responsible for cleaning up temp files/directories left behind by crashed or force-killed launches. Three major improvements:

  • Sidecar-based child tracking: When a child is spawned on Windows, a .child-meta sidecar file is written atomically (tmp+rename) containing {child_pid}:{creation_time_nanos}. The scanner now checks the sidecar first; if present, it verifies whether the child process is alive using OpenProcess + GetProcessTimes creation-time comparison. This fixes the nested-job fallback scenario where the launcher dies but the child survives—previously the scanner would see the dead launcher PID and delete the still-in-use CODEX_HOME.
  • Linux PID reuse detection: On Linux, kill(pid, 0) alone cannot distinguish a reused PID. We now read field 22 (starttime) of /proc/{pid}/stat and btime from /proc/stat to compute the absolute process start time in nanoseconds. If the live process started more than 2 seconds after the timestamp recorded for the temp entry, the PID is considered reused and the entry is cleaned.
  • Sidecar reap: A periodic pass removes .child-meta files whose main temp entry no longer exists, bounding long-term accumulation. .child-meta.tmp crash residuals are also cleaned.
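
The sidecar payload and the Linux start-time check described above can be sketched with plain string parsing. All function names here are hypothetical. The clock-tick rate is passed in as a parameter because the real value comes from sysconf(_SC_CLK_TCK), which is commonly 100 but is not guaranteed to be.

```rust
/// Parses the `{child_pid}:{creation_time_nanos}` sidecar payload.
fn parse_child_meta(text: &str) -> Option<(u32, u128)> {
    let (pid, nanos) = text.trim().split_once(':')?;
    Some((pid.parse().ok()?, nanos.parse().ok()?))
}

/// Extracts field 22 (starttime, in clock ticks since boot) from the
/// contents of /proc/{pid}/stat. Field 2 (comm) may itself contain spaces
/// or parentheses, so parsing starts after the last ')'.
fn starttime_ticks(stat: &str) -> Option<u64> {
    let after_comm = &stat[stat.rfind(')')? + 1..];
    // after_comm starts at field 3 (state), so field 22 is at index 19.
    after_comm.split_whitespace().nth(19)?.parse().ok()
}

/// Absolute start time in nanoseconds since the Unix epoch, given `btime`
/// (boot time in seconds, from /proc/stat) and the tick rate.
fn start_time_nanos(btime_secs: u64, ticks: u64, ticks_per_sec: u64) -> u128 {
    let since_boot = ticks as u128 * 1_000_000_000 / ticks_per_sec as u128;
    btime_secs as u128 * 1_000_000_000 + since_boot
}

/// Tolerance from the description above: 2 seconds.
const REUSE_TOLERANCE_NANOS: u128 = 2_000_000_000;

/// A PID counts as reused when the live process started noticeably later
/// than the timestamp recorded for the temp entry.
fn pid_was_reused(actual_start_nanos: u128, recorded_nanos: u128) -> bool {
    actual_start_nanos > recorded_nanos + REUSE_TOLERANCE_NANOS
}
```

The tolerance absorbs rounding from the tick resolution and the one-second granularity of btime.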

5. Environment Block Handling

Windows CreateProcessW requires per-drive current-directory variables (=C:, =D:, etc.) to appear first in a custom environment block, followed by regular variables in alphabetical order. build_env_block_with_override now separates drive vars from regular vars, preserves drive vars in original order, sorts regular vars alphabetically, and concatenates them with correct double-null termination.
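
A minimal sketch of the layout just described, using only the standard library. It is not the PR's `build_env_block_with_override`: the signature, the single-override parameter, and the case-insensitive comparison of names are assumptions for illustration.

```rust
/// Builds a UTF-16 environment block: drive variables (`=C:` etc.) first in
/// their original order, then regular variables sorted by name, each entry
/// NUL-terminated, with one extra NUL closing the block.
/// (Hypothetical signature; names are compared case-insensitively.)
fn build_env_block(current: &[(&str, &str)], override_var: (&str, &str)) -> Vec<u16> {
    let mut drive_vars: Vec<(String, String)> = Vec::new();
    let mut regular: Vec<(String, String)> = Vec::new();

    for (name, value) in current {
        if name.starts_with('=') {
            drive_vars.push((name.to_string(), value.to_string()));
        } else if !name.eq_ignore_ascii_case(override_var.0) {
            // Drop any existing entry that the override replaces.
            regular.push((name.to_string(), value.to_string()));
        }
    }
    regular.push((override_var.0.to_string(), override_var.1.to_string()));
    regular.sort_by_key(|(name, _)| name.to_uppercase());

    let mut block: Vec<u16> = Vec::new();
    for (name, value) in drive_vars.iter().chain(regular.iter()) {
        block.extend(format!("{name}={value}").encode_utf16());
        block.push(0); // terminator for this entry
    }
    block.push(0); // second NUL terminates the whole block
    block
}
```

For example, overriding `FOO` in a block containing `ZED`, `=C:`, `alpha`, and `Foo` yields the entries `=C:`, `alpha`, `FOO`, `ZED` in that order.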

6. Filename Collision Avoidance

On Windows, the original temp filename used process_creation_time as the timestamp, which is constant within a single cc-switch process, so same-provider launches collided. An atomic LAUNCH_SEQ counter, formatted as 8 hex digits, is now inserted into the filename, producing a unique path for every launch while keeping the existing parser compatible.
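
The naming scheme can be sketched as follows, using the filename layout quoted in the commit message (`cc-switch-claude-{provider}-{seq}-{pid}-{timestamp}.json`). The function names are hypothetical and the parser is a simplified stand-in for `orphan_scan::parse_cc_switch_name`, whose actual implementation is not shown in this PR description.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Process-wide launch counter; each launch takes the next value.
static LAUNCH_SEQ: AtomicU32 = AtomicU32::new(0);

/// Builds a temp settings filename that is unique per launch, even when the
/// pid and timestamp are identical across launches.
fn claude_temp_name(provider: &str, pid: u32, timestamp_nanos: u128) -> String {
    let seq = LAUNCH_SEQ.fetch_add(1, Ordering::Relaxed);
    format!("cc-switch-claude-{provider}-{seq:08x}-{pid}-{timestamp_nanos}.json")
}

/// Simplified parser: pid and timestamp are always the last two
/// `-`-separated segments, so extra segments before them are ignored.
fn parse_pid_and_nanos(name: &str) -> Option<(u32, u128)> {
    let stem = name.strip_suffix(".json").unwrap_or(name);
    if !stem.starts_with("cc-switch-") {
        return None;
    }
    let mut tail = stem.rsplitn(3, '-');
    let nanos: u128 = tail.next()?.parse().ok()?;
    let pid: u32 = tail.next()?.parse().ok()?;
    tail.next()?; // a non-empty prefix must remain
    Some((pid, nanos))
}
```

Because the parser reads from the right, a provider name containing `-` and the added sequence segment both fall into the ignored prefix.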


Decisions

| Decision | Rationale |
| --- | --- |
| Suspended spawn + Job Object | Resuming after Job assignment guarantees the child cannot escape cleanup by forking before joining the job. |
| Nested-job fallback continues with warning | PowerShell ISE, Windows Terminal, and some CI runners already place the parent in a Job Object. Hard-failing here would break the feature for a large user segment. The sidecar mechanism makes this safe. |
| Sidecar stores creation time, not just PID | Prevents PID reuse from causing false negatives (a reused PID with a later creation time is treated as a different process). |
| GetSystemDirectoryW for cmd.exe | Eliminates PATH/ComSpec hijacking without adding runtime dependencies. |
| SE_DACL_PROTECTED on ACLs | Blocks directory-inherited permissions that could otherwise create a TOCTOU read window on secret temp files. |
| Alphabetical env-block sorting | Required by CreateProcessW docs; unsorted blocks work on modern Windows but are documented as unsupported. |
| Separate orphan_sidecars reap pass | Keeps the main scan fast (only inspects primary entries) while still bounding sidecar accumulation. |

Testing

  • 18 unit tests in orphan_scan covering filename parsing, sidecar logic, PID reuse detection, and atomic write behavior.
  • 12 Windows-only parity tests for cmd shim detection, quoting rules, and trailing-backslash validation.
  • Automated Windows smoke test covering suspended spawn, Job Object assignment, and exit-code propagation.
  • Manual QA script (scripts/windows-start-qa.ps1) covering M1–M5 scenarios: normal start, parent taskkill, orphaned file cleanup, Ctrl+C via .cmd shim, and nested Job Object fallback.
  • cargo test orphan_scan --lib, cargo test temp_launch --lib, and cargo test windows_smoke_test_spawn_job_wait_exit_code --lib all pass on Windows.

Verification Notes

  • Pre-existing unrelated test failures (proxy, TUI) exist on this branch and are not introduced by this PR.
  • No breaking changes to Unix paths; all Windows additions are behind #[cfg(windows)] or in new Windows-only modules.
  • Codex review rounds were performed on the branch with no remaining blocking findings.

AloneAtWar and others added 16 commits April 27, 2026 00:04
CreateProcessW does not search PATH when lpApplicationName is non-NULL,
so launching codex.cmd through a relative `cmd.exe` failed for
shim-installed CLIs. Mirror the Claude branch by passing NULL for the
application name on the .cmd/.bat path and only passing the resolved
binary path for the direct-binary case.

Previously every native arg ending with `\` was rejected on the cmd.exe /c
shim path, blocking benign Windows paths like `C:\work\` or
`--project-dir=C:\tmp\`. A trailing `\` only escapes a closing `"`, so the
hazard is real only when the arg also forces cmd quoting.

Extract `is_cmd_shim` (case-insensitive `.cmd`/`.bat`) and
`arg_requires_cmd_quote` helpers in both claude and codex temp_launch.rs.
Use the helper for application_name selection, and reject trailing `\`
only when an arg also requires cmd quoting.

Add 12 Windows-only parity tests covering helper behavior, the
plain-trailing-backslash accept path, the unsafe-quote+trailing-backslash
reject path, and direct-binary passthrough.

Drop the now-unused `pub(crate) fn build_command_windows` from
claude_temp_launch.rs. The live Windows path goes through
`build_windows_cmdline` + `is_cmd_shim`, so the case-sensitive
`ends_with(".cmd")` helper was just a parity hazard for future readers.

Gate `use std::time::{SystemTime, UNIX_EPOCH};` with `#[cfg(not(windows))]`
in both temp_launch.rs files. On Windows the timestamp comes from
`current_process_creation_time_nanos()`, so the imports were unused.

On Windows the temp filename used `process_creation_time` as the timestamp
component, which is constant for the same process. Same-provider launches
within one cc-switch process therefore stably collided on the temp file or
codex_home directory name.

Insert an 8-hex `LAUNCH_SEQ` atomic counter between provider and pid in the
filename / dirname:
  cc-switch-claude-{provider}-{seq}-{pid}-{timestamp}.json
  cc-switch-codex-{provider}-{seq}-{pid}-{timestamp}

Pid and timestamp remain the last two `-`-separated segments, so the
existing `orphan_scan::parse_cc_switch_name` parser keeps working without
changes.

Add tests:
- `write_temp_settings_file_uses_unique_filename_per_call` (claude) verifies
  two consecutive calls produce different paths.
- `parse_*_with_launch_seq_segment` (orphan_scan) verify the parser still
  extracts pid + nanos from the longer format.

If `Job::create_with_kill_on_close()` failed, the suspended child
process was leaking (handles + zombie suspended process). Mirror the
defensive cleanup pattern already used in claude_temp_launch.rs:
TerminateProcess + CloseHandle on both handles before returning the
error.

… path

When `ResumeThread` failed, `job.terminate()` was used to kill the
suspended child. But `try_assign` earlier only warns-and-continues on
failure; if the process never made it into the job,
`TerminateJobObject` would do nothing and the suspended child would
leak. Replace with explicit `TerminateProcess(h_process, 1)` so the
cleanup is correct in both single- and double-failure paths, matching
the pattern in codex_temp_launch.rs.

Also drop the now-unused `Job::terminate` helper.

Extract duplicated Windows logic from claude_temp_launch.rs and
codex_temp_launch.rs into a new windows_temp_launch.rs module:

- is_cmd_shim, arg_requires_cmd_quote
- quote_windows_arg, quote_windows_arg_for_cmd
- build_windows_command_line, build_env_block_with_override
- ScopedConsoleCtrlHandler, Job
- spawn_suspended_createprocessw, wait_for_child
- restrict_to_owner, create_secret_temp_file

Also add AppError constructors for Windows Job Object failures.

Add an automated Windows smoke test covering:
- spawn suspended child via CreateProcessW
- Job Object creation and assignment
- ResumeThread + wait + exit code propagation

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Sort env block alphabetically in build_env_block_with_override
  per CreateProcessW docs requirement.

- Add validate_cmd_arg helper with visible stderr warnings for %/!
  and hard rejections for quotes and unsafe trailing backslashes.
  Validate both user native args and internally-constructed args
  (executable path, settings path, codex_home) in cmd shim mode.

- Extract shared run_suspended_child helper to eliminate drift
  between claude and codex Windows exec paths.

- Implement atomic file/dir creation with owner-only ACL via
  CreateFileW/CreateDirectoryW + SECURITY_DESCRIPTOR, eliminating
  the TOCTOU window identified by codex review.

- Add Win32_Storage_FileSystem feature to windows-sys for
  CreateFileW/CreateDirectoryW.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Make cmd.exe % and ! expansion hard errors instead of warnings.
  Add CmdArgError::Percent and CmdArgError::Exclamation variants;
  validate_cmd_arg now rejects these characters to prevent real
  command-injection paths through cmd.exe /c (reproduced by codex).

- Set SE_DACL_PROTECTED on security descriptors created by
  create_secret_file_with_acl and create_secret_dir_with_acl.
  This blocks inheritable ACEs from the parent directory, eliminating
  the TOCTOU window where inherited permissions could read secret
  temp files before restrict_to_owner was called.

- Add automated test create_secret_file_with_acl_has_protected_dacl
  that reads the security descriptor back and verifies the DACL is
  protected and present.

- Add automated test validate_cmd_arg_rejects_percent_and_exclamation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The previous commit removed OpenOptions from the top-level imports,
breaking Unix compilation because create_secret_temp_file on Unix
still uses OpenOptions::new(). Gate the import behind #[cfg(unix)]
to avoid Windows unused-import warnings while keeping Unix builds
working.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When launching .cmd/.bat shims, both Claude and Codex wrappers were
passing unqualified 'cmd.exe' as lpApplicationName (or NULL), which
lets CreateProcessW search the current directory first. A rogue cmd.exe
in the workspace could be executed instead of the system binary.

Add resolve_system_cmd_exe() helper that uses which::which('cmd.exe')
with a ComSpec fallback, and pass the absolute path as
lpApplicationName while keeping 'cmd.exe' in the command line string
so build_windows_command_line still recognizes it for proper quoting.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

which::which and ComSpec are both environment-influenceable, so a
hijacked PATH or ComSpec could still redirect .cmd/.bat launches to a
rogue binary. Use GetSystemDirectoryW to ask the OS directly for the
system directory, then append cmd.exe. This is the trusted path.

Also avoid unconditionally resolving cmd.exe for direct .exe launches
in the Codex wrapper; only resolve it when is_cmd_shim is true.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… block

CreateProcessW docs require callers to explicitly preserve =X:
per-drive current-directory entries when supplying a custom env block.
Update build_env_block_with_override to separate drive vars from
regular vars, keep drive vars in original order, sort regular vars
alphabetically, and place drive vars first in the output block.

Add automated test verifying sorting, override replacement, and
double-null termination.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

AssignProcessToJobObject can fail with ERROR_ACCESS_DENIED when the
parent is already inside a job that prohibits nesting. This is an
expected graceful degradation, but the previous code used log::warn!
which is invisible at the default error log level.

- Check the raw OS error code: ACCESS_DENIED → visible eprintln!
  warning so users know KILL_ON_JOB_CLOSE was lost.
- Any other error code → unexpected failure: terminate the child,
  clean up handles, and return a hard AppError.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Write .child-meta sidecar with actual child PID and creation time so
  orphan_scan judges by child alive state instead of launcher PID.
  This prevents nested-job fallback from deleting a still-running
  CODEX_HOME when the launcher dies first. [windows_temp_launch.rs]
- Add Linux /proc/{pid}/stat starttime validation to detect PID reuse
  in orphan_scan Unix branch. [orphan_scan.rs]
- Fix windows-start-qa.ps1 M2 to recursively detect descendants (e.g.
  node.exe from npm .cmd shims) via CIM instead of Get-Process -Name.
- Also reap orphaned .child-meta.tmp crash residuals.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Add Windows support for cc-switch start claude/codex temporary launch

1 participant