fix(server): disable SO_REUSEADDR to prevent silent port sharing (#3289) #3446
rodboev wants to merge 3 commits into
Holding this for a redesign — putting it on A release-gate review (Codex) flagged that The goal (stop two instances silently sharing port 8787 on Windows/macOS) is real and worth fixing, but globally disabling
That preserves the crash/restart recovery path while still catching the genuine double-start case. Happy to look again once it's reworked along those lines. |
…abling SO_REUSEADDR (nesquena#3289)
|
Verified the restart regression. Traced it through Reworked in 9dbe92f along the lines you suggested. Three changes instead of the one-liner:
Also added Tests in |
|
Shipped in v0.51.234 (Release HB) via release PR #3488 — thank you @rodboev! 🙏 Unheld and merged this round. This PR was held earlier in the sweep because its original form (globally disabling
Both pre-release reviewers (Codex regression + Opus correctness/security) confirmed the probe doesn't false-positive on a legitimate restart and the Windows path is correct. Authorship preserved via Closing as merged-via-release-stage (the change is in |
## Release v0.51.236: Release HD (stage-q7)

First Phase 3 (deep-review) release, picked by the 3-factor framework (contributor × impact × mitigated-risk): high-impact (#1952 native Windows support), backend-only (no screenshots), well-mitigated risk (POSIX path provably unchanged), from a contributor active this session (@rodboev, #3446/#3486 shipped earlier today).

### Added

| PR | Author | Fix |
|----|--------|-----|
| #1952 | @rodboev | Native Windows support for `bootstrap.py` + the embedded terminal: POSIX-only `fcntl`/`termios`/`select` guarded behind `_TERMINAL_SUPPORTED`; terminal entry points raise `NotImplementedError`/no-op on Windows; bootstrap Windows block → warning; auto-install errors clearly on native Windows (WSL unaffected); foreground uses `Popen`+exit on Windows instead of `os.execv`. **POSIX behavior unchanged on every path.** |

### Absorbed on the way in (fix-it-ourselves, reviewed fresh)

- `subprocess.CREATE_NEW_PROCESS_GROUP` → `getattr(subprocess, ..., 0)`: the constant is Windows-only, so a win32-simulating test `AttributeError`'d on Linux. Mirrors the `SO_EXCLUSIVEADDRUSE` getattr guard.
- Fixed 2 over-reaching tests in `test_windows_native_support.py`: one was launching a **real installer subprocess** via an unstubbed `subprocess.run` (now stubbed; harness 2.8s vs 80s); removed unused imports.
- Updated `test_onboarding_static.py`: it asserted the OLD "Native Windows is not supported" hard-block string this PR intentionally replaces; now asserts the new experimental-warning + auto-install guard.
- Help-text accuracy: `--foreground` help now describes the Windows Popen path (Opus nit).

### Gate results

- **Full pytest suite**: 7478 passed, 9 skipped, 3 xpassed, **0 failed**
- **ruff forward gate**: CLEAN
- **browser-smoke gate**: CLEAN (gate hardened mid-release to auto-detect the cached chromium revision)
- **Codex (regression)**: SAFE TO SHIP (simulated `sys.platform=win32`, verified POSIX modules not imported + all terminal guards complete + POSIX foreground still uses execv)
- **Opus (correctness)**: SAFE TO SHIP (POSIX path provably unchanged, all fcntl/termios/select guarded, Popen+exit correct; noted inherent-Windows trade-offs that aren't PR bugs)

Note: the Windows *runtime* path can't be executed on the Linux CI box; it was reviewed statically by both reviewers + the contributor's 209-line test (win32 simulated via monkeypatch). Linux/POSIX no-regression is fully verified.

Closes #1952.

Co-authored-by: rodboev <rodboev@users.noreply.github.com>
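For readers unfamiliar with the `getattr` guard mentioned in the release notes, here is a minimal sketch of the pattern. The helper name `spawn_child` and the module-level constant `CREATION_FLAGS` are illustrative assumptions, not names from the PR; only the `getattr(subprocess, "CREATE_NEW_PROCESS_GROUP", 0)` idea comes from the notes above.

```python
import subprocess
import sys

# CREATE_NEW_PROCESS_GROUP is defined only on Windows builds of Python.
# Reading it via getattr avoids an AttributeError on Linux/macOS and
# falls back to 0, which means "no extra creation flags".
CREATION_FLAGS = getattr(subprocess, "CREATE_NEW_PROCESS_GROUP", 0)


def spawn_child(argv):
    """Hypothetical helper: start argv as a child process on any platform.

    On POSIX, Popen accepts creationflags only when it is 0, which is
    exactly what the getattr fallback yields there.
    """
    return subprocess.Popen(argv, creationflags=CREATION_FLAGS)


if __name__ == "__main__":
    child = spawn_child([sys.executable, "-c", "print('child started')"])
    child.wait()
```

The same shape applies to any platform-specific constant: resolve it once with a neutral default, then use the resolved value unconditionally.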
Thinking Path
- On Windows, `SO_REUSEADDR` allows multiple processes to bind the same port simultaneously; requests route unpredictably between them. On macOS, #3289 ("Bug: duplicate local WebUI start can fight launchd-managed 8787 instance") documented the same symptom between a manually started instance and a launchd-managed one.
- Setting `allow_reuse_address = False` catches the double-bind but breaks legitimate fast restart: after any accepted connection, the socket enters TIME_WAIT for ~60s on Linux/macOS, so `ctl.sh restart` and the `os.execv` self-update path in `api/updates.py` both fail with EADDRINUSE.
- Live instances are detected with a pre-bind probe (`GET /health`), while `SO_REUSEADDR` stays enabled for TIME_WAIT rebind. On Windows specifically, `SO_EXCLUSIVEADDRUSE` replaces the `SO_REUSEADDR` suppression to prevent port hijacking at the kernel level.
- A socket whose server is no longer in `serve_forever()` accepts TCP connections via the kernel backlog but never processes HTTP requests. The probe times out and startup proceeds. That's what makes it safe for `ctl.sh restart` during the brief overlap between stop and start.
- Added `httpd.server_close()` to the shutdown `finally` block so the listening socket is released immediately rather than lingering through cleanup.

What Changed
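The pre-bind probe from the Thinking Path can be sketched roughly as follows. This is not the PR's `_abort_if_already_serving`; the function names, the 2.0s default timeout, and the exit message are assumptions made for illustration. The one behavior taken from the description is that startup aborts only when a real HTTP response comes back, so a refused connection or a socket that accepts but never answers lets startup proceed.

```python
import http.client


def is_already_serving(host, port, timeout=2.0):
    """Return True only if something answers GET /health over HTTP.

    Hypothetical stand-in for the PR's probe; the timeout is an assumption.
    """
    # A wildcard bind address is not a connectable target, so probe loopback.
    if host in ("", "0.0.0.0"):
        host = "127.0.0.1"
    elif host == "::":
        host = "::1"
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/health")
        conn.getresponse()  # any status line counts as a live server
        return True
    except (OSError, http.client.HTTPException):
        # Refused, timed out, or garbage reply: no live HTTP server here.
        return False
    finally:
        conn.close()


def abort_if_already_serving(host, port):
    """Exit before bind if a live server already owns the port."""
    if is_already_serving(host, port):
        # Illustrative message, not the PR's exact wording.
        raise SystemExit(f"a live server is already answering on port {port}")
```

Treating a timeout as "not serving" is what keeps the restart path working: an old instance that has left `serve_forever()` still completes the TCP handshake through the backlog, but it never writes a response.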
- `server.py`: Removed `allow_reuse_address = False`. Added `_abort_if_already_serving(host, port)`, a pre-bind probe that connects to the port, sends `GET /health`, and aborts startup only if a live HTTP response comes back. Added a `server_bind()` override on `QuietHTTPServer` that sets `SO_EXCLUSIVEADDRUSE` on Windows. Added `httpd.server_close()` to the shutdown path.
- `tests/test_server_port_exclusivity.py` (new, replaces `test_server_no_reuse_address.py`): probe detection (live server → `SystemExit`), probe passthrough (free port, unresponsive socket, wildcard host normalization), and a `SO_EXCLUSIVEADDRUSE` assertion on Windows.

Why It Matters
A second webui instance silently sharing the same port causes unpredictable request routing, split session state, and confusing behavior that is difficult to diagnose. The fix detects live instances before bind without breaking the fast-restart paths that `ctl.sh` and the self-update mechanism depend on.

Verification
- `python -m pytest tests/test_server_port_exclusivity.py -v --timeout=60`
- Starting a second instance against a live server prints `[!!] FATAL: Another server is already responding on ...` and exits.
- `ctl.sh restart` completes without EADDRINUSE on Linux/macOS.

Risks / Follow-ups
- … `ctl.sh restart` overlap, and the 2s is concurrent with the old instance's shutdown cleanup.
- `ctl.sh` already has its own pidfile-based guard (`_current_pid` in `start_cmd`). The Python-level probe catches cases that bypass `ctl.sh`: direct `python server.py` invocations, Windows Task Scheduler, and the `os.execv` self-update path.
- `SO_EXCLUSIVEADDRUSE` is Windows-only. On Linux/macOS the probe is the sole guard, which is sufficient since POSIX `SO_REUSEADDR` only affects TIME_WAIT, not active-listener sharing.

Model Used
Claude Opus 4.6 via Claude Code CLI
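For reference, the Windows-only bind guard and the `server_close()` shutdown step described under What Changed could look roughly like the sketch below. The class name `ExclusiveBindHTTPServer` and the `serve` function are hypothetical stand-ins (the PR's class is `QuietHTTPServer`, whose actual body is not shown here), and turning off `allow_reuse_address` on Windows is an assumption about how the two socket options are kept from conflicting.

```python
import http.server
import socket


class ExclusiveBindHTTPServer(http.server.HTTPServer):
    """Hypothetical stand-in for the PR's QuietHTTPServer."""

    def server_bind(self):
        # SO_EXCLUSIVEADDRUSE exists only in Windows builds of Python.
        exclusive = getattr(socket, "SO_EXCLUSIVEADDRUSE", None)
        if exclusive is not None:
            # Assumption: skip SO_REUSEADDR on Windows so the two options
            # are never set on the same socket.
            self.allow_reuse_address = False
            self.socket.setsockopt(socket.SOL_SOCKET, exclusive, 1)
        # Elsewhere the default SO_REUSEADDR stays on for TIME_WAIT rebind.
        super().server_bind()


def serve(host, port):
    """Run until interrupted, then release the listening socket at once."""
    httpd = ExclusiveBindHTTPServer(
        (host, port), http.server.BaseHTTPRequestHandler
    )
    try:
        httpd.serve_forever()
    finally:
        httpd.server_close()
```

On Linux and macOS the override is a no-op beyond the stock behavior, which matches the note above that the probe is the sole guard there.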