fix: prevent indefinite hang on Windows when child processes hold pipes open #273

Merged
umputun merged 1 commit into umputun:master from stanurkov:windows-timeout
Apr 8, 2026
@stanurkov
Contributor

Summary

  • Fix readLines() to run ReadString in a goroutine with select on context, so cancellation (idle/session timeout) can interrupt a blocked pipe read
  • On Windows, Claude CLI child processes (Node.js, MCP servers) inherit stdout/stderr handles and keep pipes open after the parent exits, causing ReadString to block indefinitely
  • Without this fix, the only way to unblock is manually killing processes via Task Manager

Root cause

On Unix, killProcess sends SIGKILL to the entire process group, closing all inherited handles. On Windows, process.Kill() terminates only the direct process: child processes survive, the pipes stay open, ReadString never returns, and ctx.Done() is never checked while the read is blocked.

What changed

readLines() in pkg/executor/linereader.go:

Before: synchronous ReadString in a loop with select { case <-ctx.Done() / default }. The default branch called ReadString, which blocked, so the ctx.Done() case was unreachable for as long as the read was stuck.

After: ReadString runs in a goroutine, result arrives via buffered channel. select listens on both the channel and ctx.Done(), so cancellation takes effect immediately even while the read is blocked.
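
For readers who want to see the "After" shape in one place, here is a minimal sketch of the pattern described above. It is not the merged code from pkg/executor/linereader.go: the handler callback, the error wrapping, and the helpers collectLines and blockedReadIsCancelled are assumptions made for illustration. The io.Pipe with a write end that is never closed stands in for a pipe held open by a surviving child process.

```go
package main

import (
	"bufio"
	"context"
	"errors"
	"fmt"
	"io"
	"strings"
	"time"
)

// readResult carries one ReadString outcome from the read goroutine.
type readResult struct {
	line string
	err  error
}

// readLines is an illustrative sketch: each ReadString runs in its own
// goroutine, and the loop selects on the result and on ctx.Done().
func readLines(ctx context.Context, r io.Reader, handler func(string)) error {
	reader := bufio.NewReader(r)
	ch := make(chan readResult, 1) // buffered: an abandoned read can still send and exit

	doRead := func() {
		line, err := reader.ReadString('\n')
		ch <- readResult{line, err}
	}

	go doRead()
	for {
		select {
		case <-ctx.Done():
			return fmt.Errorf("read lines: %w", ctx.Err())
		case res := <-ch:
			if res.line != "" {
				handler(res.line)
			}
			if res.err != nil {
				if errors.Is(res.err, io.EOF) {
					return nil
				}
				return fmt.Errorf("read lines: %w", res.err)
			}
			go doRead() // only one read is ever in flight
		}
	}
}

// collectLines (hypothetical helper) runs readLines over an in-memory string.
func collectLines(input string) ([]string, error) {
	var lines []string
	err := readLines(context.Background(), strings.NewReader(input),
		func(l string) { lines = append(lines, l) })
	return lines, err
}

// blockedReadIsCancelled (hypothetical helper) reads from a pipe whose write
// end stays open and reports whether the timeout interrupted the read.
func blockedReadIsCancelled() bool {
	pr, pw := io.Pipe()
	defer pw.Close() // lets the abandoned read goroutine finish
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	err := readLines(ctx, pr, func(string) {})
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	lines, err := collectLines("one\ntwo\n")
	fmt.Println(len(lines), err)
	fmt.Println(blockedReadIsCancelled())
}
```

Note that cancellation only unblocks the caller. The read goroutine itself stays parked in ReadString until the pipe is finally closed, which is why the channel is buffered.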

Recommendation for Windows users

Set idle_timeout = 5m in config (~/.config/ralphex/config). Without a timeout, no one cancels the context, and the pipe hangs forever. 5 minutes of silence is never normal during active task execution.

Long-term fix: Job Objects (CreateJobObject + TerminateJobObject) in procgroup_windows.go to kill the entire process tree, making pipes close immediately without relying on timeouts.

Test plan

  • go test ./pkg/executor/ passes
  • go build ./... compiles
  • Verified on Windows 10: idle timeout fires after silence, Kill() succeeds, Wait() returns, task loop continues to next task

Owner

@umputun umputun left a comment

LGTM, thx. clean fix, correct approach - goroutine + buffered channel is the standard Go idiom for making a blocking call context-cancellable. tests pass, linter clean, race detector clean.

@umputun umputun merged commit be6a4de into umputun:master Apr 8, 2026
4 checks passed
Copilot AI left a comment

Pull request overview

This PR addresses a Windows-specific hang where readLines() could block indefinitely in ReadString when child processes keep stdout/stderr pipe handles open, preventing context cancellation (idle/session timeouts) from interrupting a blocked read.

Changes:

  • Refactors readLines() to perform bufio.Reader.ReadString('\n') in a goroutine and select between the read result and ctx.Done().
  • Updates inline documentation explaining why async reads are required (notably on Windows with inherited pipe handles).

Comment on lines 26 to +35
reader := bufio.NewReader(r)
ch := make(chan readResult, 1) // buffered: lets abandoned goroutine exit after kill

doRead := func() {
	line, err := reader.ReadString('\n')
	ch <- readResult{line, err}
}

go doRead()

Copilot AI Apr 8, 2026

readLines() now always starts a ReadString goroutine before checking ctx. If ctx is already canceled, there’s a race where the read goroutine can fill ch and the select may take the res := <-ch branch, potentially draining input and returning nil/EOF instead of context.Canceled (can make cancellation propagation and tests flaky). Consider checking ctx.Err() up front (before starting the first goroutine) and/or prioritizing cancellation inside the loop (e.g., if ctx.Err()!=nil when receiving res, return the context error without processing/spawning another read).
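
A minimal sketch of what this suggestion could look like, assuming the same goroutine-per-read structure as the PR. The function signature, the handler callback, and the helpers collectLines and cancelledBeforeStart are hypothetical. The two ctx.Err() checks are the only difference from the straightforward version: one before the first read is started, one when a result arrives.

```go
package main

import (
	"bufio"
	"context"
	"errors"
	"fmt"
	"io"
	"strings"
)

// readResult carries one ReadString outcome from the read goroutine.
type readResult struct {
	line string
	err  error
}

// readLines is an illustrative sketch that gives cancellation priority
// over any read result that happens to be ready at the same time.
func readLines(ctx context.Context, r io.Reader, handler func(string)) error {
	// Check up front: never start a read for an already-cancelled context.
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("read lines: %w", err)
	}

	reader := bufio.NewReader(r)
	ch := make(chan readResult, 1) // buffered: an abandoned read can still send and exit

	doRead := func() {
		line, err := reader.ReadString('\n')
		ch <- readResult{line, err}
	}

	go doRead()
	for {
		select {
		case <-ctx.Done():
			return fmt.Errorf("read lines: %w", ctx.Err())
		case res := <-ch:
			// select picks randomly when both cases are ready, so re-check
			// the context before processing or spawning another read.
			if err := ctx.Err(); err != nil {
				return fmt.Errorf("read lines: %w", err)
			}
			if res.line != "" {
				handler(res.line)
			}
			if res.err != nil {
				if errors.Is(res.err, io.EOF) {
					return nil
				}
				return fmt.Errorf("read lines: %w", res.err)
			}
			go doRead()
		}
	}
}

// collectLines (hypothetical helper) runs readLines over an in-memory string.
func collectLines(input string) ([]string, error) {
	var lines []string
	err := readLines(context.Background(), strings.NewReader(input),
		func(l string) { lines = append(lines, l) })
	return lines, err
}

// cancelledBeforeStart (hypothetical helper) calls readLines with a context
// that is already cancelled and reports how many lines were handled and
// whether the returned error wraps context.Canceled.
func cancelledBeforeStart() (int, bool) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	handled := 0
	err := readLines(ctx, strings.NewReader("a\nb\n"), func(string) { handled++ })
	return handled, errors.Is(err, context.Canceled)
}

func main() {
	lines, err := collectLines("one\ntwo\n")
	fmt.Println(len(lines), err)
	fmt.Println(cancelledBeforeStart())
}
```

With the up-front check, an already-cancelled context returns its error deterministically and no input is consumed.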

Comment on lines +49 to 53
}
return fmt.Errorf("read lines: %w", res.err)
}
return fmt.Errorf("read lines: %w", err)
go doRead()
}
Copilot AI Apr 8, 2026

This implementation spawns a new goroutine for every line (go doRead() on each successful read). For long-running/high-volume streams this adds avoidable goroutine creation/teardown overhead. A lower-overhead approach is to run a single long-lived reader goroutine that loops on ReadString and sends readResults on a channel, while the main goroutine selects on ctx.Done() vs results.
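
One possible shape for the single-reader variant, as a sketch rather than a drop-in replacement. The done channel, the handler callback, and the helpers collectLines and blockedReadIsCancelled are assumptions for illustration. The done channel is there so the reader goroutine can exit once the consumer has returned instead of blocking on a send forever.

```go
package main

import (
	"bufio"
	"context"
	"errors"
	"fmt"
	"io"
	"strings"
	"time"
)

// readResult carries one ReadString outcome from the reader goroutine.
type readResult struct {
	line string
	err  error
}

// readLines is an illustrative sketch using one long-lived reader goroutine
// for the whole stream instead of one goroutine per line.
func readLines(ctx context.Context, r io.Reader, handler func(string)) error {
	reader := bufio.NewReader(r)
	ch := make(chan readResult)
	done := make(chan struct{})
	defer close(done) // tells the reader goroutine that nobody is listening

	go func() {
		for {
			line, err := reader.ReadString('\n')
			select {
			case ch <- readResult{line, err}:
			case <-done:
				return // consumer is gone, drop the result
			}
			if err != nil {
				return // EOF or read error ends the stream
			}
		}
	}()

	for {
		select {
		case <-ctx.Done():
			return fmt.Errorf("read lines: %w", ctx.Err())
		case res := <-ch:
			if res.line != "" {
				handler(res.line)
			}
			if res.err != nil {
				if errors.Is(res.err, io.EOF) {
					return nil
				}
				return fmt.Errorf("read lines: %w", res.err)
			}
		}
	}
}

// collectLines (hypothetical helper) runs readLines over an in-memory string.
func collectLines(input string) ([]string, error) {
	var lines []string
	err := readLines(context.Background(), strings.NewReader(input),
		func(l string) { lines = append(lines, l) })
	return lines, err
}

// blockedReadIsCancelled (hypothetical helper) reads from a pipe whose write
// end stays open and reports whether the timeout interrupted the read.
func blockedReadIsCancelled() bool {
	pr, pw := io.Pipe()
	defer pw.Close() // lets the parked reader goroutine finish
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	err := readLines(ctx, pr, func(string) {})
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	lines, err := collectLines("one\ntwo\n")
	fmt.Println(len(lines), err)
	fmt.Println(blockedReadIsCancelled())
}
```

The trade-off is that the reader runs one line ahead of the handler, so a line that was read but not yet delivered is dropped if readLines returns early on cancellation.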
