Skip to content

step ssh proxycommand hangs ~60s when server closes connection before stdin closes #1641

Description

@techwolf359

Description

step ssh proxycommand hangs for approximately 60 seconds when the SSH server closes the connection before the client has closed stdin. This occurs when the server rejects the connection mid-session (e.g. PAM account check failure after certificate authentication succeeds), sending a SSH2_MSG_USERAUTH_BANNER followed by a disconnect.

The process eventually exits due to an OS-level timeout, but the hang makes it appear to the user that the connection is still in progress.

Suspected Cause

The deadlock is in proxyDirect / proxyDirectWithIO in command/ssh/proxycommand.go:

var wg sync.WaitGroup
wg.Add(1)
go func() {
    io.Copy(conn, os.Stdin)   // goroutine 1: blocks reading stdin
    conn.CloseWrite()
    wg.Done()
}()
wg.Add(1)
go func() {
    io.Copy(os.Stdout, conn)  // goroutine 2: exits when server closes
    conn.CloseRead()
    wg.Done()
}()
wg.Wait()                     // waits for both — never returns

When the server closes the TCP connection:

  1. Goroutine 2 exits and calls conn.CloseRead()
  2. Goroutine 1 is blocked reading from os.Stdin
  3. os.Stdin is a pipe from the SSH client process, which hasn't closed because it's waiting for the ProxyCommand to exit
  4. The ProxyCommand is waiting for both goroutines — deadlock

Calling os.Stdin.Close() from goroutine 2 does not reliably interrupt a blocked read() syscall on macOS when stdin is a pipe.

Reproduction

// Start a TCP server that sends data and immediately closes
ln, _ := net.Listen("tcp", "127.0.0.1:0")
go func() {
    conn, _ := ln.Accept()
    conn.Write([]byte("hello"))
    conn.Close()
}()

// Simulate a stdin that never closes (SSH client waiting for ProxyCommand)
stdinR, _ := io.Pipe()  // write end intentionally left open

// This hangs indefinitely
proxyDirectWithIO("127.0.0.1", port, stdinR, io.Discard)

Suggested Fix

In command/ssh/proxycommand.go, I think we can return early when either goroutine completes rather than waiting for both. Once one side closes, the ProxyCommand has nothing more to do — returning allows the OS to reclaim the blocked goroutine.

Test

A regression test could simulate a TCP server that sends data and immediately closes, while holding stdin open indefinitely. Before the fix, proxyDirectWithIO would hang; after, it should return promptly.

Environment

  • macOS arm64 (Apple Silicon)
  • step installed via Homebrew
  • OpenSSH 9.9

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions