Skip to content

net/http: http2 client data race to close request body when server crashes unexpectedly #73522

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jhump opened this issue Apr 28, 2025 · 2 comments
Labels
BugReport Issues describing a possible bug in the Go implementation. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@jhump
Copy link

jhump commented Apr 28, 2025

Go version

go version go1.24.2 linux/arm64

Output of go env in your module/workspace:

AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOARCH='arm64'
GOARM64='v8.0'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3981435804=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='arm64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/dev/null'
GOMODCACHE='/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/root/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_arm64'
GOVCS=''
GOVERSION='go1.24.2'
GOWORK=''
PKG_CONFIG='pkg-config'
  • The above is from the same Docker image used to build the Go application in question:
    docker.io/library/golang:1.24.2-bookworm
    However, the above was run from a Macbook Pro. The actual failure occurred in a CI environment.
    So the actual GOARCH was most likely amd64, not arm64.

What did you do?

Running Antithesis tests of an application that uses connect-go, and net/http, for remote procedure calls. Right before the data race, the server was killed via Antithesis fault injection.

What did you see happen?

The client then observed the following race:

WARNING: DATA RACE
Write at 0x00c001ef3859 by goroutine 1672:
  net/http.(*readTrackingBody).Close()
      net/http/transport.go:765 +0x2c
  net/http.(*http2clientStream).closeReqBodyLocked.func1()
      net/http/h2_bundle.go:7959 +0xe1

Previous write at 0x00c001ef3859 by goroutine 1152:
  net/http.(*readTrackingBody).Close()
      net/http/transport.go:765 +0x2c
  net/http.(*Request).closeBody()
      net/http/request.go:1531 +0x1b8f
  net/http.(*Transport).roundTrip()
      net/http/transport.go:729 +0x1b50
  net/http.(*Transport).RoundTrip()
      net/http/roundtrip.go:30 +0x33
  net/http.send()
      net/http/client.go:259 +0x8ca
  net/http.(*Client).send()
      net/http/client.go:180 +0x14c
  net/http.(*Client).do()
      net/http/client.go:728 +0x1338
  net/http.(*Client).Do()
      net/http/client.go:587 +0x33
  connectrpc.com/connect.(*duplexHTTPCall).makeRequest()
      connectrpc.com/[email protected]/duplex_http_call.go:303 +0x2a9
  connectrpc.com/connect.(*duplexHTTPCall).sendUnary()
      connectrpc.com/[email protected]/duplex_http_call.go:152 +0x31d
  connectrpc.com/connect.(*duplexHTTPCall).Send()
      connectrpc.com/[email protected]/duplex_http_call.go:96 +0x4bd
  connectrpc.com/connect.(*connectUnaryMarshaler).write()
      connectrpc.com/[email protected]/protocol_connect.go:967 +0x136
  connectrpc.com/connect.(*connectUnaryMarshaler).Marshal()
      connectrpc.com/[email protected]/protocol_connect.go:950 +0x5a4
  connectrpc.com/connect.(*connectUnaryRequestMarshaler).Marshal()
      connectrpc.com/[email protected]/protocol_connect.go:996 +0x1ee
  connectrpc.com/connect.(*connectUnaryClientConn).Send()
      connectrpc.com/[email protected]/protocol_connect.go:464 +0x44
  connectrpc.com/connect.(*errorTranslatingClientConn).Send()
      connectrpc.com/[email protected]/protocol.go:206 +0x5e
  connectrpc.com/connect.NewClient[go.shape.d50c654e04ee60d60cbac2b6a494762c133ec3dcc72739a3c95e8a0c727922f3,go.shape.7cad5bf851e05d86f03a76f6be4bb4a46eee5846774e6d79d0b3be000fb312d7].func1()
      connectrpc.com/[email protected]/client.go:86 +0x25a
  connectrpc.com/otelconnect.(*Interceptor).WrapUnary.func1()
      connectrpc.com/[email protected]/interceptor.go:153 +0x1901
  connectrpc.com/connect.NewClient[go.shape.d50c654e04ee60d60cbac2b6a494762c133ec3dcc72739a3c95e8a0c727922f3,go.shape.7cad5bf851e05d86f03a76f6be4bb4a46eee5846774e6d79d0b3be000fb312d7].func2()
      connectrpc.com/[email protected]/client.go:112 +0x302
  connectrpc.com/connect.(*Client[go.shape.d50c654e04ee60d60cbac2b6a494762c133ec3dcc72739a3c95e8a0c727922f3,go.shape.7cad5bf851e05d86f03a76f6be4bb4a46eee5846774e6d79d0b3be000fb312d7]).CallUnary()
      connectrpc.com/[email protected]/client.go:130 +0xb1
  // additional generated stub and application code stack frames elided

Goroutine 1672 (running) created at:
  net/http.(*http2clientStream).closeReqBodyLocked()
      net/http/h2_bundle.go:7957 +0x164
  net/http.(*http2clientStream).abortStreamLocked()
      net/http/h2_bundle.go:7932 +0xbd
  net/http.(*http2clientConnReadLoop).cleanup()
      net/http/h2_bundle.go:9890 +0x926
  net/http.(*http2ClientConn).readLoop.deferwrap1()
      net/http/h2_bundle.go:9811 +0x33
  runtime.deferreturn()
      runtime/panic.go:610 +0x5d
  net/http.(*http2Transport).newClientConn.gowrap1()
      net/http/h2_bundle.go:8334 +0x33

So the main "round trip" logic goroutine tries to close the request body, but so does the background goroutine that reads from the underlying net.Conn. The background goroutine tries to abort all in-progress operations and tries to close the request body, too, but there appears to be no synchronization. The background goroutine holds a mutex (http2Transport.mu), but that doesn't guard the request body.

What did you expect to see?

The HTTP operation was expected to fail due to the fault, but not in a way that tickles the Go race detector.

@gabyhelp gabyhelp added the BugReport Issues describing a possible bug in the Go implementation. label Apr 28, 2025
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 30, 2025
@cagedmantis cagedmantis added this to the Backlog milestone Apr 30, 2025
@cagedmantis
Copy link
Contributor

cc @neild

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BugReport Issues describing a possible bug in the Go implementation. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

3 participants