Skip to content

runtime: inlining in bgsweep leads to livelock #73499

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
ArsenySamoylov opened this issue Apr 25, 2025 · 7 comments
Open

runtime: inlining in bgsweep leads to livelock #73499

ArsenySamoylov opened this issue Apr 25, 2025 · 7 comments
Labels
BugReport Issues describing a possible bug in the Go implementation. compiler/runtime Issues related to the Go compiler and/or runtime.

Comments

@ArsenySamoylov
Copy link

Go version

go version devel go1.24-3cdf88bffa Thu Nov 7 21:34:06 2024 +0300 linux/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/home/asamoylov/.cache/go-build'
GOENV='/home/asamoylov/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/asamoylov/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/asamoylov/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/asamoylov/Go/'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/asamoylov/Go/pkg/tool/linux_arm64'
GOVCS=''
GOVERSION='devel go1.24-3cdf88bffa Thu Nov 7 21:34:06 2024 +0300'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/asamoylov/.config/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build383434801=/tmp/go-build -gno-record-gcc-switches'

What did you do?

When running go1 BinaryTree17 benchmark with PGO using CL#626295 cmd/compile: support multiple levels of inlining using PGO (see TODO in src/cmd/compile/internal/inline/inl.go:892). I encountered deadlock. Here is my original post about this problem issues#69046

Reproducing bug

Remove TODO part from https://go.dev/cl/626295. Complie https://go.dev/play/p/f7HGymVik58 with PGO:

go test -c -o test.out binary_tree_test.go 
taskset -c 0-3 ./test.out -test.bench=. -test.count=50 -test.cpuprofile=test.pprof
# test.count must be high enoug to collect enoug info

go test -c -pgo=test.pprof -o test-pgo.out binary_tree_test.go
taskset -c 0-3 ./test-pgo.out -test.bench=. -test.count=5 # this command freezes around 3rd iteration

What did you see happen?

Deadlock

What did you expect to see?

Happy end, not a deadlock.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 25, 2025
@ArsenySamoylov
Copy link
Author

Root cause of bug

The deadlock occurs because another goroutine is constantly trying to stop the mark phase (while holding stw lock). Here is part of its back-trace:

1  0x0000000000083d58 in runtime.preemptall                                                                                                                                                                                                                               
    at /home/asamoylov/CoreGo/src/runtime/proc.go:6303                                                                                                                                                                                                                     
...                                                                                                                                                                                                                
 6  0x000000000002e3e4 in runtime.forEachP                                                                                                                                                                                                                                 
    at /home/asamoylov/CoreGo/src/runtime/proc.go:2013                                                                                                                                                                                                                     
 7  0x000000000002e3e4 in runtime.gcMarkTermination                                                                                                                                                                                                                        
    at /home/asamoylov/CoreGo/src/runtime/mgc.go:1153    

The function gcMarkTermination() ensures that all caches are flushed before continuing the GC cycle. It calls forEachP() to flush caches for each processor (P).

// World must be stopped and mark assists and background workers must be
// disabled.
func gcMarkTermination(stw worldStop) {
 
...
    // Ensure all mcaches are flushed. Each P will flush its own
    // mcache before allocating, but idle Ps may not. Since this
    // is necessary to sweep all spans, we need to ensure all
    // mcaches are flushed before we start the next GC cycle.
    //
    // While we're here, flush the page cache for idle Ps to avoid
    // having pages get stuck on them. These pages are hidden from
    // the scavenger, so in small idle heaps a significant amount
    // of additional memory might be held onto.
    //
    // Also, flush the pinner cache, to avoid leaking that memory
    // indefinitely.
    forEachP(waitReasonFlushProcCaches, func(pp *p) {
 
...

However, the main issue is that forEachP() calls fn(p) for each P only when it is at a GC-safe point. In order to do that, it preempts all Ps (thus call to preemptall() in back trace).

// forEachP calls fn(p) for every P p when p reaches a GC safe point.
// If a P is currently executing code, this will bring the P to a GC
// safe point and execute fn on that P. If the P is not executing code
// (it is idle or in a syscall), this will call fn(p) directly while
// preventing the P from exiting its state. This does not ensure that
// fn will run on every CPU executing Go code, but it acts as a global
// memory barrier. GC uses this as a "ragged barrier."
//
// The caller must hold worldsema. fn must not refer to any
// part of the current goroutine's stack, since the GC may move it.
func forEachP(reason waitReason, fn func(*p)) {

Goroutines are not preemtible when it holds any locks.

And sweepone(), freeSomeWbufs() take locks. So when they are inlined the bgsweep goroutine becomes non-preemtible for the most of the time! Leading to livelock.

@ArsenySamoylov
Copy link
Author

ArsenySamoylov commented Apr 25, 2025

I also have few solutions for this bug. I posted them in my original comment. Furthermore, I will also provide them as separate CL's a little bit later.

@gabyhelp gabyhelp added the BugReport Issues describing a possible bug in the Go implementation. label Apr 25, 2025
@JunyangShao
Copy link
Contributor

Thank you for opening the issue, did you encounter the problem only on a pending CL? I saw the go version is devel go1.24-3cdf88bffa. If so please consider respond in the original issue #69046 . Thanks. 😃

@ArsenySamoylov
Copy link
Author

ArsenySamoylov commented Apr 26, 2025

did you encounter the problem only on a pending CL?

Yes, as of now this doesn't happen in release versions.

consider respond in the original issue #69046

I initially posted this in original issue #69046. But I decided that this problem should be in separate issues, because this is a subtle problem of inlining in runtime (and can come up any time inlining or something related changes). If you think that it is best to discus it in the context of the original issue, then let me know =).

@prattmic prattmic reopened this Apr 28, 2025
@prattmic
Copy link
Member

I think this is fine to discuss in this new issue, this problem is a bit separate from #69046. (Though either place is fine TBH).

I think your first solution (adding an additional goschedIfBusy call) seems reasonable, but I'd like to hear @mknyszek's opinion. Feel free to send a CL and we can discuss there as well.

@mknyszek
Copy link
Contributor

Thanks for filing an issue and for the thorough analysis.

Either solution you described would work, I think. bgsweep is really a best-effort goroutine that exists as an optimization, to reduce latency on memory allocation paths. It is not strictly necessary for the sweeper to make progress, since the allocator is designed to guarantee all sweeping is done before the next mark phase has to start.

However, I agree with @prattmic and prefer the explicit preemption point. It's a bit simpler since we're treating the background sweeper more like a regular goroutine, which is easier to reason about. Every time we add a goparkunlock we're making one of these system goroutines a little more special... haha.

Another alternative would be to have unlock check to see if the preemption flag on the g is set, and if there are no other locks held, yield to the scheduler. This seems like it would solve the problem at a much more general level, and in the long term. However, that's a little scary to do this late in the release cycle, though. There may be places where we're implicitly relying on unlock not having any preemption points. It would be good to flush those out, but the code freeze is in ~4 weeks and such a change will need time to soak.

@ArsenySamoylov would you like to send a change? You can follow the instructions at https://go.dev/doc/contribute, or if you prefer I can work on sending a patch for this. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BugReport Issues describing a possible bug in the Go implementation. compiler/runtime Issues related to the Go compiler and/or runtime.
Projects
None yet
Development

No branches or pull requests

6 participants