runtime: segmentation fault from vgetrandomPutState and runtime.growslice w/ runtime.OSLockThread [1.24 backport] #73144

Closed
gopherbot opened this issue Apr 3, 2025 · 6 comments
Labels
CherryPickApproved, compiler/runtime, release-blocker
Milestone

Comments

@gopherbot
Contributor

@prattmic requested issue #73141 to be considered for backport to the next 1.24 minor release.

@gopherbot Please backport to 1.24. This is a regression that can cause arbitrary crashes in programs that exit goroutines under LockOSThread and are running on Linux 6.11 or higher.

@gopherbot gopherbot added the CherryPickCandidate Used during the release process for point releases label Apr 3, 2025
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 3, 2025
@gopherbot gopherbot added this to the Go1.24.3 milestone Apr 3, 2025
@gopherbot
Contributor Author

Change https://go.dev/cl/662496 mentions this issue: [release-branch.go1.24] runtime: cleanup M vgetrandom state before dropping P

@prattmic
Member

prattmic commented Apr 3, 2025

@mvdan suggests that we consider making a 1.24.3 release early for this issue:

I know that go1.24.2 was just released, but any chance a go1.24.3 release with this fix could be pushed forward? There are a number of Linux distributions and downstream projects which are currently stuck at Go 1.23 until a release with this fix happens for 1.24.

Let me add more context on the issue here:

Brief summary: Applications that have goroutines exit while holding LockOSThread may crash randomly when running on Linux 6.11+ (because that kernel enables the new VDSO getrandom support added in 1.24).

We have reports that both Docker and Dagger suffer from these random crashes, which IIUC occur frequently enough to make those tools basically unusable with Go 1.24.

We don't have reports from other projects, but goroutines exiting with LockOSThread held is fairly common in "container" projects because they may modify some thread state in a way that the thread can no longer be reused, and this is the mechanism to drop a thread.
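For readers unfamiliar with the pattern described above, here is a minimal sketch of a goroutine that exits while its thread is still locked. The helper name `runOnDisposableThread` is hypothetical and not taken from Docker, Dagger, or the Go runtime; the only runtime behavior it relies on is the documented rule that a thread is terminated if its locked goroutine exits without calling `runtime.UnlockOSThread`.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// runOnDisposableThread (hypothetical helper) runs fn on a goroutine that is
// locked to its OS thread and then lets that goroutine exit without calling
// runtime.UnlockOSThread. The runtime terminates the thread instead of
// returning it to the pool, so state changed by fn cannot leak to other
// goroutines.
func runOnDisposableThread(fn func()) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		runtime.LockOSThread()
		// Deliberately no UnlockOSThread: the thread is dropped on exit.
		fn()
	}()
	wg.Wait()
}

func main() {
	for i := 0; i < 100; i++ {
		runOnDisposableThread(func() {
			// Placeholder for a thread-state change that cannot be undone,
			// for example entering a different namespace.
		})
	}
	fmt.Println("all locked goroutines exited")
}
```

Each call takes the M teardown path in the runtime, which is where the crash in this issue was reported on affected Go 1.24 releases.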

If I understand correctly, neither Docker nor Dagger has made an official release targeting 1.24; both still target 1.23 minor releases. However, some Linux distributions built these projects with 1.24 for their package managers. It is the releases from those package managers where users were encountering crashes.

cc @golang/release

@Foxboron
Contributor

Foxboron commented Apr 3, 2025

fwiw, Arch Linux has already backported the patch and rebuilt docker and dagger.

https://gitlab.archlinux.org/archlinux/packaging/packages/go/-/commit/19e46549653b46dbb220c665c3ef545e7ef03132

@prattmic
Member

prattmic commented Apr 3, 2025

k3s-io/k3s#11973 (comment) reports the same crash in containerd as well. (I'm not sure what's going on with the rest of that issue, which is marked as fixed. The crash report may be unrelated to the rest of the issue.)

@dmitshur changed the title from "runtime: Segmentation fault from vgetrandomPutState and runtime.growslice w/ runtime.OSLockThread [1.24 backport]" to "runtime: segmentation fault from vgetrandomPutState and runtime.growslice w/ runtime.OSLockThread [1.24 backport]" Apr 3, 2025
@csutcliff

csutcliff commented Apr 6, 2025

This is affecting debian trixie and sid (docker specifically) so another +1 for a release with the fix.

@prattmic prattmic added the CherryPickApproved Used during the release process for point releases label Apr 16, 2025
@gopherbot gopherbot removed the CherryPickCandidate Used during the release process for point releases label Apr 16, 2025
gopherbot pushed a commit that referenced this issue Apr 28, 2025
…opping P

When an M is destroyed, we put its vgetrandom state back on the shared
list for another M to reuse. This list is simply a slice, so appending
to the slice may allocate. Currently this operation is performed in
mdestroy, after the P is released, meaning allocation is not allowed.

Move the cleanup earlier in mdestroy when allocation is still OK.

Also add //go:nowritebarrierrec to mdestroy since it runs without a P,
which would have caught this bug.

Fixes #73144.
For #73141.

Change-Id: I6a6a636c3fbf5c6eec09d07a260e39dbb4d2db12
Reviewed-on: https://go-review.googlesource.com/c/go/+/662455
Reviewed-by: Jason Donenfeld <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
(cherry picked from commit 0b31e6d)
Reviewed-on: https://go-review.googlesource.com/c/go/+/662496
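
As a rough illustration of why the commit message says that appending to the shared list "may allocate", the sketch below shows an ordinary Go slice getting a new backing array once its capacity is exhausted. The function `appendGrew` and the `[]int` element type are assumptions made for this example; the runtime's actual list and its element type are not shown in this thread.

```go
package main

import "fmt"

// appendGrew (hypothetical helper) appends v to s and reports whether the
// append had to move to a larger backing array, which implies an allocation.
func appendGrew(s []int, v int) ([]int, bool) {
	before := cap(s)
	s = append(s, v)
	return s, cap(s) != before
}

func main() {
	// Stand-in for a shared free list with room for two entries.
	states := make([]int, 0, 2)
	for i := 0; i < 4; i++ {
		var grew bool
		states, grew = appendGrew(states, i)
		fmt.Printf("len=%d cap=%d grew=%v\n", len(states), cap(states), grew)
	}
}
```

In ordinary code this growth is harmless. Per the commit message, the problem was that the append ran in mdestroy after the P had been released, a point at which allocation is not allowed.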
@gopherbot
Contributor Author

Closed by merging CL 662496 (commit 0ab64e2) to release-branch.go1.24.
