
runtime: crash in runtime.(*unwinder).next #73413


Closed
sirzooro opened this issue Apr 17, 2025 · 5 comments
Labels
BugReport Issues describing a possible bug in the Go implementation. compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@sirzooro

Go version

1.24.2 linux/arm64

Output of go env in your module/workspace:

-

What did you do?

We have a set of fuzz tests that run every day. Everything was fine while we were on Go 1.23.x. We upgraded to 1.24.0 shortly after it was released, and since then we have been seeing random crashes in some of our fuzz tests (we are now on 1.24.2). The crashes do not depend on the input data: manually feeding the input printed by the fuzzer back into the fuzz function did not reproduce the crash. The call stack printed by the fuzzer was useless, so we recently added gdb to get a better one. Here is what we got from bt full:

Thread 3.1 "go-fuzz-pion-Fu" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff79ce780 (LWP 2527)]
0x0000555555660525 in runtime.(*unwinder).next (u=0x7fffffffd0e8) at runtime/traceback.go:458
458	runtime/traceback.go: No such file or directory.
#0  0x0000555555660525 in runtime.(*unwinder).next (u=0x7fffffffd0e8)
    at runtime/traceback.go:458
        doPrint = true
        gp = 0xc000184540
        ~r0.ptr = <optimized out>
        ~r0.len = <optimized out>
#1  0x0000555555626fa9 in runtime.scanstack (gp=0xc000184540, 
    gcw=0xc00003b750, ~r0=<optimized out>) at runtime/mgcmark.go:904
        sp = <optimized out>
        scannedSize = <optimized out>
        p = <optimized out>
        state = <optimized out>
        u = <optimized out>
#2  0x00005555556259f1 in runtime.markroot.func1 () at runtime/mgcmark.go:240
        gp = 0xc000184540
        &workDone = 0x7fffffffd250
        gcw = 0xc00003b750
        userG = 0xc0000028c0
        selfScan = false
#3  0x0000555555625699 in runtime.markroot (gcw=0xc00003b750, i=27, 
    flushBgCredit=false, ~r0=<optimized out>) at runtime/mgcmark.go:214
        status = <optimized out>
        gp = 0xc000184501
        workCounter = <optimized out>
        ~r0.ptr = <optimized out>
        ~r0.ptr = <optimized out>
        workDone = <optimized out>
        ~r0.len = <optimized out>
        ~r0.len = <optimized out>
#4  0x0000555555627bfd in runtime.gcDrainN (gcw=0xc00003b750, scanWork=65536, 
    ~r0=<optimized out>) at runtime/mgcmark.go:1309
        b = <optimized out>
        workFlushed = 58658
        gp = 0xc0000028c0
#5  0x000055555562676c in runtime.gcAssistAlloc1 (gp=0xc0000028c0, 
    scanWork=65536) at runtime/mgcmark.go:670
        startTime = 25047356213292549
        decnwait = <optimized out>
        gcw = 0x7fffffffd0e8
        workDone = <optimized out>
        incnwait = <optimized out>
        now = <optimized out>
        duration = <optimized out>
        pp = <optimized out>
#6  0x00005555556265db in runtime.gcAssistAlloc.func2 ()
    at runtime/mgcmark.go:561
        gp = 0x7fffffffd0e8
        scanWork = 0
#7  0x0000555555675b07 in runtime.systemstack () at runtime/asm_amd64.s:514
No locals.
#8  0x0000007000000101 in ?? ()
No symbol table info available.
#9  0x000000c00005e008 in ?? ()
No symbol table info available.
#10 0x00007fffffffd3d8 in ?? ()
No symbol table info available.
#11 0x0000555555681581 in crosscall2 () at runtime/cgo/asm_amd64.s:43
No locals.
#12 0x0000555555606046 in LLVMFuzzerTestOneInput (data=1, size=0)
    at _cgo_export.c:60
        _cgo_ctxt = 140737488343200
        _cgo_zero = {p0 = 0, p1 = 0, r0 = 0}
        _cgo_a = {p0 = 93825007067872, p1 = 4101, r0 = 0}
#13 0x00005555555c36a4 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) ()
No symbol table info available.
#14 0x00005555555c2dfa in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) ()
No symbol table info available.
#15 0x00005555555c45ea in fuzzer::Fuzzer::MutateAndTestOne() ()
No symbol table info available.
#16 0x00005555555c5166 in fuzzer::Fuzzer::Loop(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile> >&) ()
No symbol table info available.
#17 0x00005555555b32a3 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) ()
No symbol table info available.
#18 0x00005555555dcf93 in main ()
No symbol table info available.
#19 0x00007ffff79fcd90 in __libc_start_call_main (
    main=main@entry=0x5555555dcf70 <main>, argc=argc@entry=6, 
    argv=argv@entry=0x7fffffffdd88)
    at ../sysdeps/nptl/libc_start_call_main.h:58
        self = <optimized out>
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, -8689054215892993091, 
                140737488346504, 93824992792432, 93824994788600, 
                140737354125376, 8689054214788905917, 8689035798480321469}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, 
            data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#20 0x00007ffff79fce40 in __libc_start_main_impl (main=0x5555555dcf70 <main>, 
    argc=6, argv=0x7fffffffdd88, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffffffdd78)
    at ../csu/libc-start.c:392
No locals.
#21 0x00005555555a7ce5 in _start ()

This particular test treats the input data as an encrypted SRTP packet without an auth tag (just the RTP header and encrypted payload), appends a calculated auth tag, and calls DecryptRTP from pion/srtp.
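For context on the step "appends a calculated auth tag": RFC 3711 (section 4.2) computes the SRTP authentication tag as HMAC-SHA1 over the authenticated portion of the packet (RTP header plus encrypted payload) followed by the 32-bit rollover counter (ROC), truncated to the profile's tag length: 10 bytes for SRTP_AES128_CM_HMAC_SHA1_80 and 4 bytes for SRTP_AES128_CM_HMAC_SHA1_32. The sketch below shows only that step using the standard library. The helper name appendSRTPAuthTag and the all-zero key are hypothetical, and the key is assumed to be an already-derived 20-byte session authentication key.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"encoding/binary"
	"fmt"
)

// appendSRTPAuthTag returns packet || tag, where tag is the first tagLen bytes
// of HMAC-SHA1(authKey, packet || ROC) as described in RFC 3711 section 4.2.
// authKey is assumed to be an already-derived session auth key (hypothetical helper).
func appendSRTPAuthTag(packet, authKey []byte, roc uint32, tagLen int) []byte {
	mac := hmac.New(sha1.New, authKey)
	mac.Write(packet) // hash.Hash.Write never returns an error

	// The ROC is authenticated but not transmitted: append it big-endian.
	var rocBytes [4]byte
	binary.BigEndian.PutUint32(rocBytes[:], roc)
	mac.Write(rocBytes[:])

	tag := mac.Sum(nil)[:tagLen] // truncate the 20-byte HMAC-SHA1 output

	out := make([]byte, 0, len(packet)+tagLen)
	out = append(out, packet...)
	return append(out, tag...)
}

func main() {
	authKey := make([]byte, 20) // hypothetical all-zero session auth key
	packet := []byte{0x80, 0x00, 0x00, 0x01, 0, 0, 0, 0, 0xde, 0xad, 0xbe, 0xef, 0x42}

	// _80 profile uses a 10-byte tag, _32 profile a 4-byte tag.
	for _, tagLen := range []int{10, 4} {
		srtpPacket := appendSRTPAuthTag(packet, authKey, 0, tagLen)
		fmt.Printf("tag len %d: %d -> %d bytes\n", tagLen, len(packet), len(srtpPacket))
	}
}
```

Because both profiles truncate the same HMAC output, the 4-byte tag is a prefix of the 10-byte tag for the same key, packet and ROC.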

These random crashes occur only in some of our fuzz tests. I suspect the bug is somehow related to the crypto code, namely AES-CM and SHA-1 (this particular fuzz test uses the SRTP_AES128_CM_HMAC_SHA1_80 and SRTP_AES128_CM_HMAC_SHA1_32 profiles).
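The AES-CM half of these profiles is AES in counter mode with a 16-byte IV built from the session salt, SSRC, ROC and sequence number, per RFC 3711 section 4.1.1: IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16), where i = 2^16 * ROC + SEQ. A minimal sketch, assuming a 16-byte session key and 14-byte session salt that were already derived; srtpCounterIV and aesCMXOR are hypothetical names. Using Go's cipher.NewCTR here relies on the payload being shorter than 2^16 AES blocks, so that only the low 16 bits of the counter ever change.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
)

// srtpCounterIV builds the AES-CM IV from RFC 3711 section 4.1.1.
// Layout: bytes 4..7 SSRC, 8..11 ROC, 12..13 SEQ, 14..15 block counter (zero),
// all XORed with the 14-byte session salt placed at bytes 0..13.
func srtpCounterIV(sessionSalt []byte, ssrc, roc uint32, seq uint16) [16]byte {
	var iv [16]byte
	binary.BigEndian.PutUint32(iv[4:8], ssrc)
	binary.BigEndian.PutUint32(iv[8:12], roc)
	binary.BigEndian.PutUint16(iv[12:14], seq)
	for n, b := range sessionSalt { // salt * 2^16
		iv[n] ^= b
	}
	return iv
}

// aesCMXOR encrypts or decrypts payload (counter mode is symmetric) and
// returns a new slice; the input is left untouched.
func aesCMXOR(sessionKey, sessionSalt []byte, ssrc, roc uint32, seq uint16, payload []byte) ([]byte, error) {
	block, err := aes.NewCipher(sessionKey)
	if err != nil {
		return nil, err
	}
	iv := srtpCounterIV(sessionSalt, ssrc, roc, seq)
	out := make([]byte, len(payload))
	cipher.NewCTR(block, iv[:]).XORKeyStream(out, payload)
	return out, nil
}

func main() {
	key := make([]byte, 16) // hypothetical session key, not derived from a master key
	salt := make([]byte, 14)
	ct, err := aesCMXOR(key, salt, 0x11223344, 0, 1, []byte("payload"))
	if err != nil {
		panic(err)
	}
	pt, _ := aesCMXOR(key, salt, 0x11223344, 0, 1, ct)
	fmt.Printf("%x -> %s\n", ct, pt)
}
```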

What did you see happen?

Crash, see above

What did you expect to see?

No crash

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 17, 2025
@prattmic
Member

prattmic commented Apr 17, 2025

I suspect that this is duplicate of #73259.

Is the fault address 0x118? You can find the fault address as addr in the runtime panic output. e.g.,

SIGSEGV: segmentation violation
PC=0x468da4 m=13 sigcode=1 addr=0x118
goroutine 0 [idle]:
runtime.(*unwinder).next(0xfc510200e438)
...

If so, I'll dup this into the other issue.
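When many crash logs are collected, the check asked for here can be automated. A small sketch that extracts the addr= field from Go runtime crash output; faultAddr is a hypothetical helper, and it assumes the signal header line has the PC=... m=... sigcode=... addr=0x... shape shown in the sample above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// addrRe matches the "addr=0x..." field of a runtime signal header line.
var addrRe = regexp.MustCompile(`\baddr=0x([0-9a-fA-F]+)`)

// faultAddr returns the first fault address found in crash output, if any.
func faultAddr(crashOutput string) (uint64, bool) {
	m := addrRe.FindStringSubmatch(crashOutput)
	if m == nil {
		return 0, false
	}
	v, err := strconv.ParseUint(m[1], 16, 64)
	if err != nil {
		return 0, false
	}
	return v, true
}

func main() {
	out := "SIGSEGV: segmentation violation\nPC=0x468da4 m=13 sigcode=1 addr=0x118\n"
	if addr, ok := faultAddr(out); ok && addr == 0x118 {
		fmt.Println("fault address is 0x118")
	}
}
```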

> I suspect the bug is somehow related to the crypto code, namely AES-CM and SHA-1 (this particular fuzz test uses the SRTP_AES128_CM_HMAC_SHA1_80 and SRTP_AES128_CM_HMAC_SHA1_32 profiles).

Thanks, this is a helpful hint.

Can you share a reproducer (even if failure rate is low)? Even if not, it sounds like you are in a better spot than #73259 since you can get it in gdb instead of just in prod.

@prattmic
Member

Also, just to clarify, this is running on linux-arm64, right?

@prattmic prattmic added this to the Go1.25 milestone Apr 17, 2025
@prattmic prattmic added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Apr 17, 2025
@prattmic
Member

cc @golang/runtime

@gabyhelp gabyhelp added the BugReport Issues describing a possible bug in the Go implementation. label Apr 17, 2025
@sirzooro
Author

sirzooro commented Apr 17, 2025

> I suspect that this is duplicate of #73259.
>
> Is the fault address 0x118? You can find the fault address as addr in the runtime panic output. e.g.,
>
> SIGSEGV: segmentation violation
> PC=0x468da4 m=13 sigcode=1 addr=0x118
> goroutine 0 [idle]:
> runtime.(*unwinder).next(0xfc510200e438)
> ...
>
> If so, I'll dup this into the other issue.

Unfortunately, github.com/dvyukov/go-fuzz did not print the Go call stack for some reason (it printed only the call stack of libFuzzer's own stack-printing code), so I had to use gdb to get something more useful.

> I suspect the bug is somehow related to the crypto code, namely AES-CM and SHA-1 (this particular fuzz test uses the SRTP_AES128_CM_HMAC_SHA1_80 and SRTP_AES128_CM_HMAC_SHA1_32 profiles).
>
> Thanks, this is a helpful hint.
>
> Can you share a reproducer (even if failure rate is low)? Even if not, it sounds like you are in a better spot than #73259 since you can get it in gdb instead of just in prod.

Here is a reproducer. You will probably have to run Fuzz with random input data in a loop for a few hours. I am not sure whether the fuzz runner from github.com/dvyukov/go-fuzz is needed as well, or whether the one built into Go would be enough.

package fuzz

import (
	"crypto/aes"
	"crypto/hmac"
	"crypto/sha1"
	"encoding/binary"
	"github.com/pion/rtp"
	"github.com/pion/srtp/v3"
)

// function copied from pion code
func aesCmKeyDerivation(label byte, masterKey, masterSalt []byte, outLen int) ([]byte, error) {
	// https://tools.ietf.org/html/rfc3711#appendix-B.3
	// The input block for AES-CM is generated by exclusive-oring the master salt with the
	// concatenation of the encryption key label 0x00 with (index DIV kdr),
	// - index is 'rollover count' and DIV is 'divided by'

	nMasterKey := len(masterKey)
	nMasterSalt := len(masterSalt)

	prfIn := make([]byte, 16)
	copy(prfIn[:nMasterSalt], masterSalt)

	prfIn[7] ^= label

	// The resulting value is then AES encrypted using the master key to get the cipher key.
	block, err := aes.NewCipher(masterKey)
	if err != nil {
		return nil, err
	}

	out := make([]byte, ((outLen+nMasterKey)/nMasterKey)*nMasterKey)
	var i uint16
	for n := 0; n < outLen; n += block.BlockSize() {
		binary.BigEndian.PutUint16(prfIn[len(prfIn)-2:], i)
		block.Encrypt(out[n:n+nMasterKey], prfIn)
		i++
	}
	return out[:outLen], nil
}

// This is also mostly copied from pion
func createAuthTag(data []byte, masterKey []byte, masterSalt []byte, authTagLen int) ([]byte, error) {
	sessionAuthTag, err := aesCmKeyDerivation(0x01, masterKey, masterSalt, 20)
	if err != nil {
		return nil, err
	}
	sessionAuth := hmac.New(sha1.New, sessionAuthTag)

	if _, err := sessionAuth.Write(data); err != nil {
		return nil, err
	}

	rocRaw := []byte{0, 0, 0, 0}
	_, err = sessionAuth.Write(rocRaw)
	if err != nil {
		return nil, err
	}

	authTag := sessionAuth.Sum(nil)[0:authTagLen]

	out := make([]byte, len(data)+authTagLen)
	copy(out, data)
	copy(out[len(data):], authTag)
	return out, nil
}

func Fuzz(data []byte) int {
	if len(data) < 13 {
		return 1
	}

	key := make([]byte, 16)
	salt := make([]byte, 14)
	for n := 0; n < len(key); n++ {
		key[n] = 1
	}
	for n := 0; n < len(salt); n++ {
		salt[n] = 1
	}

	profile := srtp.ProtectionProfileAes128CmHmacSha1_80
	tagLen := 10
	if data[0]%2 == 1 {
		profile = srtp.ProtectionProfileAes128CmHmacSha1_32
		tagLen = 4
	}

	data = data[1:]
	ssrc := binary.BigEndian.Uint32(data[8:])

	data2, err := createAuthTag(data, key, salt, tagLen)
	if err != nil {
		return 0
	}

	ctx, err := srtp.CreateContext(key, salt, profile)
	// Check the error before using ctx; it is nil when CreateContext fails.
	if err != nil {
		return 0
	}
	ctx.SetROC(ssrc, 0)
	ctx.DecryptRTP(nil, data2, &rtp.Header{})
	return 1
}
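Since the crash does not reproduce from a single saved input, one way to run a target in a loop for hours without libFuzzer is a plain Go driver that feeds random inputs while keeping the garbage collector busy, because the gdb trace shows the fault during GC stack scanning reached from gcAssistAlloc. This is a sketch under those assumptions: stressLoop is a hypothetical helper, debug.SetGCPercent(1) is an arbitrary value chosen to make collections frequent, and the placeholder target stands in for Fuzz. It calls the target directly from Go rather than through cgo and crosscall2 as go-fuzz's libFuzzer mode does, so it may not exercise the same code path.

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"math/rand"
	"runtime/debug"
)

// stressLoop feeds target `iterations` random inputs of 0..maxLen bytes,
// generated from seed, while a very low GC percent makes GC cycles (and
// GC assists during allocation) frequent. It returns how many calls returned 1.
func stressLoop(target func([]byte) int, iterations, maxLen int, seed int64) int {
	old := debug.SetGCPercent(1) // hypothetical tuning value
	defer debug.SetGCPercent(old)

	rng := rand.New(rand.NewSource(seed))
	accepted := 0
	for i := 0; i < iterations; i++ {
		data := make([]byte, rng.Intn(maxLen+1)) // fresh allocation each round
		rng.Read(data)
		if target(data) == 1 {
			accepted++
		}
	}
	return accepted
}

func main() {
	// Placeholder target; replace with the reproducer's Fuzz function.
	target := func(data []byte) int {
		if len(data) < 13 {
			return 1
		}
		_ = sha1.Sum(data)
		return 1
	}
	fmt.Println("accepted:", stressLoop(target, 10000, 1500, 1))
}
```

The same seed repeats the input sequence, but GC timing will still vary between runs, which matches the observation that the crash is not tied to a particular input.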

> Also, just to clarify, this is running on linux-arm64, right?

Our fuzz tests are executed on linux-amd64, my bad. However, I checked and we have had some internal bug reports from production code running on linux-arm64 with similar crash stacks, where we suspected some other faulty cgo code. Here is one crash from Go 1.22.7. I have also heard from others that 1.22.4 was affected too. For some reason this now occurs more often than in the past.

SIGSEGV: segmentation violation
PC=0x7fa8b23378 m=22 sigcode=18446744073709551610 addr=0x118

goroutine 0 gp=0x4004d82a80 m=22 mp=0x40022e4808 [idle]:
runtime.(*unwinder).next(0x7ee4be63e0)
	GOROOT/src/runtime/traceback.go:457 +0x188 fp=0x7ee4be6390 sp=0x7ee4be6300 pc=0x7fa8b23378
runtime.scanstack(0x4001e62fc0, 0x4000069c68)
	GOROOT/src/runtime/mgcmark.go:899 +0x23c fp=0x7ee4be64d0 sp=0x7ee4be6390 pc=0x7fa8adc81c
runtime.markroot.func1()
	GOROOT/src/runtime/mgcmark.go:241 +0xb4 fp=0x7ee4be6520 sp=0x7ee4be64d0 pc=0x7fa8adb344
runtime.markroot(0x4000069c68, 0x67, 0x1)
	GOROOT/src/runtime/mgcmark.go:215 +0x1cc fp=0x7ee4be65d0 sp=0x7ee4be6520 pc=0x7fa8adb00c
runtime.gcDrain(0x4000069c68, 0x7)
	GOROOT/src/runtime/mgcmark.go:1200 +0x434 fp=0x7ee4be6640 sp=0x7ee4be65d0 pc=0x7fa8add284
runtime.gcDrainMarkWorkerIdle(...)
	GOROOT/src/runtime/mgcmark.go:1114
runtime.gcBgMarkWorker.func2()
	GOROOT/src/runtime/mgc.go:1406 +0x74 fp=0x7ee4be6690 sp=0x7ee4be6640 pc=0x7fa8ad9204
runtime.systemstack(0x0)
	src/runtime/asm_arm64.s:243 +0x6c fp=0x7ee4be66a0 sp=0x7ee4be6690 pc=0x7fa8b30d6c

@prattmic prattmic removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 17, 2025