Skip to content

Stack overflow for big types in mpsc channels on separate thread #102246

Closed
@janie177

Description

@janie177

I tried this code:

use std::sync::mpsc::*;

struct BigStruct {
    _data: [u32; 32000],
}

fn main() {

let thread = std::thread::spawn(move || {
    let (_sender, receiver) = channel::<BigStruct>();
    
    for _data in receiver.try_iter() {

    }
    
});

    thread.join().unwrap();
    println!("Done :)");
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=7ebdbceac5477ebb48a65f5ecacb5c62

I expected to see this happen:
The program shouldn't do much beside printing a friendly message.

Instead, this happened:
When running in Debug mode, the program crashes due to a stack overflow. In release mode it behaves as expected. Looping over the data in the receiver in combination with being on a separate thread triggers it.
I tried running on nightly too, which produces the same results.
I've now switched to crossbeam as a workaround, which does not suffer from the same issue fortunately.

rustc --version --verbose:

rustc 1.64.0 (a55dd71d5 2022-09-19)
binary: rustc
commit-hash: a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52
commit-date: 2022-09-19
host: x86_64-pc-windows-msvc
release: 1.64.0
LLVM version: 14.0.6
Backtrace

thread '<unknown>' has overflowed its stack

Activity

lapointexavier

lapointexavier commented on Aug 1, 2023

@lapointexavier

I've run into this when upgrading from 1.70.0 to 1.71.0. Compiling in release mode prevented the segfault when using 1.71.0, though the issue was not happening in 1.70.0.

ChrisDenton

ChrisDenton commented on Aug 11, 2023

@ChrisDenton
Member

I've not been able to reproduce this using the program in the OP. I've tried on both 1.71.1 and 1.73.0-nightly. In both cases it prints "Done :)".

rustc 1.73.0-nightly (439d066bc 2023-08-10)
binary: rustc
commit-hash: 439d066bcf9496b1b8c8dde8bef3bce607a621bb
commit-date: 2023-08-10
host: x86_64-pc-windows-msvc
release: 1.73.0-nightly
LLVM version: 17.0.0
lapointexavier

lapointexavier commented on Aug 11, 2023

@lapointexavier

@ChrisDenton OP's didn't trigger it for me either, but after playing with it I was able to reproduce:

Repro

use std::sync::mpsc::*;

struct BigStruct {
    _data: [u8; 32768],
}

fn main() {
    let (sender, receiver) = channel::<BigStruct>();

    let thread1 = std::thread::spawn(move || {
        let data = BigStruct { _data: [0u8; 32768] };
        sender.send(data).unwrap();
    });

    thread1.join().unwrap();
    for _data in receiver.try_iter() {

    }
    println!("Done :)");
}

Output

~/Dev/hello_world$ cargo run --release
   Compiling hello_world v0.1.0 (/.../hello_world)
    Finished release [optimized] target(s) in 0.56s
     Running `target/release/hello_world`
Done :)

~/Dev/hello_world$ cargo run
   Compiling hello_world v0.1.0 (/.../hello_world)
    Finished dev [unoptimized + debuginfo] target(s) in 0.31s
     Running `target/debug/hello_world`

thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow
Aborted (core dumped)
rustc --version --verbose
rustc 1.71.1 (eb26296b5 2023-08-03)
binary: rustc
commit-hash: eb26296b556cef10fb713a38f3d16b9886080f26
commit-date: 2023-08-03
host: x86_64-unknown-linux-gnu
release: 1.71.1
LLVM version: 16.0.5
lapointexavier

lapointexavier commented on Aug 11, 2023

@lapointexavier

Also this may be system dependent, but I was playing with slice sizes and the trigger seems to be 8120 + 1 for the overflow. I don't know why release mode doesn't overflow, maybe something is being optimized.

added
T-libsRelevant to the library team, which will review and decide on the PR/issue.
and removed on Aug 12, 2023
ChrisDenton

ChrisDenton commented on Aug 12, 2023

@ChrisDenton
Member

Ok, I've not yet investigated but my guess would be something about how it's not being optimized in start_send is multiplying the needed stack space by a fair amount.

@ibraheemdev might have some ideas. Note though this only affects opt-level=0 builds.

added
T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.
on Aug 12, 2023
Enselic

Enselic commented on Nov 5, 2024

@Enselic
Member

The reproduer can be simplifed to:

// We must keep receiver alive
#[allow(unused_variables)]

fn main() {
    let (sender, receiver) = std::sync::mpsc::channel::<[u8; 32768]>();

    let thread = std::thread::spawn(move || {
        sender.send([0u8; 32768]).unwrap();
    });

    thread.join().unwrap();

    println!("Done :)");
}

Regressed versions

The code runs fine in debug builds with 1.66 but not with 1.67. So this is a regression-from-stable-to-stable, but arguably not very severe since this seemingly has gone mostly unnoticed for a considerable amount of time.

With cargo bisect-rustc we find that this runs fine with nightly-2022-11-13 but not nightly-2022-11-14 which leaves us with

git log --oneline --no-merges \
    ^6284998a2677d7e3e8420db783f3aa4fd80d7423 \
    e631891f7ad40eac3ef58ec3c2b57ecd81e40615 

of which #93563 seems most likely.

To demonstrate:

for toolchain in 1.66 \
                 1.67 \
                 nightly-2022-11-13 \
                 nightly-2022-11-14 ; do
    rm -rf target
    echo "\nUsing $toolchain"
    cargo +$toolchain run --quiet
done
Using 1.66
Done :)

Using 1.67
thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow

Using nightly-2022-11-13
Done :)

Using nightly-2022-11-14
thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow

Disassembly

The disassembly of std::sync::mpmc::list::Channel<[u8; 32768]>::start_send<[u8; 32768]> where we SIGSEGV in gdb looks like this:

||   0x55555555df70 <_ZN3std4sync4mpmc4list16Channel$LT$T$GT$10start_send17h8dac1358fea503eaE>       mov    %rsp,%r11        
││   0x55555555df73 <_ZN3std4sync4mpmc4list16Channel$LT$T$GT$10start_send17h8dac1358fea503eaE+3>     sub    $0x1f0000,%r11   
││   0x55555555df7a <_ZN3std4sync4mpmc4list16Channel$LT$T$GT$10start_send17h8dac1358fea503eaE+10>    sub    $0x1000,%rsp     
││  >0x55555555df81 <_ZN3std4sync4mpmc4list16Channel$LT$T$GT$10start_send17h8dac1358fea503eaE+17>    movq   $0x0,(%rsp)      

So we SIGSEGV when 0 is being written to the top of the stack (probably due to stack probing) after growing it ~0x1f0000 bytes which is ~2MB.

In the LLVM IR we can find:

  %_34 = alloca [1016064 x i8], align 8
  %_31 = alloca [1016064 x i8], align 8

I looked at the MIR too but couldn't find anything suspicious. Maybe someone used to debuggging monomorphization/codegen can easily find out why the stack grows by 2MB.

My system

$ cargo +1.67 rustc -- -vV
rustc 1.67.1 (d5a82bbd2 2023-02-07)
binary: rustc
commit-hash: d5a82bbd26e1ad8b7401f6a718a9c57c96905483
commit-date: 2023-02-07
host: x86_64-unknown-linux-gnu
release: 1.67.1
LLVM version: 15.0.6
$ uname -a
Linux martins-dator 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
added
I-prioritizeIssue: Indicates that prioritization has been requested for this issue.
on Nov 5, 2024
added and removed
I-prioritizeIssue: Indicates that prioritization has been requested for this issue.
on Nov 6, 2024

17 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-bugCategory: This is a bug.P-mediumMedium priorityT-compilerRelevant to the compiler team, which will review and decide on the PR/issue.T-libsRelevant to the library team, which will review and decide on the PR/issue.regression-from-stable-to-stablePerformance or correctness regression from one stable version to another.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      Participants

      @cuviper@Enselic@lapointexavier@Amanieu@janie177

      Issue actions

        Stack overflow for big types in mpsc channels on separate thread · Issue #102246 · rust-lang/rust