I tried this code:

    Command::new(command)
        ...
        .stdin(Stdio::inherit())

In this case, I had already read a few bytes from stdin using stdin() and read_exact. strace showed that the process read 8K from stdin, and that the spawned command missed the remainder of those 8K. #58326 discusses this in a roundabout way, but fundamentally it should be safe to read from stdin and then subsequently use it as input to a Command. The workaround was to use .stdin(Stdio::piped()) and copy the data across the pipe. Not ideal.
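A minimal sketch of the Stdio::piped() workaround described above. It assumes a Unix-like system, and the helper name run_with_input is hypothetical. Because it copies through a Read (for example io::stdin()), any bytes already sitting in Rust's stdin buffer are forwarded to the child before the rest of the stream.

```rust
use std::io::{self, Read};
use std::process::{Command, Output, Stdio};
use std::thread;

/// Hypothetical helper: spawn `cmd` with a piped stdin and copy everything
/// from `input` into it. Passing `io::stdin()` as `input` forwards the bytes
/// already held in Rust's stdin buffer first, then the rest of the stream.
fn run_with_input<R>(cmd: &mut Command, mut input: R) -> io::Result<Output>
where
    R: Read + Send + 'static,
{
    let mut child = cmd.stdin(Stdio::piped()).spawn()?;

    // Take ownership of the write end; it is dropped (closed) when the
    // copying thread finishes, which gives the child EOF.
    let mut child_stdin = child.stdin.take().expect("stdin was piped");

    // Copy on a separate thread so that, if the caller also piped stdout,
    // a child blocked on a full stdout pipe cannot deadlock against us.
    let writer = thread::spawn(move || -> io::Result<u64> {
        io::copy(&mut input, &mut child_stdin)
    });

    // Collects any piped stdout/stderr and waits for the child to exit.
    let output = child.wait_with_output()?;
    writer.join().expect("copy thread panicked")?;
    Ok(output)
}
```

In the scenario from this issue, after the header has been consumed with io::stdin().read_exact(...), calling run_with_input(&mut cmd, io::stdin()) would hand the child both the leftover buffered bytes and the remainder of the input. The cost is an extra copy through the parent process, which is why it is only a workaround.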
Meta
rustc --version --verbose:

    rustc 1.60.0 (7737e0b5c 2022-04-04)
    binary: rustc
    commit-hash: 7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c
    commit-date: 2022-04-04
    host: x86_64-unknown-linux-gnu
    release: 1.60.0
    LLVM version: 14.0.0
the8472 commented on Jun 8, 2022
There's #78515 for switchable buffering for stdout. A similar API to switch to unbuffered IO could be added for stdin.
jgoerzen commented on Jun 8, 2022
That would provide a more elegant workaround for sure, but as a resolution it would be incomplete, since the default situation would still lead to data loss.
I would argue the default should be unbuffered, and users who want a buffered stdin should wrap it in a BufReader themselves, a type that could not be converted into Stdio for a Command. That would go a long way towards preventing the user from doing something that is going to result in data loss.
the8472 commented on Jun 8, 2022
The data isn't lost, it's in the buffer.
io::stdin being buffered by default is intentional because it provides a better experience for most use cases, which don't involve passing the input to another process. And this is documented behavior.
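The mechanism can be reproduced in miniature with a BufReader over an in-memory Cursor, where the Cursor stands in for fd 0 (a simplification chosen for illustration; a Cursor is not a shared file descriptor). The 8-byte capacity plays the role of the 8K read seen in strace, and the function name simulate is hypothetical.

```rust
use std::io::{BufReader, Cursor, Read};

/// Reads `n` bytes through a BufReader of the given capacity and returns
/// (bytes the caller received, bytes left stranded in the buffer, position
/// of the underlying source). The source position is where anything reading
/// the source directly, such as a child inheriting the fd, would start.
fn simulate(data: &[u8], capacity: usize, n: usize) -> (Vec<u8>, Vec<u8>, u64) {
    let mut reader = BufReader::with_capacity(capacity, Cursor::new(data.to_vec()));
    let mut got = vec![0u8; n];
    // A small read_exact makes BufReader fill its whole buffer from the source.
    reader.read_exact(&mut got).expect("enough input");
    let stranded = reader.buffer().to_vec();
    let source_pos = reader.get_ref().position();
    (got, stranded, source_pos)
}
```

With simulate(b"abcdefghijkl", 8, 3), the caller receives "abc", "defgh" stays in the buffer, and the source has advanced to offset 8, so a reader of the underlying source starts at "ijkl". The bytes are not gone, but only code that reads through the same buffer can see them.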
jgoerzen commented on Jun 8, 2022
I understand that, but being in the buffer is tantamount to being lost, because the buffer is not present in the input to the child process. I can't think of any use case where I would want to:
stdin is just another fd on Unix; I find it very odd that it is treated so differently (behind a mutex and all that) in Rust. But it's true I don't have to care about that so long as it actually works properly -- which I don't believe it does in this case.
the8472 commented on Jun 8, 2022
Currently you can access the underlying file directly this way:
Regarding "stdin is just another fd on Unix": the underlying syscalls are, sure. But most languages provide buffering and possibly thread-safety around stdio to avoid line tearing and to speed up small reads/writes.
Even libc has buffering. https://man7.org/linux/man-pages/man3/setbuf.3.html
jgoerzen commented on Jun 9, 2022
You are correct that this issue does exist in other languages in some cases; for instance, this from the popen(3) manpage on Linux:
However, there are several differences to note:
It may be too late for this wrt backwards compatibility, but if I were designing this, I wouldn't make stdin/out/err be special cases; BufReader and BufWriter could be used for them just like anything else if people want. But we are where we are.
So what to do about it?
Fundamentally people shouldn't have to resort to using strace, as I did, to discover that read_exact didn't do what it said on the tin.
I have a question about your ManuallyDrop use in the example above. I assume this is because when the File created by File::from_raw_fd is dropped, it closes the underlying fd, which may not be desired. In that case, probably the explicit drop should occur after the spawn? (And I believe that if there was an error from read_exact, there would be a memory leak there, right?)
Perhaps an alternative would be to explicitly std::mem::drop the File after the spawn?
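A sketch of the direct-fd approach discussed in this exchange (not the snippet referenced above, which is not reproduced here). It assumes Unix and that io::stdin() has not been used earlier in the process, since its buffer could already hold bytes taken from fd 0. Both function names are hypothetical. In this sketch, ManuallyDrop keeps the File destructor from running on every path, so fd 0 is never closed and it does not matter whether the wrapper goes out of scope before or after the spawn. A File owns only the descriptor, so skipping its destructor on the read_exact error path does not leak heap memory.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::mem::ManuallyDrop;
use std::os::unix::io::{FromRawFd, RawFd};
use std::process::{Child, Command, Stdio};

/// Reads exactly `n` bytes from `fd` with no user-space buffer, so the
/// descriptor's offset advances by exactly `n` and nothing is stranded.
///
/// Safety: `fd` must be an open, readable descriptor for the whole call.
/// The descriptor is borrowed and never closed.
unsafe fn read_exact_unbuffered(fd: RawFd, n: usize) -> io::Result<Vec<u8>> {
    // ManuallyDrop suppresses File's destructor (which would close `fd`)
    // on both the success path and the early-return error path.
    let mut file = ManuallyDrop::new(unsafe { File::from_raw_fd(fd) });
    let mut buf = vec![0u8; n];
    file.read_exact(&mut buf)?;
    Ok(buf)
}

/// Hypothetical helper: consume an `n`-byte header from this process's
/// stdin (fd 0) unbuffered, then spawn `cmd` inheriting fd 0 so the child
/// sees every byte that follows the header.
fn spawn_after_stdin_header(n: usize, cmd: &mut Command) -> io::Result<(Vec<u8>, Child)> {
    // SAFETY: fd 0 is assumed to be open and is only borrowed here.
    let header = unsafe { read_exact_unbuffered(0, n)? };
    let child = cmd.stdin(Stdio::inherit()).spawn()?;
    Ok((header, child))
}
```

The key difference from io::stdin().read_exact is that File::read issues read(2) directly, so only the requested bytes are taken from the descriptor and the inheriting child starts exactly where the header ends.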