Proposal
Problem statement
It would be useful to have a version of io::copy that can use:
- splice or sendfile (some of the existing optimizations have to be rolled back in rust#108283)
- nonblocking IO
Motivation, use-cases
Solution sketches
An API specific to file descriptors
pub fn os::unix::os_copy(buffer: &mut BorrowedBuf, source: &impl AsFd, sink: &impl AsFd) -> io::Result<u64>
It is less generic than io::copy, but it makes explicit that it only operates on file-like types and may need an intermediate buffer to hold data across multiple invocations when doing non-blocking IO.
Unclear: whether it should return an error when offloading isn't possible or silently fall back to io::copy.
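If the answer turns out to be "silently fall back", the fallback path itself can already be written on stable Rust. The sketch below is an assumption, not part of the proposal: the hypothetical `fd_copy_fallback` drops the `BorrowedBuf` parameter (that type is still unstable), duplicates both descriptors into `File`s and hands them to the generic `io::copy`. Because a duplicated descriptor shares the file offset of the original, the caller's handles advance just as if they had been copied directly.

```rust
use std::fs::File;
use std::io;
use std::os::fd::AsFd;

/// Hypothetical fallback for an fd-only copy API (name and signature are
/// assumptions). Both descriptors are duplicated; the duplicates share the
/// originals' file offsets, so the caller's handles advance as expected.
pub fn fd_copy_fallback(source: &impl AsFd, sink: &impl AsFd) -> io::Result<u64> {
    let mut src = File::from(source.as_fd().try_clone_to_owned()?);
    let mut dst = File::from(sink.as_fd().try_clone_to_owned()?);
    // Generic copy; std may still apply its internal fast paths to File pairs.
    io::copy(&mut src, &mut dst)
}
```

Duplicating instead of borrowing is the simplest way to get owned `Read`/`Write` values out of `&impl AsFd`; the cost is one extra `dup` per side and per call.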
Downsides:
- covering Windows would be more complicated since it distinguishes between Handle and Socket
- requires cfg()s
- can't be used by code generic over Read/Write types, e.g. tar-rs
Lean on specialization
pub fn io::zero_copy(source: &mut impl Read, sink: &mut impl Write) -> io::Result<u64>
This is essentially what today's io::copy does, but with altered guarantees:
- if the caller passes a BufWriter, any read-but-unwritten bytes will be held in the BufWriter when a WouldBlock error occurs; otherwise the bytes will be dropped
- changes made to source after zero_copy returns may become visible in sink, as is the case when using sendfile or splice
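The first guarantee leans on how std's `BufWriter` already behaves: when a flush hits an error such as `WouldBlock`, the bytes the inner writer did not accept stay in the buffer and are retried on the next flush. The sketch below demonstrates only that existing `BufWriter` behaviour, not the proposed `zero_copy`; `Throttled` and `buffered_after_wouldblock` are hypothetical test helpers that stand in for a non-blocking socket whose send buffer fills up.

```rust
use std::io::{self, BufWriter, ErrorKind, Write};

/// Hypothetical sink: accepts `budget` bytes in total, then reports
/// WouldBlock like a non-blocking socket with a full send buffer.
struct Throttled {
    budget: usize,
    data: Vec<u8>,
}

impl Write for Throttled {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.budget == 0 {
            return Err(ErrorKind::WouldBlock.into());
        }
        let n = buf.len().min(self.budget);
        self.data.extend_from_slice(&buf[..n]);
        self.budget -= n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Returns (flush error kind, bytes the sink accepted, bytes still buffered).
fn buffered_after_wouldblock() -> (ErrorKind, Vec<u8>, Vec<u8>) {
    let sink = Throttled { budget: 4, data: Vec::new() };
    let mut w = BufWriter::with_capacity(16, sink);
    // Fits in the 16-byte buffer, so nothing reaches the sink yet.
    w.write_all(b"0123456789").unwrap();
    // The flush pushes 4 bytes through, then the sink reports WouldBlock.
    let err = w.flush().unwrap_err();
    // The 6 unwritten bytes remain in the BufWriter instead of being dropped.
    (err.kind(), w.get_ref().data.clone(), w.buffer().to_vec())
}
```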
Downsides:
- API guarantees strongly rely on specialization
- the non-blocking case might be a footgun: if someone passes a BufWriter as &dyn Write, the specialization won't be able to see it, and bytes will end up being dropped
Hybrid of the above
Make the buffer an explicit argument for non-blocking IO but use best-effort specialization for the offloading aspects.
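A minimal sketch of the explicit-buffer half of this hybrid, under stated assumptions: `PendingBuf` and `copy_with_buf` are hypothetical names, a plain `Vec<u8>` stands in for the unstable `BorrowedBuf`, and no offloading is attempted. Because unwritten bytes live in the caller-owned buffer, any error (including `WouldBlock`) can be returned to the caller without losing data, and calling again with the same buffer resumes the copy.

```rust
use std::io::{self, ErrorKind, Read, Write};

/// Hypothetical caller-owned buffer: `data[pos..]` holds bytes that were
/// read from the source but not yet accepted by the sink.
pub struct PendingBuf {
    data: Vec<u8>,
    pos: usize,
    cap: usize,
    written: u64,
}

impl PendingBuf {
    /// `cap` must be non-zero; a zero-length read would look like EOF.
    pub fn new(cap: usize) -> Self {
        PendingBuf { data: Vec::with_capacity(cap), pos: 0, cap, written: 0 }
    }

    /// Total bytes handed to the sink so far, across all calls.
    pub fn written(&self) -> u64 {
        self.written
    }
}

/// Copies until EOF or the first error. Every error, WouldBlock and
/// Interrupted included, goes back to the caller; pending bytes stay in `buf`.
/// On EOF, returns the total bytes written across all calls.
pub fn copy_with_buf<R: Read, W: Write>(
    buf: &mut PendingBuf,
    src: &mut R,
    sink: &mut W,
) -> io::Result<u64> {
    loop {
        if buf.pos == buf.data.len() {
            // Everything buffered was written: refill from the source.
            buf.data.resize(buf.cap, 0);
            buf.pos = 0;
            let n = match src.read(&mut buf.data) {
                Ok(n) => n,
                Err(e) => {
                    buf.data.clear(); // nothing pending
                    return Err(e);
                }
            };
            buf.data.truncate(n);
            if n == 0 {
                sink.flush()?;
                return Ok(buf.written);
            }
        }
        match sink.write(&buf.data[buf.pos..]) {
            Ok(0) => return Err(ErrorKind::WriteZero.into()),
            Ok(n) => {
                buf.pos += n;
                buf.written += n as u64;
            }
            Err(e) => return Err(e),
        }
    }
}
```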
Encapsulate the copy operation in a struct/builder
Rough sketch:
use std::io::{self, BorrowedBuf, Read, Write}; // BorrowedBuf is nightly-only (read_buf)
#[cfg(unix)]
use std::os::fd::AsFd;
struct Copier<'a, R, W> {
    // a bunch of enums
}
impl<'a, R, W> Copier<'a, R, W> where R: Read, W: Write {
    /// On errors an internal buffer will be allocated if none is provided
    fn buffer(&mut self, buf: &'a mut BorrowedBuf<'a>) {}
    fn source(&mut self, src: R) {}
    fn sink(&mut self, sink: W) {}
    /// Runs until the first error; can be resumed with a later call.
    /// Does not ignore WouldBlock or Interrupted errors.
    fn copy_chunk(&mut self) -> io::Result<u64> { todo!() }
    fn total(&self) -> u64 { todo!() }
}
#[cfg(unix)] // impl for brevity, should be an extension trait.
impl<'a, R, W> Copier<'a, R, W> where R: Read + AsFd {
    fn fd_source(&mut self, src: &'a R) {}
}
#[cfg(unix)]
impl<'a, R, W> Copier<'a, R, W> where W: Write + AsFd {
    fn fd_sink(&mut self, sink: &'a W) {}
}
Under the hood it could still try to use specialization if the platform-specific APIs aren't used.
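One way the "bunch of enums" and that choice could fit together, as a reduced and runnable sketch: `Endpoint` and `Strategy` are hypothetical, `fd_source`/`fd_sink` here take `&impl AsFd` rather than the where-clause form above, and no copying is performed. The point is only that recording what each side is makes the offload-or-fallback decision a simple match when copying starts.

```rust
use std::io::{Read, Write};
use std::os::fd::{AsFd, BorrowedFd};

/// Hypothetical per-side state: unset, a generic Read/Write value, or a
/// borrowed file descriptor.
#[allow(dead_code)] // payloads are not read in this reduced sketch
enum Endpoint<'a, T> {
    Unset,
    Generic(T),
    Fd(BorrowedFd<'a>),
}

/// Hypothetical plan chosen when copying starts.
#[derive(Debug, PartialEq)]
enum Strategy {
    /// Both sides are fds: try a platform call (splice, sendfile, ...).
    Offload,
    /// At least one side is generic: read/write loop, possibly still
    /// specialized internally as io::copy is today.
    Buffered,
}

struct Copier<'a, R, W> {
    source: Endpoint<'a, R>,
    sink: Endpoint<'a, W>,
}

impl<'a, R: Read, W: Write> Copier<'a, R, W> {
    fn new() -> Self {
        Copier { source: Endpoint::Unset, sink: Endpoint::Unset }
    }
    fn source(&mut self, src: R) -> &mut Self {
        self.source = Endpoint::Generic(src);
        self
    }
    fn sink(&mut self, sink: W) -> &mut Self {
        self.sink = Endpoint::Generic(sink);
        self
    }
    fn fd_source(&mut self, src: &'a impl AsFd) -> &mut Self {
        self.source = Endpoint::Fd(src.as_fd());
        self
    }
    fn fd_sink(&mut self, sink: &'a impl AsFd) -> &mut Self {
        self.sink = Endpoint::Fd(sink.as_fd());
        self
    }
    /// `None` until both sides have been configured.
    fn strategy(&self) -> Option<Strategy> {
        match (&self.source, &self.sink) {
            (Endpoint::Unset, _) | (_, Endpoint::Unset) => None,
            (Endpoint::Fd(_), Endpoint::Fd(_)) => Some(Strategy::Offload),
            _ => Some(Strategy::Buffered),
        }
    }
}
```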
Any of the above, but N pairs instead of 1
When copying many small files and the like, it can be beneficial to run them in batches. This wouldn't be a full-fledged async runtime that can add work incrementally as other work items complete, but it would still be more efficient than copying one at a time.
Under the hood we could use polling or io_uring where appropriate.
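A rough, single-threaded sketch of the batched variant, assuming hypothetical names `Job`, `step` and `copy_batch`: each pair keeps its own pending buffer, and the driver visits the pairs round-robin, treating `WouldBlock` or `Interrupted` as "no progress this round". It spins instead of waiting; a real implementation would wait for readiness via polling or submit the operations to io_uring. It only shows how per-pair state lets many copies advance in one loop.

```rust
use std::io::{self, ErrorKind, Read, Write};

/// Hypothetical per-pair state: `buf[pos..]` is read but not yet written.
struct Job<R, W> {
    src: R,
    dst: W,
    buf: Vec<u8>,
    pos: usize,
    done: bool,
    total: u64,
}

/// Advances one pair by at most one read and one write; buffered bytes are
/// kept when a side reports WouldBlock or Interrupted.
fn step<R: Read, W: Write>(job: &mut Job<R, W>, chunk: usize) -> io::Result<()> {
    if job.pos == job.buf.len() {
        job.buf.resize(chunk, 0);
        job.pos = 0;
        match job.src.read(&mut job.buf) {
            Ok(0) => {
                job.buf.clear();
                job.done = true;
                return job.dst.flush();
            }
            Ok(n) => job.buf.truncate(n),
            Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::Interrupted) => {
                job.buf.clear();
                return Ok(());
            }
            Err(e) => return Err(e),
        }
    }
    match job.dst.write(&job.buf[job.pos..]) {
        Ok(0) => Err(ErrorKind::WriteZero.into()),
        Ok(n) => {
            job.pos += n;
            job.total += n as u64;
            Ok(())
        }
        Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::Interrupted) => Ok(()),
        Err(e) => Err(e),
    }
}

/// Copies all pairs round-robin in `chunk`-sized pieces (`chunk` > 0) and
/// returns the bytes copied per pair, in input order.
fn copy_batch<R: Read, W: Write>(pairs: Vec<(R, W)>, chunk: usize) -> io::Result<Vec<u64>> {
    let mut jobs: Vec<Job<R, W>> = pairs
        .into_iter()
        .map(|(src, dst)| Job { src, dst, buf: Vec::new(), pos: 0, done: false, total: 0 })
        .collect();
    while jobs.iter().any(|j| !j.done) {
        for job in jobs.iter_mut().filter(|j| !j.done) {
            step(job, chunk)?;
        }
    }
    Ok(jobs.into_iter().map(|j| j.total).collect())
}
```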
Links and related work
- specialize io::copy to use copy_file_range, splice or sendfile rust#75272
- don't splice from files into pipes in io::copy rust#108283
- https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/io.3A.3Acopy.20race.20.23108283/near/346623546
What happens now?
This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.
tisonkun commented on Aug 2, 2023
Does sendfile support sending part of a file? Kafka uses this syscall to implement its zero-copy record transfer, and I suppose anyone writing a similar system in Rust should be able to achieve the same.
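For reference: on Linux, sendfile(2) takes an optional pointer to an input offset plus a byte count. When the offset pointer is non-null, the kernel reads from that offset and updates the pointed-to variable instead of the file's own position, which is what allows several streams to be served from one fd. The sketch is Linux-only, assumes a 64-bit target (so `off_t` is `i64`), declares the libc symbol by hand instead of depending on the `libc` crate, and uses the hypothetical helper name `send_range`.

```rust
use std::io;
use std::os::fd::{AsFd, AsRawFd};
use std::os::raw::c_int;

// Hand-written binding to Linux sendfile(2):
//   ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
// Assumes a 64-bit Linux target where off_t is i64.
unsafe extern "C" {
    fn sendfile(out_fd: c_int, in_fd: c_int, offset: *mut i64, count: usize) -> isize;
}

/// Hypothetical helper: sends up to `len` bytes of `src`, starting at `offset`,
/// to `dst`. With an explicit offset the kernel leaves `src`'s own file
/// position untouched. Returns fewer than `len` bytes only at EOF.
fn send_range(src: &impl AsFd, offset: i64, len: usize, dst: &impl AsFd) -> io::Result<usize> {
    let mut off = offset; // advanced by the kernel after each call
    let mut sent = 0;
    while sent < len {
        // SAFETY: both descriptors are borrowed and stay open for this call,
        // and `off` is a valid, writable i64.
        let n = unsafe {
            sendfile(dst.as_fd().as_raw_fd(), src.as_fd().as_raw_fd(), &mut off, len - sent)
        };
        if n < 0 {
            let err = io::Error::last_os_error();
            if err.kind() == io::ErrorKind::Interrupted {
                continue;
            }
            return Err(err);
        }
        if n == 0 {
            break; // EOF on the source
        }
        sent += n as usize;
    }
    Ok(sent)
}
```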
the8472 commented on Aug 2, 2023
Yes, sendfile can send parts. But specifying seek positions doesn't make sense for all possible readers, because some things aren't seekable (e.g. sockets and pipes). Specifying a length is possible, but we can model that with Take on the reader side.
I suppose for some uses it can make sense to reuse a file descriptor and use sendfile with explicit offsets, so that multiple streams can be served from the same fd. That'd be incompatible with the Read/Write traits, which assume they update the implicit seek position.
tisonkun commented on Aug 3, 2023
@the8472 I can live with an API like the above, i.e., file.copy(offset, len, socket). The current io::copy tries to send the whole file, so it isn't suitable for me. Yes, I can calculate the offset, but I can't find a function in std that accepts an offset argument and delegates to sendfile if the platform supports it.
NobodyXu commented on Jul 28, 2024
The nightly API for anonymous pipes has been merged (tracking issue rust-lang/rust#127154).
It seems that we'd want some zero-copy API for pipes, since rust-lang/rust#108283 just rolled back some optimizations for copying from files to pipes.
std::io::copy for non stdlib types #419