Do not load the entire artifact in memory when uploading (#618) #677
base: main
Conversation
This compiles, but is wholly untested. I might try to actually run it tomorrow on a fork or something. It would also be nice to be able to inject errors, somehow...
@konstin has all the context on this pattern in uv, I'll delegate review :)
    let result = request.send().await.map_err(|e| e.into());

    if retryable_strategy.handle(&result) == Some(Retryable::Transient) {
        let retry_decision = retry_policy.should_retry(start_time, n_past_retries);
fwiw we have a more extensive retry policy in uv, but the reqwest-retry one may be sufficient here.
        }
    }
    break result?;
};
if !response.status().is_success() {
This means we don't retry on status errors, such as a 500 (only the 403 we handle in the github upload retry strategy)
This (and the above) is intended to match the existing logic using reqwest-retry... we ought to retry on a 500, though; if we don't, let me fix that while we're here.
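For reference, here is a minimal sketch of the retry-loop shape under discussion, assuming reqwest-retry's DefaultRetryableStrategy and the ExponentialBackoff policy from the retry-policies crate behind it; send_with_retries and build_request are hypothetical names, and the real code in this PR differs in its details:

```rust
use std::time::SystemTime;

use reqwest_retry::{policies::ExponentialBackoff, DefaultRetryableStrategy, Retryable, RetryableStrategy};
use retry_policies::{RetryDecision, RetryPolicy};

async fn send_with_retries(
    build_request: impl Fn() -> reqwest::RequestBuilder,
) -> Result<reqwest::Response, reqwest_middleware::Error> {
    // Exponential backoff with a bounded number of attempts.
    let retry_policy = ExponentialBackoff::builder().build_with_max_retries(3);
    let retryable_strategy = DefaultRetryableStrategy;
    let start_time = SystemTime::now();
    let mut n_past_retries = 0;

    let response = loop {
        // Build a fresh request on every attempt: a streaming body cannot be replayed.
        let result: Result<reqwest::Response, reqwest_middleware::Error> =
            build_request().send().await.map_err(|e| e.into());

        // The default strategy treats connection errors and 5xx/429 responses as
        // transient, which also covers the "retry on a 500" case discussed above.
        if retryable_strategy.handle(&result) == Some(Retryable::Transient) {
            if let RetryDecision::Retry { execute_after } =
                retry_policy.should_retry(start_time, n_past_retries)
            {
                if let Ok(wait) = execute_after.duration_since(SystemTime::now()) {
                    tokio::time::sleep(wait).await;
                }
                n_past_retries += 1;
                continue;
            }
        }
        break result?;
    };
    Ok(response)
}
```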
src/github.rs
Outdated
// reqwest wants to take ownership of the body, so it's hard for us to do anything
// clever with reading the file once and calculating the sha256sum while we read.
// So we open and read the file again.
let mut file = tokio::fs::File::open(local_filename).await?;
nit: Wrapping the file in a BufReader gives better performance.
Does it? My understanding was that BufReader is helpful if you're doing lots of small reads, but here sha256 doesn't really have an opinion about the size of the buffer we pass in for each chunk, and I'm intentionally doing very large reads (1 MB). I think BufReader defaults to tokio::io::util::DEFAULT_BUF_SIZE = 8192 bytes, and it looks like the implementation bypasses the internal buffer if you ask for a larger read.
You're right, with the manual large buffer this is actually not necessary.
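For illustration, a sketch of the manual large-buffer read-and-hash loop being discussed, assuming the sha2 crate and tokio's AsyncReadExt; sha256_of_file is a hypothetical name and the 1 MB chunk size just mirrors the comment above:

```rust
use sha2::{Digest, Sha256};
use tokio::io::AsyncReadExt;

// Hash the file in 1 MB chunks so the whole artifact never has to sit in memory.
async fn sha256_of_file(path: &std::path::Path) -> std::io::Result<String> {
    let mut file = tokio::fs::File::open(path).await?;
    let mut hasher = Sha256::new();
    let mut buf = vec![0u8; 1024 * 1024];
    loop {
        let len = file.read(&mut buf).await?;
        if len == 0 {
            break;
        }
        // Only the first `len` bytes are valid; see the slicing discussion below.
        hasher.update(&buf[..len]);
    }
    Ok(format!("{:x}", hasher.finalize()))
}
```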
    dry_run,
));

// reqwest wants to take ownership of the body, so it's hard for us to do anything
I think it's totally fine to read the file twice.
src/github.rs
Outdated
if len == 0 {
    break;
};
hasher.update(&buf);
Do we need to slice the buf to the size actually read? I don't know about the properties of SHA-512, but we're currently passing a variable number of trailing zero bytes.
Oof, yes. Not even zero bytes: whatever was at the end of the previous megabyte.
I feel like this is not the first time I have made this mistake in Rust, sigh. (This is also basically tedu's "heartbleed in Rust" from pre-1.0.)
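Concretely, the fix is to feed the hasher only the bytes returned by this read (a sketch of the corrected line, using the variable names from the diff above):

```rust
// Only the first `len` bytes of `buf` were written by this read; the rest is
// whatever the previous iteration left behind.
hasher.update(&buf[..len]);
```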
I was surprised the API doesn't have a way to pass a whole Read or AsyncRead; that's way more idiomatic and prevents such mistakes. I find myself usually avoiding the &mut buf APIs.
    format!("{}.sha256", dest),
    Bytes::copy_from_slice(format!("{}\n", digest).as_bytes()),
    format!("{dest}.sha256"),
    UploadSource::Data(Bytes::copy_from_slice(format!("{digest}\n").as_bytes())),
UploadSource::Data(Bytes::copy_from_slice(format!("{digest}\n").as_bytes())),
UploadSource::Data(Bytes::from(format!("{digest}\n"))), |
It was like that when I found it :) thanks
This lets me test the release scripts against a custom, fault-injected Python server, but I suppose it might also be useful for downstream users who have GHES. Patches welcome if anyone is using this and it doesn't quite work right!
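A minimal sketch of what such an override could look like; the GITHUB_API_URL variable name and the fallback URL are assumptions here, not necessarily what this PR does:

```rust
// Hypothetical override: let tests (or GHES users) point the client at a
// different GitHub API endpoint, falling back to the public one.
fn github_api_base() -> String {
    std::env::var("GITHUB_API_URL").unwrap_or_else(|_| "https://api.github.com".to_string())
}
```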
Marking as draft since I'm working on a mock GitHub HTTP server to do some fault injection and test things properly; I want to run those tests first, but I'm mildly confident about this code at the moment.