-
Notifications
You must be signed in to change notification settings - Fork 118
Commit ebc2243
committed
Rewrite casync-http to use curl multi
This commit brings parallel downloads to casync-http, by using the curl
multi interface, and more precisely the "select()" flavour. For details
see <https://curl.haxx.se/libcurl/c/libcurl-multi.html>.
Libcurl has two ways to achieve parallel downloads:
- for HTTP/1, it can open parallel connections. The maximum number of
parallel connections is user-defined, through `MAX_HOST_CONNECTIONS`
and `MAX_TOTAL_CONNECTIONS`.
- for HTTP/2, it can attempt to multiplex on a single connection. The
maximum number of parallel downloads in this case is negociated
between the client and the server (we talk about number of streams in
the HTTP/2 jargon).
- (note that libcurl used to do pipelining over HTTP/1.1, but this is no
longer supported since 7.62.0, and casync-http doesn't use it anyway)
Accordingly, this commit also introduces two new command-line arguments
to better control the behavior of parallel downloads:
- `--max-active-chunks` is the sum of 1. the number of chunks in the
curl multi and 2. chunks downloaded and waiting to be sent to the
remote. It allows to limit the number of chunks waiting in RAM, in
case we download faster than we can send to remote. It also gives a
limit for the maximum number of concurrent downloads.
- `--max-host-connections` is for the case where libcurl opens parallel
connections to the server. In all likelihood it's only used for HTTP1.
We probably want a large number for max-active-chunks, to ensure we
don't starve the libcurl multi handle, but at the same time we probably
don't want to open too many connections in parallel, and that's why
max-host-connections is a much lower number. It seems to be a sensible
default, according to my understanding so far. User might want to adjust
these number for their specific use-case.
Note that the command-line argument `--rate-limit-bps` doesn't make much
sense anymore, since it's set for each chunk, but now chunks are
downloaded in parallel, and we don't really know how many downloads are
actually happening in parallel. And from Daniel Steinberg:
We don't have settings that limit the transfer speed of multiple,
combined, transfers.
So we might want to completely remove this option, or rework it somehow.
Note that this commit removes the wrapper `robust_curl_easy_perform()`
introduced in 328f13d. Quick reminder: this wrapper was used to sleep
and retry on `CURLE_COULDNT_CONNECT`, and allowed to workaround what
seemed to be a misbehavior of the Python Simple HTTP Server. Now that we
do parallel downloads, we can't apply this workaround "as is", we can't
just sleep. So I removed the wrapper. The issue is still present and
reproducible though, but I just assume it's a server issue, not a casync
issue.
Regarding unit tests
--------------------
This commit also opens interesting questions regarding the unit tests.
For now we're using the Python Simple HTTP Server, which can only serve
requests sequentially. It doesn't allow to really test parallelism. It's
not really representative of real-life scenario where, I assume, chunks
are served by a production server such as apache or nginx. Additionally,
I think it would be best to run the test for both a HTTP/1 server and a
HTTP/2 server.
One possibility is to use nginx, it's easy enough to run it. nginx can
serve HTTP/2 only if TLS is enabled though, and out of the box
casync-http will fail if it can't recognize the certificate. So we might
want to add a command-line argument to trust any random certificate.
Additionally, nginx requires root, maybe not very suitable for a test
suite, however there might be some workarounds?
Another possibility is to run nghttpd. This option is light on
dependencies, in the sense that libcurl already relies on libnhgttp2,
however I didn't succeed in using this server yet.
TODOs
-----
There are a few todos left, mainly:
- add argument or env variables support in casync-tool, to match new
arguments in casync-http.
- documentation, readme and such.
- some details in the code, see MR in GitHub.
Signed-off-by: Arnaud Rebillout <[email protected]>1 parent 095cb73 commit ebc2243Copy full SHA for ebc2243
File tree
Expand file treeCollapse file tree
2 files changed
+1128
-309
lines changedOpen diff view settings
Filter options
- src
Expand file treeCollapse file tree
2 files changed
+1128
-309
lines changedOpen diff view settings
Collapse file
+1-1Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
129 | 129 | | |
130 | 130 | | |
131 | 131 | | |
132 | | - | |
| 132 | + | |
133 | 133 | | |
134 | 134 | | |
135 | 135 | | |
| |||
0 commit comments