perf!: use Cow to avoid OsString allocations #487
And another thing: There is #390 which moves away from platform-dependent
## 🤖 New release

* `rustic_core`: 0.10.1 -> 0.11.0 (✓ API compatible changes)

<details><summary><i><b>Changelog</b></i></summary><p>
<blockquote>

## [0.11.0](rustic_core-v0.10.1...rustic_core-v0.11.0) - 2026-04-05

### Added
- Optimize hardlink creation in restore ([#495](#495))
- add exclude-if-xattr option ([#491](#491))

### Fixed
- make `ignore`'s `.git_exclude()` mirror `.git_ignore()` ([#494](#494))

### Other
- update dependencies ([#496](#496))
- preserve hardlinks on restore ([#492](#492))
- use general tree modifier in `repair snapshots` ([#463](#463))
- [**breaking**] Optimize file streaming ([#489](#489))
- [**breaking**] use Cow to avoid OsString allocations ([#487](#487))

</blockquote>
</p></details>

---

This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: rustic-release-plz[bot] <182542030+rustic-release-plz[bot]@users.noreply.github.com>
Extends rustic-rs#12. PR rustic-rs#106 added `get_blob_cached` to amortize repeated fetches, but `commands::dump::dump` still walks its blob list serially:

https://github.com/rustic-rs/rustic_core/blob/main/crates/core/src/commands/dump.rs#L39-L48

On high-latency backends, throughput is therefore bounded by per-blob round-trip time, regardless of how many concurrent connections the backend itself permits.

This PR routes the loop through `pariter::IteratorExt::parallel_map_scoped_custom`, the same family of ordered parallel-map primitives the archiver and packer already rely on, so fetches overlap N-wide while the writer stays single-threaded. Output ordering is preserved bit-for-bit; an integration test backs that up by chunking a 64 KiB payload into 16 fixed-size blobs and asserting that the parallel and sequential dumps both equal the source bytes exactly.

The new `DumpOptions` carries a single `num_threads: u32` field (Setters, `non_exhaustive`, gated behind the existing `clap` feature), where `0` selects `available_parallelism` and `1` forces the sequential implementation. `Repository::dump_with_opts(node, w, &DumpOptions)` joins the existing `Repository::dump(node, w)`, which keeps its signature and now defaults to `available_parallelism`. Both methods pick up an `S: Sync` bound; the in-tree `IndexedFullStatus` already satisfies it, so all in-tree call sites compile unchanged. Files that decompose into a single blob take a sequential short-circuit and pay none of the worker setup cost.

Out of scope: `dump --archive tar`, `--archive targz`, `mount`, and `webdav` go through `vfs::OpenFile::read_at` rather than `commands::dump::dump`. The serial blob-fetch shape is the same, but the consumer there is pull-based (the tar writer calls `Read::read` repeatedly with small buffers), so the right fix there is read-ahead prefetching rather than the push-style overlap this PR uses. That deserves its own change.

## Results

Same caveat as rustic-rs#487: not a proper benchmark suite.
An `InMemoryBackend` wrapper sleeps a configurable duration on every `read_full`/`read_partial` to simulate object-store round-trip latency. Setup: 4 MiB file, 64 KiB fixed-size chunks (64 blobs), fresh `Repository` per measurement so the in-process blob cache is cold for every run.

```
latency   sequential   parallel    speedup
0ms       2124 MB/s    1497 MB/s    0.70x
1ms         54 MB/s     699 MB/s   12.89x
5ms         12 MB/s     180 MB/s   15.15x
20ms         3 MB/s      48 MB/s   15.65x
50ms         1 MB/s      19 MB/s   15.72x
100ms      0.6 MB/s       9 MB/s   15.81x
```

For any non-trivial backend latency, the speedup converges to the worker count. A thread sweep at 20 ms latency confirms linear scaling: 1, 2, 4, 8, 16, and 32 threads give 1.0x, 2.0x, 4.0x, 7.9x, 15.8x, and 30.8x.

The 0 ms row is the cost. Against a free backend (warm page cache, in-memory backend, repeat dumps that already hit `get_blob_cached`), the parallel path runs at roughly 60-70% of sequential throughput, because thread setup and channel passing dominate when the per-blob fetch is itself near-instant. The single-blob short-circuit covers the smallest files, so the regression is bounded to multi-blob files on near-zero-latency backends, which is the case where dump throughput wasn't the bottleneck to begin with.
This PR eliminates many temporary allocations caused by unnecessary uses of `OsString` (they remain necessary for `TreeIterator` without lending iterators).

Previously, filename unescaping accounted for 43% of all allocations, even in cases where there was nothing to unescape. This is now avoided by checking for the presence of `\` first and using the original name unchanged if none is found. `comp_to_osstr` was only responsible for 0.1% of allocations.

## Results
The following measurements are not from a proper benchmarking setup, but merely repeated low-delta backups with heaptrack (without LTO and with raised codegen-units, on a laptop):
Total allocations reduced from 173M to 98M in my case.
These changes did not lower peak RSS significantly (4.7 GiB to 4.6 GiB; probably noise or slightly reduced fragmentation), which is expected since the eliminated allocations were mostly short-lived temporaries.
Using the global allocator with glibc 2.40, there wasn't much (if any) wall-time improvement (284 s to 274 s, measured without heaptrack), but it might be more significant on platforms with weaker allocators, such as musl.