Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use zerocopy instead of raw transmute #392

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cgwalters
Copy link
Collaborator

This is a very old crate, and when I first started learning Rust, I thought a tar parser was exactly the kind of thing that should showcase Rust's claimed safety and speed.

I remember being surprised at the usage of unsafe here...thinking it undermined some of the Rust claims.

I didn't think about it too much more though, but I now understand that it wasn't exactly a language limitation (if we wanted to we could just have Header and functions which reinterpret fields manually) but nowadays the zerocopy crate has emerged as the go-to default for ergonomically deriving and proving at compile time the safety of these types of transmutations.

Let's start using it.

(Also use #[derive(Default)] instead of the unsafe mem::zeroed()
for `GnuExtSparseHeader)

Now I also went ahead and added a #[deny(unsafe_code)] - there's only a few calls to libc functions which could be replaced with rustix or nix...but that's a different topic.

Adding a new dependency

I am aware that this crate is widely used, including by e.g. cargo. I went ahead and proactively checked and it turns out that zerocopy is already a dependency of ahash, which is already a cargo dependency.

This is a very old crate, and when I first started learning
Rust, I thought a tar parser was exactly the kind of thing
that should showcase Rust's claimed safety and speed.

I remember being surprised at the usage of `unsafe` here...thinking
it undermined some of the Rust claims.

I didn't think about it too much more though, but I now understand
that it wasn't exactly a language limitation (if we wanted to
we could just have `Header` and functions which reinterpret
fields manually) but nowadays the zerocopy crate has emerged
as the go-to default for ergonomically deriving and proving
at compile time the safety of these types of transmutations.

Let's start using it.

(Also use `#[derive(Default)]` instead of the unsafe `mem::zeroed()`
 for `GnuExtSparseHeader)

Now I also went ahead and added a `#[deny(unsafe_code)]` - there's
only a few calls to libc functions which could be replaced
with rustix or nix...but that's a different topic.

Adding a new dependency
-----------------------

I am aware that this crate is widely used, including by e.g. `cargo`.
I went ahead and proactively checked and it turns out that zerocopy
is already a dependency of `ahash`, which is already a cargo dependency.

Signed-off-by: Colin Walters <[email protected]>
cgwalters added a commit to cgwalters/composefs-rs that referenced this pull request Dec 5, 2024
ref https://lwn.net/Articles/995814/ etc.

I also did a PR for tar-rs: alexcrichton/tar-rs#392

Thanks to the maintainer who helped me on Discord
to know the right API to call here.

Signed-off-by: Colin Walters <[email protected]>
allisonkarlitskaya pushed a commit to containers/composefs-rs that referenced this pull request Dec 5, 2024
ref https://lwn.net/Articles/995814/ etc.

I also did a PR for tar-rs: alexcrichton/tar-rs#392

Thanks to the maintainer who helped me on Discord
to know the right API to call here.

Signed-off-by: Colin Walters <[email protected]>
@cgwalters cgwalters requested a review from xzfc December 5, 2024 21:40
@xzfc
Copy link
Collaborator

xzfc commented Dec 6, 2024

This change would make zerocopy a public dependency of tar: impls would be visible in the docs, making it a part of the public API. Are we okay with this?

For the reference:

Related: #384 (comment)
Private zerocopy trait implementations are not supported (yet): google/zerocopy#1855

@cgwalters
Copy link
Collaborator Author

Thanks! I didn't read Alex's comment there carefully enough originally. Hmmm, yes this issue definitely rains on the parade.

I don't have a really strong opinion here honestly. What would be important data here is - how many crates expose types from this crate in their public API? My intuition (backed up by a quick skim of https://lib.rs/crates/tar/rev ) says that most crates are just using tar as an implementation detail. A counterexample is a crate I also maintain: https://github.com/containers/ocidir-rs - but we actually discussed severing this recently too.

I guess more generally what would the blast radius be from us doing semver bumps? We've claimed semver 0.4 compat since 4b095c8 - more than 8 years.

OK basically: out of conservatism I think my vote is we just keep this PR open until zerocopy handles private trait impls? It's not like there's any urgency here. Moving to draft.

@cgwalters cgwalters marked this pull request as draft December 6, 2024 00:29
@cgwalters
Copy link
Collaborator Author

The counter here though is: maybe there are some API cleanups we could choose to make as zerocopy does semver bumps? IOW yes we'd do semver bumps, but make them also useful for other reasons.

@xzfc
Copy link
Collaborator

xzfc commented Dec 6, 2024

As for libc calls. For the context, I've used nix in the initial implementation of #375, but Alex asked to replace it with libc:

For depending on nix, I'd personally prefer not to. How difficult would it be to use libc and syscalls there?

Maybe he has a different opinion on rustix which is currently already pulled through an optional dependency xattr since tar 0.4.41 (#348).

$ cargo tree --edges no-dev
tar v0.4.43
├── filetime v0.2.8
│   ├── cfg-if v0.1.6
│   └── libc v0.2.150
├── libc v0.2.150
└── xattr v1.1.3
    ├── linux-raw-sys v0.4.11
    └── rustix v0.38.28            <--
        ├── bitflags v2.4.0
        └── linux-raw-sys v0.4.11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants