Skip to content

What are the soundness requirements for dlopen? #525

Open
@RalfJung

Description

@RalfJung
Member

Using dlopen is a subtle art. On top of the usual requirements around symbol conflicts and ABI compatibility, Rust's handling of symbols adds certain extra assumptions that can lead to UB here: ideally, we'd make sure that symbols from "different" crates can never clash. During normal builds, this is ensured by checking that the StableCrateId is globally unique (and hashing everything into the StableCrateId that is considered as relevant for crate identity), but this check is bypassed by dlopen.

At the very least, this potential risk of collisions in dlopen seems worth documenting somewhere. On top of that, is there anything we could do to mitigate this problem? Making StableCrateId an actual cryptographic hash and 256 bits large is probably going to be prohibitively expensive, but maybe there is an alternative where only dlopen users have to pay for extra checks, and if you don't use dlopen it doesn't cost anything. One could imagine a rust_checked_dlopen or so that performs the crate ID uniqueness check at runtime, somehow. Is that realistic? Is it useful?

Activity

VorpalBlade

VorpalBlade commented on Aug 13, 2024

@VorpalBlade

What exactly are we trying to protect against?

Let me play devil's advocate here:

Due to lack of stable ABI you will most probably be using C ABI anyway, and no name mangling. You might be using stabby or similar (which builds on top of the C ABI), but arguably they are off doing their own thing.

dlsym is very basic and even in C++ that has a stable ABI it doesn't work well with C++ name mangling, you are generally working with extern C functions across dlopen/dlsym. You might have an extern C function that returns a more complex object full of C++ types and pointers but that can have a lot of footguns (same version of all involved types must be used and since there is no name mangling you can't detect this anyway).

So, assuming extern C API what can we even protect against? C ABI is fundamentally not safe due to lack of name mangling.

Is that not the usage scenario then what is? Both possible alternatives (stabby and abi_stable) already solve the safety concerns at a higher level. Is the list of things that such a layer needs to deal with what you want to end up with in this issue?

It would probably help to come up with some use cases, describing what could go wrong in order to figure this out. As it is, this issue seems broad and vague, or perhaps I'm misunderstanding it.

RalfJung

RalfJung commented on Aug 13, 2024

@RalfJung
MemberAuthor

I'm not sure what the usual usecases here are.^^ If people only ever dlopen things that have a C ABI and nothing else (no Rust symbols exported), then indeed collisions in Rust's name mangling are entirely irrelevant. But is that really the only thing people do?

Cc @bjorn3

bjorn3

bjorn3 commented on Aug 13, 2024

@bjorn3
Member

Rustc dlopens codegen backends and uses the rust abi for this. The fact that rust has an unstable abi doesn't matter when you ensure that you use the same rustc version to compile the host and the plugin. For codegen backends using something like stabby or abi_stable is impractical as codegen backends are expected to use the exact same api's as rustc uses internally. Conversion of values at the abi boundary would result in unacceptable overhead.

VorpalBlade

VorpalBlade commented on Aug 13, 2024

@VorpalBlade

Rustc dlopens codegen backends and uses the rust abi for this. The fact that rust has an unstable abi doesn't matter when you ensure that you use the same rustc version to compile the host and the plugin.

How does this deal with name mangling and dlsym though? Does it still use no_mangle or does it compute the expected mangled names and pass those to dlsym?

bjorn3

bjorn3 commented on Aug 13, 2024

@bjorn3
Member

For the functions in the plugin to be called by the host #[no_mangle] is used. For functions that the plugin calls, those are defined in a dylib which both the plugin and host use as regular rust dependency, ensuring that rustc correctly handles symbol mangling.

VorpalBlade

VorpalBlade commented on Aug 13, 2024

@VorpalBlade

What bjorn3 said makes sense to me, when doing dlopen you need to use no_mangle. And ld.so takes care of resolving symbols called by the plugin. I guess there could be some possible issues there? As I understand it this is:

  • Program A depends on dylib B (normal load time dependency)
  • Program A then dlopens dylib C
  • Dylib C also depends on dylib B (normal load time dependency).
  • B exports normal mangled API
  • C exports a no_mangle API

What are the ways this can fail in?

Additional note: The Windows/Mac equivalents to dlopen may also have special consideration. I know that symbol resolution works differently for those (not a single global namespace) but I'm not an expert by any means, especially on those platforms.

Not sure how any of this could affect the opsem angle, and if people who don't care about portability will want to make use of the semantics of their platform of choice. I'm not entirely sure what the opsem angle on this even is, how does the AM represent dlopen/LoadLibrary even?

RalfJung

RalfJung commented on Aug 13, 2024

@RalfJung
MemberAuthor

The concern is if dylib C depends on crate E, but E happens to have the same StableCrateId as B. Then the symbols of the two crates will get mixed up and everything explodes, even though it doesn't look like the dlopen is doing anything wrong.

VorpalBlade

VorpalBlade commented on Aug 13, 2024

@VorpalBlade

@RalfJung from a pragmatic point of view two questions come to mind:

  • What is the probability of collision (assuming random chance, not nefarious actors)?
  • What is the threat model here? It realistically can't assume nefarious actors (if you load a malicious native library it is game over already, there is arbitrary code execution at the very moment of loading: IFUNCs (of xz fame), static constructors etc).
chorman0773

chorman0773 commented on Aug 13, 2024

@chorman0773
Contributor

I don't know that we can really express these soundness requirements in any tangible manner. It's like saying that you must not use #[export_name] to collide with a symbol name (especially since the language doesn't even guarantee the form of those symbols.

RalfJung

RalfJung commented on Aug 13, 2024

@RalfJung
MemberAuthor

from a pragmatic point of view two questions come to mind:

StableCrateId is a 32bit hash (AFAIK), so (according to this) with 2900 crates in the overall dependency tree there is a 0.1% chance of collision if the hashes are assumed to be fully random.

See rust-lang/rust#10389 and rust-lang/rust#129030 for more of these discussions. In this issue, I am interested in exploring what could be done to fix this, not in discussing threat models. (This doesn't mean I think we must fix this, I just want to know what the options would be.)

bjorn3

bjorn3 commented on Aug 13, 2024

@bjorn3
Member

For symbol name collisions I believe you have to collide both the StableCrateId and crate name (assuming the v0 symbol mangling scheme). You are unlikely to use 2900 crates with the same name. If you have 3 crates with the same name the chance of a collision is only 10^-9 and at 93 crates with the same name you get a collision chance of 10^-6.

digama0

digama0 commented on Aug 13, 2024

@digama0

Also re: "what's the opsem angle", I think the title question is clear enough: What are the requirements for a programmer to be able to call dlopen without causing UB? This requires understanding (1) what are the bad situations that can arise that we would like to classify as UB, and (2) what are the things that the programmer or user did that can lead to the bad situation, which is what we want to put on the warning label.

I think a threat model only comes up when it comes to prioritizing the safety requirements in (2) for human consumption, but abstractly it should be possible to come up with an objective answer to the question.

My knowledge of dynamic linking protocols is pretty low so I can't answer the question itself, though. Brainstorming some things based on what has been brought up:

  • One case of a (1) bad situation is that you dlopen a function with a name that already exists, I think this has to be UB. It's not clear to me whether we need to make an exception for "benign redeclarations".
  • There are many potential (2) causes for name collisions though.
    • Hash collisions are technically possible but presumably unlikely. It is unclear whether we want to blame the programmer or the compiler for this though, it's not really a user facing issue.
    • #[no_mangle] collisions are clearly on the programmer, that's why it's unsafe, but there are reasons it might not be obvious or it may be a distributed responsibility bug. I'd really wish we could catch these issues with a nice error message.
    • Is there something else I'm missing? Collisions between a #[no_mangle] local function and an internally defined C function in the dlopen'd library?
VorpalBlade

VorpalBlade commented on Aug 13, 2024

@VorpalBlade

Thank you, we now have a concrete issue that can lead to unsoundness, which is much easier to dicuss than the general issues with dlopen (which obviously have more, such as general no_mangle collisions etc).

Some thoughts:

One thing that comes to mind is that dlopen has flags that affect name resolution of later dlopen as well. In particular RTLD_GLOBAL and RTLD_DEEPBIND might have interesting interactions. In any case the flags mean there isn't just one single behaviour.

Does eager binding (RTLD_NOW and the corresponding ld flags) help at all? I think the newly loaded library will still get messed up in case of a collision, but existing code will be unaffected. So not good enough.

On glibc it looks like dlmopen would actually avoid the issue you describe though by putting everything into a separate namespace! Not portable though. Also only 16 separate namespaces are supported apparently. So not great.

VorpalBlade

VorpalBlade commented on Aug 13, 2024

@VorpalBlade

13 remaining items

michaelwoerister

michaelwoerister commented on Aug 14, 2024

@michaelwoerister
Member

from a pragmatic point of view two questions come to mind:

StableCrateId is a 32bit hash (AFAIK), so (according to this) with 2900 crates in the overall dependency tree there is a 0.1% chance of collision if the hashes are assumed to be fully random.

To clarify: StableCrateId is a 64-bit hash.

michaelwoerister

michaelwoerister commented on Aug 14, 2024

@michaelwoerister
Member

Given 64-bit hashes, one would get a collision probability of 10-15 when having 190 versions of the same crate in the crate graph (according to the table here).

RalfJung

RalfJung commented on Aug 14, 2024

@RalfJung
MemberAuthor

Turns out there are actually two levels of hashes here, with a theoretical chance of collision in each level: cargo hashes some stuff into -Cmetadata, and then rustc hashes that together with other stuff into StableCrateId.

I don't think that changes anything about the probabilities, but it seems to make it harder to actually check for collisions, since we'd need the original data cargo hashes together to ensure it is all globally unique.

michaelwoerister

michaelwoerister commented on Aug 14, 2024

@michaelwoerister
Member

Cargo could probably switch to using 256 bit cryptographic hashes to reduce the concern here.

RalfJung

RalfJung commented on Aug 14, 2024

@RalfJung
MemberAuthor

For symbol name collisions I believe you have to collide both the StableCrateId and crate name (assuming the v0 symbol mangling scheme)

The "I believe" is giving me pause, would be good to have that checked. :)
So the crate name goes into StableCrateId but is then also separately repeated in the symbol name?

Also, v0 symbol mangling is still not stable...

bjorn3

bjorn3 commented on Aug 14, 2024

@bjorn3
Member

So the crate name goes into StableCrateId but is then also separately repeated in the symbol name?

Yes. The demangled version of a crate reference in the v0 symbol mangling scheme is crate_name[crate_disambiguator] where the crate disambiguator happens to be the StableCrateId in the current rustc version, though any byte sequence unique between different crates is allowed by the specification.

Also, v0 symbol mangling is still not stable...

It is stable, but not the default. You can enable it using -Csymbol-mangling-version=v0.

michaelwoerister

michaelwoerister commented on Aug 15, 2024

@michaelwoerister
Member

We could probably harden v0 symbol names against collisions by adding a single (but wide) hash to each symbol that includes more information about the all crate-ids occurring in the name. So instead of

my_crate[{64-bit hash}]::foo<my_other_crate[{64-bit hash}]::Bar>

we could have something like

my_crate[0]::foo<my_other_crate[0]::Bar> @ {256 bit hash}

where @ {256 bit hash} is some kind of suffix on the symbol name that includes the full information that would have gone into computing the StableCrateId of both my_crate and my_other_crate. Assuming that we trust a 256 bit cryptographic hash to have truly negligible collision risk, that might solve the problem (but would not be cheap in terms of symbol name length as encoding the hash would take 42 bytes).

Some other kind of means that does not rely on hashing at all might be preferable. I don't know if something like rust_checked_dlopen is feasible.

bjorn3

bjorn3 commented on Aug 15, 2024

@bjorn3
Member

That would actually break the existing collision detection for the non-dlopen case. It also means if you have two versions of the same crate, you can no longer know which crate was used after demangling without parsing the crate metadata of all crates and trying every combination of StableCrateId to see if it matches the given combined hash.

michaelwoerister

michaelwoerister commented on Aug 15, 2024

@michaelwoerister
Member

That would actually break the existing collision detection for the non-dlopen case.

In what way?

It also means if you have two versions of the same crate, you can no longer know which crate was used after demangling without parsing the crate metadata of all crates and trying every combination of StableCrateId to see if it matches the given combined hash.

Yes, that's true. But that information is already pretty opaque, right?

bjorn3

bjorn3 commented on Aug 15, 2024

@bjorn3
Member

That would actually break the existing collision detection for the non-dlopen case.

In what way?

Rustc is guaranteed to error when StableCrateId collide if this collision occurs when both crates with the same StableCrateId are used as (indirect) dependencies by a crate. When you the StableCrateId in the symbol names together, this may result in a collision even if no StableCrateId collides.

Yes, that's true. But that information is already pretty opaque, right?

In the case of two my_crate[{64-bit hash}]::foo<my_other_crate[{64-bit hash}]::Bar> with distinct hashes, you can distinguish my_crate being a different version from my_other_crate being a different version. As well as check if my_crate[{64-bit hash}]::foo and my_crate[{64-bit hash}]::bar are from the same crate. And finally you can use rustc -Zls=root to read the disambiguator from the rlib/rmeta file of a crate.

RalfJung

RalfJung commented on Aug 15, 2024

@RalfJung
MemberAuthor

Rustc is guaranteed to error when StableCrateId collide if this collision occurs when both crates with the same StableCrateId are used as (indirect) dependencies by a crate. When you the StableCrateId in the symbol names together, this may result in a collision even if no StableCrateId collides.

If you do that with a cryptographic hash, the chances of a collision are astronomically low so we may disregard that possibility. Like, there's not a single known example of a collision for SHA256 or other comparable hashes.

michaelwoerister

michaelwoerister commented on Aug 15, 2024

@michaelwoerister
Member

When you the StableCrateId in the symbol names together, this may result in a collision even if no StableCrateId collides.

Actually, the proposal would be something like:

  • For every crate losslessly store the full information that goes into computing StableCrateId in crate metadata, so it is accessible downstream.
  • When generating a symbol name that references a given crate, feed that full information into the cryptographic hash.

There would only be one level of hashing (disregarding that Cargo feeds hashes into -Cmetadata) and that would be a cryptographic hash. So I think the risk of collisions should be very low.

In the case of two my_crate[{64-bit hash}]::foo<my_other_crate[{64-bit hash}]::Bar> with distinct hashes, you can distinguish my_crate being a different version from my_other_crate being a different version. [...]

This could be mitigated by using short prefixes of the StableCrateId like git does for commit hashes. But it's not a perfect solution as it might lead to "phantom collisions" that don't affect correctness but are misleading to humans.

chorman0773

chorman0773 commented on Aug 15, 2024

@chorman0773
Contributor

I think that dylibs are just a pain in the ass in general for opsem - so are rlibs to an extent, because you can "guess" (or determine) the mangled name of a symbol and deliberately produce a function with that export_name.

rustc could probably fix at least some here by using STV_PROTECTED. I don't know what opsem can do here. DSOs are a mess everywhere.

bjorn3

bjorn3 commented on Aug 15, 2024

@bjorn3
Member

rustc could probably fix at least some here by using STV_PROTECTED. I don't know what opsem can do here. DSOs are a mess everywhere.

I agree we should use protected visibility for non-#[no_mangle] symbols. We already have rust-lang/rust#105518 open as issue for that. I tried to implement it once, but got linker errors on arm64.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @dlight@RalfJung@Diggsey@digama0@michaelwoerister

        Issue actions

          What are the soundness requirements for `dlopen`? · Issue #525 · rust-lang/unsafe-code-guidelines