Description
Using `dlopen` is a subtle art. On top of the usual requirements around symbol conflicts and ABI compatibility, Rust's handling of symbols adds certain extra assumptions that can lead to UB here: ideally, we'd make sure that symbols from "different" crates can never clash. During normal builds, this is ensured by checking that the `StableCrateId` is globally unique (and hashing everything into the `StableCrateId` that is considered as relevant for crate identity), but this check is bypassed by `dlopen`.
At the very least, this potential risk of collisions in `dlopen` seems worth documenting somewhere. On top of that, is there anything we could do to mitigate this problem? Making `StableCrateId` an actual cryptographic hash and 256 bits large is probably going to be prohibitively expensive, but maybe there is an alternative where only `dlopen` users have to pay for extra checks, and if you don't use `dlopen` it doesn't cost anything. One could imagine a `rust_checked_dlopen` or so that performs the crate ID uniqueness check at runtime, somehow. Is that realistic? Is it useful?
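To make the runtime-check idea a little more concrete, here is a minimal sketch of the bookkeeping such a `rust_checked_dlopen` might do. Everything in it is hypothetical: `CrateRecord`, `LoadedCrates` and `check_and_register` are made-up names, and it assumes each library could somehow export the full, un-hashed identity data behind every `StableCrateId` it contains, which rustc does not do today. It only models the uniqueness check, not the loading itself.

```rust
use std::collections::HashMap;

/// Hypothetical per-crate record that a Rust dylib would have to export.
/// `identity` stands for the full, un-hashed data that feeds the ID.
#[derive(Clone, Debug)]
struct CrateRecord {
    stable_crate_id: u64,
    identity: String,
}

/// Process-wide table of the crates loaded so far, keyed by StableCrateId.
#[derive(Default)]
struct LoadedCrates {
    by_id: HashMap<u64, String>,
}

impl LoadedCrates {
    /// Checks the records of a library that is about to be loaded.
    /// Fails, leaving the table untouched, if any ID is already bound
    /// to a different identity (a genuine hash collision).
    fn check_and_register(&mut self, incoming: &[CrateRecord]) -> Result<(), String> {
        let mut updated = self.by_id.clone();
        for rec in incoming {
            match updated.get(&rec.stable_crate_id) {
                Some(existing) if *existing != rec.identity => {
                    return Err(format!(
                        "StableCrateId {:#018x} collides: `{}` vs `{}`",
                        rec.stable_crate_id, existing, rec.identity
                    ));
                }
                // Same crate seen again (e.g. a shared dependency): fine.
                Some(_) => {}
                None => {
                    updated.insert(rec.stable_crate_id, rec.identity.clone());
                }
            }
        }
        self.by_id = updated;
        Ok(())
    }
}

fn main() {
    let mut loaded = LoadedCrates::default();
    // Made-up IDs and identities, purely for illustration.
    let host = [CrateRecord { stable_crate_id: 0x1111, identity: "b 1.0.0".to_string() }];
    let plugin = [CrateRecord { stable_crate_id: 0x1111, identity: "e 0.3.1".to_string() }];
    println!("host:   {:?}", loaded.check_and_register(&host));
    println!("plugin: {:?}", loaded.check_and_register(&plugin));
}
```

In this sketch only callers of the checked loader pay for the table, which matches the "only `dlopen` users pay" goal; whether the identity data can be made available at runtime is exactly the open question.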
Activity
VorpalBlade commented on Aug 13, 2024
What exactly are we trying to protect against?
Let me play devil's advocate here:
Due to lack of stable ABI you will most probably be using C ABI anyway, and no name mangling. You might be using stabby or similar (which builds on top of the C ABI), but arguably they are off doing their own thing.
`dlsym` is very basic and even in C++ that has a stable ABI it doesn't work well with C++ name mangling, you are generally working with extern C functions across dlopen/dlsym. You might have an extern C function that returns a more complex object full of C++ types and pointers but that can have a lot of footguns (same version of all involved types must be used and since there is no name mangling you can't detect this anyway).
So, assuming extern C API what can we even protect against? C ABI is fundamentally not safe due to lack of name mangling.
If that is not the usage scenario, then what is? Both possible alternatives (stabby and abi_stable) already solve the safety concerns at a higher level. Is the list of things that such a layer needs to deal with what you want to end up with in this issue?
It would probably help to come up with some use cases, describing what could go wrong in order to figure this out. As it is, this issue seems broad and vague, or perhaps I'm misunderstanding it.
RalfJung commented on Aug 13, 2024
I'm not sure what the usual usecases here are.^^ If people only ever `dlopen` things that have a C ABI and nothing else (no Rust symbols exported), then indeed collisions in Rust's name mangling are entirely irrelevant. But is that really the only thing people do?
Cc @bjorn3
bjorn3 commented on Aug 13, 2024
Rustc dlopens codegen backends and uses the rust abi for this. The fact that rust has an unstable abi doesn't matter when you ensure that you use the same rustc version to compile the host and the plugin. For codegen backends using something like stabby or abi_stable is impractical as codegen backends are expected to use the exact same api's as rustc uses internally. Conversion of values at the abi boundary would result in unacceptable overhead.
VorpalBlade commented on Aug 13, 2024
How does this deal with name mangling and dlsym though? Does it still use no_mangle or does it compute the expected mangled names and pass those to dlsym?
bjorn3 commented on Aug 13, 2024
For the functions in the plugin to be called by the host `#[no_mangle]` is used. For functions that the plugin calls, those are defined in a dylib which both the plugin and host use as regular rust dependency, ensuring that rustc correctly handles symbol mangling.
VorpalBlade commented on Aug 13, 2024
What bjorn3 said makes sense to me, when doing dlopen you need to use no_mangle. And `ld.so` takes care of resolving symbols called by the plugin. I guess there could be some possible issues there? As I understand it this is:
What are the ways this can fail in?
Additional note: The Windows/Mac equivalents to dlopen may also have special considerations. I know that symbol resolution works differently for those (not a single global namespace) but I'm not an expert by any means, especially on those platforms.
Not sure how any of this could affect the opsem angle, and if people who don't care about portability will want to make use of the semantics of their platform of choice. I'm not entirely sure what the opsem angle on this even is, how does the AM represent dlopen/LoadLibrary even?
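For reference, the host side of an unmangled lookup (the `#[no_mangle]` plus `dlsym` pattern discussed above) looks roughly like the sketch below. It is not taken from rustc. It assumes a POSIX system, declares the `dlopen` family by hand instead of using a binding crate, and uses `cos` from `libm.so.6` purely as a stand-in for a plugin entry point; `call_unary_c_fn` is a made-up helper name.

```rust
use std::ffi::{c_void, CString};
use std::os::raw::{c_char, c_int};

// POSIX dynamic-loading API, declared by hand to avoid an external crate.
unsafe extern "C" {
    fn dlopen(filename: *const c_char, flags: c_int) -> *mut c_void;
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
    fn dlclose(handle: *mut c_void) -> c_int;
}

// Eager binding. Assumption: the value 2 is what Linux and macOS use.
const RTLD_NOW: c_int = 2;

/// Loads `library`, looks up the unmangled `symbol`, and calls it as an
/// `extern "C" fn(f64) -> f64`. Returns `None` if either lookup fails.
fn call_unary_c_fn(library: &str, symbol: &str, arg: f64) -> Option<f64> {
    let lib = CString::new(library).ok()?;
    let sym = CString::new(symbol).ok()?;
    unsafe {
        let handle = dlopen(lib.as_ptr(), RTLD_NOW);
        if handle.is_null() {
            return None;
        }
        let addr = dlsym(handle, sym.as_ptr());
        let result = if addr.is_null() {
            None
        } else {
            // Nothing checks this signature: the symbol name carries no type
            // information, so a mismatch here is undefined behaviour.
            let f: extern "C" fn(f64) -> f64 = std::mem::transmute(addr);
            Some(f(arg))
        };
        dlclose(handle);
        result
    }
}

fn main() {
    match call_unary_c_fn("libm.so.6", "cos", 0.0) {
        Some(v) => println!("cos(0) via dlsym = {}", v),
        None => println!("libm.so.6 or `cos` is not available on this system"),
    }
}
```

The `transmute` is where the "C ABI is fundamentally not safe" point shows up in practice: the lookup goes by name only, so the caller has to get the signature right on its own.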
RalfJung commented on Aug 13, 2024
The concern is if dylib C depends on crate E, but E happens to have the same StableCrateId as B. Then the symbols of the two crates will get mixed up and everything explodes, even though it doesn't look like the `dlopen` is doing anything wrong.
VorpalBlade commented on Aug 13, 2024
@RalfJung from a pragmatic point of view two questions come to mind:
chorman0773 commented on Aug 13, 2024
I don't know that we can really express these soundness requirements in any tangible manner. It's like saying that you must not use `#[export_name]` to collide with a symbol name (especially since the language doesn't even guarantee the form of those symbols).
RalfJung commented on Aug 13, 2024
`StableCrateId` is a 32bit hash (AFAIK), so (according to this) with 2900 crates in the overall dependency tree there is a 0.1% chance of collision if the hashes are assumed to be fully random.
See rust-lang/rust#10389 and rust-lang/rust#129030 for more of these discussions. In this issue, I am interested in exploring what could be done to fix this, not in discussing threat models. (This doesn't mean I think we must fix this, I just want to know what the options would be.)
bjorn3 commented on Aug 13, 2024
For symbol name collisions I believe you have to collide both the `StableCrateId` and crate name (assuming the v0 symbol mangling scheme). You are unlikely to use 2900 crates with the same name. If you have 3 crates with the same name the chance of a collision is only 10^-9 and at 93 crates with the same name you get a collision chance of 10^-6.
digama0 commented on Aug 13, 2024
Also re: "what's the opsem angle", I think the title question is clear enough: What are the requirements for a programmer to be able to call `dlopen` without causing UB? This requires understanding (1) what are the bad situations that can arise that we would like to classify as UB, and (2) what are the things that the programmer or user did that can lead to the bad situation, which is what we want to put on the warning label.
I think a threat model only comes up when it comes to prioritizing the safety requirements in (2) for human consumption, but abstractly it should be possible to come up with an objective answer to the question.
My knowledge of dynamic linking protocols is pretty low so I can't answer the question itself, though. Brainstorming some things based on what has been brought up:
`#[no_mangle]` collisions are clearly on the programmer, that's why it's unsafe, but there are reasons it might not be obvious or it may be a distributed responsibility bug. I'd really wish we could catch these issues with a nice error message.
`#[no_mangle]` local function and an internally defined C function in the dlopen'd library?
VorpalBlade commented on Aug 13, 2024
Thank you, we now have a concrete issue that can lead to unsoundness, which is much easier to discuss than the general issues with dlopen (of which there are obviously more, such as general no_mangle collisions etc).
Some thoughts:
One thing that comes to mind is that dlopen has flags that affect name resolution of later dlopen as well. In particular `RTLD_GLOBAL` and `RTLD_DEEPBIND` might have interesting interactions. In any case the flags mean there isn't just one single behaviour.
Does eager binding (`RTLD_NOW` and the corresponding ld flags) help at all? I think the newly loaded library will still get messed up in case of a collision, but existing code will be unaffected. So not good enough.
On glibc it looks like `dlmopen` would actually avoid the issue you describe though by putting everything into a separate namespace! Not portable though. Also only 16 separate namespaces are supported apparently. So not great.
VorpalBlade commented on Aug 13, 2024
13 remaining items
michaelwoerister commented on Aug 14, 2024
To clarify: StableCrateId is a 64-bit hash.
michaelwoerister commented on Aug 14, 2024
Given 64-bit hashes, one would get a collision probability of 10^-15 when having 190 versions of the same crate in the crate graph (according to the table here).
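The figures quoted so far in the thread (0.1% for 2900 crates and 10^-9 / 10^-6 for 3 / 93 crates at 32 bits, 10^-15 for 190 crates at 64 bits) can be reproduced with the standard birthday-bound approximation. This is a small sketch, assuming the hashes behave as uniformly random values; `collision_probability` is a made-up helper name.

```rust
/// Birthday-bound approximation: probability that at least two of `n`
/// uniformly random `bits`-bit hashes coincide,
/// p ≈ 1 - exp(-n(n-1) / 2^(bits+1)).
fn collision_probability(n: u64, bits: u32) -> f64 {
    let pairs = (n as f64) * (n.saturating_sub(1) as f64) / 2.0;
    let space = 2f64.powi(bits as i32);
    // -exp_m1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    -(-pairs / space).exp_m1()
}

fn main() {
    for &(n, bits) in &[(2900u64, 32u32), (3, 32), (93, 32), (190, 64)] {
        println!(
            "{:>5} crates, {:>2}-bit hash: p = {:.2e}",
            n,
            bits,
            collision_probability(n, bits)
        );
    }
}
```

The four cases come out at roughly 1e-3, 7e-10, 1e-6 and 1e-15, in line with the numbers cited above.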
RalfJung commented on Aug 14, 2024
Turns out there are actually two levels of hashes here, with a theoretical chance of collision in each level: cargo hashes some stuff into `-Cmetadata`, and then rustc hashes that together with other stuff into `StableCrateId`.
I don't think that changes anything about the probabilities, but it seems to make it harder to actually check for collisions, since we'd need the original data cargo hashes together to ensure it is all globally unique.
michaelwoerister commented on Aug 14, 2024
Cargo could probably switch to using 256 bit cryptographic hashes to reduce the concern here.
RalfJung commented on Aug 14, 2024
The "I believe" is giving me pause, would be good to have that checked. :)
So the crate name goes into `StableCrateId` but is then also separately repeated in the symbol name?
Also, v0 symbol mangling is still not stable...
bjorn3 commented on Aug 14, 2024
Yes. The demangled version of a crate reference in the v0 symbol mangling scheme is `crate_name[crate_disambiguator]` where the crate disambiguator happens to be the `StableCrateId` in the current rustc version, though any byte sequence unique between different crates is allowed by the specification.
It is stable, but not the default. You can enable it using `-Csymbol-mangling-version=v0`.
michaelwoerister commented on Aug 15, 2024
We could probably harden v0 symbol names against collisions by adding a single (but wide) hash to each symbol that includes more information about all the crate IDs occurring in the name. So instead of
we could have something like
where `@ {256 bit hash}` is some kind of suffix on the symbol name that includes the full information that would have gone into computing the `StableCrateId` of both `my_crate` and `my_other_crate`. Assuming that we trust a 256 bit cryptographic hash to have truly negligible collision risk, that might solve the problem (but would not be cheap in terms of symbol name length as encoding the hash would take 42 bytes).
Some other kind of means that does not rely on hashing at all might be preferable. I don't know if something like `rust_checked_dlopen` is feasible.
bjorn3 commented on Aug 15, 2024
That would actually break the existing collision detection for the non-dlopen case. It also means if you have two versions of the same crate, you can no longer know which crate was used after demangling without parsing the crate metadata of all crates and trying every combination of StableCrateId to see if it matches the given combined hash.
michaelwoerister commented on Aug 15, 2024
In what way?
Yes, that's true. But that information is already pretty opaque, right?
bjorn3 commented on Aug 15, 2024
Rustc is guaranteed to error when `StableCrateId`s collide if this collision occurs when both crates with the same `StableCrateId` are used as (indirect) dependencies by a crate. When you hash the `StableCrateId`s in the symbol names together, this may result in a collision even if no `StableCrateId` collides.
In the case of two `my_crate[{64-bit hash}]::foo<my_other_crate[{64-bit hash}]::Bar>` with distinct hashes, you can distinguish `my_crate` being a different version from `my_other_crate` being a different version. You can also check whether `my_crate[{64-bit hash}]::foo` and `my_crate[{64-bit hash}]::bar` are from the same crate. And finally you can use `rustc -Zls=root` to read the disambiguator from the rlib/rmeta file of a crate.
RalfJung commented on Aug 15, 2024
If you do that with a cryptographic hash, the chances of a collision are astronomically low so we may disregard that possibility. Like, there's not a single known example of a collision for SHA256 or other comparable hashes.
michaelwoerister commented on Aug 15, 2024
Actually, the proposal would be something like:
`StableCrateId` in crate metadata, so it is accessible downstream.
There would only be one level of hashing (disregarding that Cargo feeds hashes into `-Cmetadata`) and that would be a cryptographic hash. So I think the risk of collisions should be very low.
This could be mitigated by using short prefixes of the `StableCrateId` like git does for commit hashes. But it's not a perfect solution as it might lead to "phantom collisions" that don't affect correctness but are misleading to humans.
chorman0773 commented on Aug 15, 2024
I think that dylibs are just a pain in the ass in general for opsem - so are rlibs to an extent, because you can "guess" (or determine) the mangled name of a symbol and deliberately produce a function with that `export_name`.
rustc could probably fix at least some here by using `STV_PROTECTED`. I don't know what opsem can do here. DSOs are a mess everywhere.
bjorn3 commented on Aug 15, 2024
I agree we should use protected visibility for non-`#[no_mangle]` symbols. We already have rust-lang/rust#105518 open as issue for that. I tried to implement it once, but got linker errors on arm64.