Description
So this is an issue that came up from #2908. If you have the following headers:
typedef enum State {Working = 1, Failed = 0} State;
takes_state(enum State state);
enum State returns_state();
Runing bindgen --rustified-enum=State
will produce this code:
#[repr(u32)]
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub enum State {
Working = 1,
Failed = 0,
}
extern "C" {
pub fn takes_state(state: State);
}
extern "C" {
pub fn returns_state() -> State;
}
Sadly, using takes_state
or returns_state
can cause undefined behavior. This is documented behavior but it could easily be improved by adding appropriate conversion methods for State
(which is part of the purpose of #2908). Additionally, bindgen could expose the underlying type of the C enum and use this type instead of State
.
Essentially, I would expect bindgen to generate:
#[repr(u32)]
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub enum State {
Working = 1,
Failed = 0,
}
pub type State_ctype = ::std::os::raw::c_uint;
// This can be optionally generated (#2908)
impl State {
const unsafe fn from_ctype_unchecked(v: State_ctype) -> Self {
std::mem::transmute(v)
}
}
// This can be optionally generated
impl From<State> for State_ctype {
fn from(value: State) -> Self {
value as Self
}
}
// This can be optionally generated (#2908)
impl TryFrom<State_ctype> for State {
type Err = InvalidState;
fn try_from(value: State_ctype) -> Result<Self, Self::Err> {
match value {
0 => Ok(Self::Working),
1 => Ok(Self::Failed),
_ => Err(InvalidState(value)),
}
}
}
// Here we no longer use `State` but `State_ctype`.
extern "C" {
pub fn takes_state(state: State_ctype);
}
extern "C" {
pub fn returns_state() -> State_ctype;
}
The main advantages of this approach is its flexibility as the user can explicitly opt-in for the conversion methods according to the safety guarantees. If your C library is kind enough to never produce invalid values for an enum, then you can codify that behavior
// Safety: `returns_state` is always safe to call and it is guaranteed to produce a valid `State`.
unsafe { State::from_ctype_unchecked(returns_state())) }
However, if you're unsure, you can deal with this accordingly and expose a Rust function that acknowledges this:
fn returns_state() -> Result<State, <State as TryFrom<State_ctype>>::Err> {
// Safety: `returns_state` is always safe to call
unsafe { bindings::returns_state() }.try_into()
}
The additional improvement over the current behavior is whentakes_state
and returns_state
are used in conjunction:
unsafe { takes_state(returns_state()) }
With the current behavior returns_state
can create an invalid value for State
, which is UB. Then this is passed to takes_state
and "hopefully" all works out. By using the State_ctype
all the unsafety would only happen on the C side.
Activity
ojeda commentedon Dec 10, 2024
Is this #2646, but about changing the current
--rustified-enum
? i.e. is it about changing the existing behavior instead of creating a new kind ofenum
?Cc @tgross35 @y86-dev in case they are interested (from https://rust-for-linux.zulipchat.com/#narrow/channel/288089-General/topic/Enum.20handling)
bindgen
wanted features & bugfixes Rust-for-Linux/linux#353pvdrz commentedon Dec 10, 2024
this is only about changing
--rustified-enum
so it uses the ctype of the enum instead of the enum in function signatures (and probably everywhere else as well?) and exposes the ctype as a type alias.I think #2646 goes more in the direction of #3050 where you want to generate multiple representations for a single C enum.
pvdrz commentedon Dec 10, 2024
Maybe a reasonable way forward here would be to implement first #3050 and then add an extra feature to the Rust enum representations where the ctype is used instead of the enum itself (
use_ctype
?). So you'd do something likeojeda commentedon Dec 10, 2024
Yeah, I think it should be done everywhere. Otherwise you may get e.g. a type from C that contains the
enum
with an outside value.So, to clarify, #2646 came from that Zulip thread (https://rust-for-linux.zulipchat.com/#narrow/channel/288089-General/topic/Enum.20handling) which was about generating conversion functions from a raw "C" type (i.e. without UB) to a Rust enum. This came up because in many cases what we want to end up with is an (exhaustive) Rust enum, but we want to remain sound, yet avoid writing manually all the boilerplate code, with all that it entails (e.g. mistakes over time, since C APIs may change at any time within the kernel).
It is true that the issue mentions both
constified_enum
andrustified_enum
, but I think that is because @tgross35 wanted to explain it was kind of a mix between the two, i.e. we want the raw type alias in signatures etc., similar to whatconstified_enum
does, but we want the Rust enums as well, likerustified_enum
, and we want the conversion methods from one to the other, since they can all be autogenerated.So the fundamental idea is about generating all that boilerplate in
bindgen
, including conversion functions, to be able to get Rust enums without UB. Now, that can be done with a "new mode" altogether, or changing how the existing ones behave, or allowing for multiple representations, etc. It is not clear what the best approach is, but perhaps we should centralize the discussion in a single place, i.e. on how to provide a way to solve that core issue.(Please @tgross35 et al correct me if I misremember!)
tgross35 commentedon Dec 10, 2024
I don't know what would be best here. I think the most important thing is that there should be some form that uses Rust enums but uses an integer for everything
extern
. I'm not sure what the feeling is on changing the existingrustified_enum
but it does seem like this would have been a better default, if we could go back in time, considering the current state is a pretty quiet yet severe footgun.From
andTryFrom
aren't strictly necessary but do make things easier to work with. There probably isn't that much need to have theconst
integers (e.g.const foo_one: foo = 1;
that I included in #2646), that was just for convenience. So I think that what is proposed in the top post here is reasonable, either as a fix torustified_enum
or as its own option.One note, it is probably a bit cleaner if the C type is just in the Rust enum's impl block.
y86-dev commentedon Dec 10, 2024
One important part is that we want to exhaustively match on the enum coming from the C side. Because we want to ensure a compilation error when a C developer adds/removes/renames a variant without checking the rust side. Then we get notified "this doesn't build for me!" instead of silently introducing a bug that will haunt us in the future.
pvdrz commentedon Dec 10, 2024
I think inherent associated types are unstable, so
StateCType
would be defined globally. The constants themselves would be fine I think (cc @jbaublitz).I'm curious about the reason to have these raw
C_.*
constants in the first place. Once you have the rust enum you can dovariant as CType
to obtain them. Is there any context where that would be too cumbersome?tgross35 commentedon Dec 10, 2024
You're right :( I thought we had those for a while.
I was originally just thinking for convenience when calling the C methods with constant values, but indeed this doesn't gain much over casting the Rust type.