Re: C aggregate passing (Rust kernel policy)

From: Ralf Jung
Date: Thu Feb 27 2025 - 08:55:48 EST


Hi all,

...
Unions in C, C++ and Rust (not Rust "enum"/tagged union) are
generally sharp. In Rust, it requires unsafe Rust to read from
a union.

Definitely sharp. At least in Rust we have a very clear specification though,
since we do allow arbitrary type punning -- you "just" reinterpret whatever
bytes are stored in the union, at whatever type you are reading things. There is
also no "active variant" or anything like that, you can use any variant at any
time, as long as the bytes are "valid" for the variant you are using. (So for
instance if you are trying to read a value 0x03 at type `bool`, that is UB.)
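
A minimal sketch of that rule, for illustration only. The union `Bits` and the helpers `as_byte` and `as_flag` are hypothetical names, not anything from the kernel or the standard library. Reading the `u8` field is fine for any initialized byte. Reading the `bool` field is only defined when the stored byte is 0x00 or 0x01, so the sketch checks the byte first.

```rust
// Hypothetical union for illustration; both fields share the same single byte.
#[repr(C)]
union Bits {
    byte: u8,
    flag: bool,
}

/// Reads the stored byte as `u8`. Every initialized byte is a valid `u8`,
/// so this is fine no matter which field was written last.
fn as_byte(b: Bits) -> u8 {
    unsafe { b.byte }
}

/// Reads the stored byte as `bool`, but only after checking that the byte is
/// a valid `bool` (0x00 or 0x01). Reading `flag` with any other byte stored
/// would be undefined behavior, so that case returns `None` instead.
fn as_flag(b: Bits) -> Option<bool> {
    match unsafe { b.byte } {
        0 | 1 => Some(unsafe { b.flag }),
        _ => None,
    }
}

fn main() {
    // Written as `bool`, read back as `u8`: no "active variant" to respect.
    assert_eq!(as_byte(Bits { flag: true }), 1);
    // Written as `u8`, read back as `bool`: fine because the byte is valid.
    assert_eq!(as_flag(Bits { byte: 0 }), Some(false));
    // 0x03 is not a valid `bool`, so the sketch never performs that read.
    assert_eq!(as_flag(Bits { byte: 3 }), None);
}
```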

That is actually a big f***ing problem.
The language has to define the exact behaviour when 'bool' doesn't contain
0 or 1.

No, it really does not. If you want a variable that can hold all values in 0..256, use `u8`. The entire point of the `bool` type is to represent values that can only ever be `true` or `false`. So the language requires that when you do type-unsafe manipulation of raw bytes, and when you then make the choice of the `bool` type for that code (which you are not forced to!), then you must indeed uphold the guarantees of `bool`: the data must be `0x00` or `0x01`.
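
To sketch what that choice looks like in code (the function names `bool_from_byte` and `truthy` are made up for this example): keep the raw data as `u8`, and only produce a `bool` through a conversion that decides explicitly what to do with the other 254 values.

```rust
/// Strict conversion: only 0x00 and 0x01 are accepted, everything else is
/// reported to the caller instead of being turned into a `bool`.
fn bool_from_byte(byte: u8) -> Option<bool> {
    match byte {
        0x00 => Some(false),
        0x01 => Some(true),
        _ => None,
    }
}

/// C-style conversion: any non-zero byte counts as true. The comparison
/// always yields a valid `bool`, whatever the input byte was.
fn truthy(byte: u8) -> bool {
    byte != 0
}

fn main() {
    assert_eq!(bool_from_byte(0x01), Some(true));
    assert_eq!(bool_from_byte(0x03), None);
    assert!(truthy(0x03));
    assert!(!truthy(0x00));
}
```

Both functions are entirely safe code. The validity requirement only becomes the programmer's problem when raw bytes are reinterpreted as `bool` via `unsafe`.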

Much the same as the function call interface defines whether the caller
or the called code is responsible for masking the high bits of a register that
contains a 'char' type.

Now the answer could be that 'and' is (or may be) a bit-wise operation.
But that isn't UB, just an undefined/unexpected result.

I've actually no idea if/when current gcc 'sanitises' bool values.
A very old version used to generate really crap code (and I mean REALLY)
because it repeatedly sanitised the values.
But IMHO bool just shouldn't exist, it isn't a hardware type and is actually
expensive to get right.
If you use 'int' with zero meaning false there is pretty much no ambiguity.

We have many types in Rust that are not hardware types. Users can even define them themselves:

enum MyBool { MyFalse, MyTrue }

This is, in fact, one of the entire points of higher-level languages like Rust: to let users define types that represent concepts that are more abstract than what exists in hardware. Hardware would also tell us that `&i32` and `*const i32` are basically the same thing, and yet of course there's a world of difference between those types in Rust.
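
A small sketch of both points, under the assumption that the helper names (`negate`, `read_via_ref`, `read_via_ptr`) are purely illustrative. `MyBool` is handled without ever looking at its bytes, and `&i32` and `*const i32` have the same size while carrying very different guarantees.

```rust
use std::mem::size_of;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MyBool {
    MyFalse,
    MyTrue,
}

impl MyBool {
    /// Exhaustive match: the compiler knows these are the only two values.
    fn negate(self) -> MyBool {
        match self {
            MyBool::MyFalse => MyBool::MyTrue,
            MyBool::MyTrue => MyBool::MyFalse,
        }
    }
}

/// A reference is always non-null, aligned and points to a live `i32`,
/// so dereferencing it needs no `unsafe` and no checks.
fn read_via_ref(r: &i32) -> i32 {
    *r
}

/// A raw pointer promises none of that. The null check below is not enough
/// on its own, so the function is `unsafe`: the caller must guarantee that a
/// non-null `p` points to a live, aligned `i32`.
unsafe fn read_via_ptr(p: *const i32) -> Option<i32> {
    if p.is_null() {
        None
    } else {
        Some(unsafe { *p })
    }
}

fn main() {
    assert_eq!(MyBool::MyFalse.negate(), MyBool::MyTrue);

    // Same representation at the hardware level...
    assert_eq!(size_of::<&i32>(), size_of::<*const i32>());
    // ...but only the reference is known to be non-null, which lets
    // `Option<&i32>` use the null pattern for `None` at no extra cost.
    assert_eq!(size_of::<Option<&i32>>(), size_of::<&i32>());

    let x = 42;
    assert_eq!(read_via_ref(&x), 42);
    assert_eq!(unsafe { read_via_ptr(&x as *const i32) }, Some(42));
    assert_eq!(unsafe { read_via_ptr(std::ptr::null()) }, None);
}
```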

Kind regards,
Ralf