Re: [RFC PATCH 02/11] rust: add driver abstraction

From: Danilo Krummrich
Date: Tue Jun 04 2024 - 11:41:36 EST


On Tue, Jun 04, 2024 at 04:27:31PM +0200, Greg KH wrote:
> On Wed, May 22, 2024 at 12:21:53AM +0200, Danilo Krummrich wrote:
> > On Tue, May 21, 2024 at 11:35:43AM +0200, Greg KH wrote:
> > > On Tue, May 21, 2024 at 12:30:37AM +0200, Danilo Krummrich wrote:
> > > > On Mon, May 20, 2024 at 08:14:18PM +0200, Greg KH wrote:
> > > > > On Mon, May 20, 2024 at 07:25:39PM +0200, Danilo Krummrich wrote:
> > > > > > From: Wedson Almeida Filho <wedsonaf@xxxxxxxxx>
> > > > > >
> > > > > > This defines general functionality related to registering drivers with
> > > > > > their respective subsystems, and registering modules that implement
> > > > > > drivers.
> > > > > >
> > > > > > Co-developed-by: Asahi Lina <lina@xxxxxxxxxxxxx>
> > > > > > Signed-off-by: Asahi Lina <lina@xxxxxxxxxxxxx>
> > > > > > Co-developed-by: Andreas Hindborg <a.hindborg@xxxxxxxxxxx>
> > > > > > Signed-off-by: Andreas Hindborg <a.hindborg@xxxxxxxxxxx>
> > > > > > Signed-off-by: Wedson Almeida Filho <wedsonaf@xxxxxxxxx>
> > > > > > Signed-off-by: Danilo Krummrich <dakr@xxxxxxxxxx>
> > > > > > ---
> > > > > > rust/kernel/driver.rs | 492 +++++++++++++++++++++++++++++++++++
> > > > > > rust/kernel/lib.rs | 4 +-
> > > > > > rust/macros/module.rs | 2 +-
> > > > > > samples/rust/rust_minimal.rs | 2 +-
> > > > > > samples/rust/rust_print.rs | 2 +-
> > > > > > 5 files changed, 498 insertions(+), 4 deletions(-)
> > > > > > create mode 100644 rust/kernel/driver.rs
> > > > > >
> > > > > > diff --git a/rust/kernel/driver.rs b/rust/kernel/driver.rs
> > > > > > new file mode 100644
> > > > > > index 000000000000..e0cfc36d47ff
> > > > > > --- /dev/null
> > > > > > +++ b/rust/kernel/driver.rs
> > > > > > @@ -0,0 +1,492 @@
> > > > > > +// SPDX-License-Identifier: GPL-2.0
> > > > > > +
> > > > > > +//! Generic support for drivers of different buses (e.g., PCI, Platform, Amba, etc.).
> > > > > > +//!
> > > > > > +//! Each bus/subsystem is expected to implement [`DriverOps`], which allows drivers to register
> > > > > > +//! using the [`Registration`] class.
> > > > >
> > > > > Why are you creating new "names" here? "DriverOps" is part of a 'struct
> > > > > device_driver' why are you separating it out here? And what is
> > > >
> > > > DriverOps is a trait which abstracts a subsystems register() and unregister()
> > > > functions to (un)register drivers. It exists such that a generic Registration
> > > > implementation calls the correct one for the subsystem.
> > > >
> > > > For instance, PCI would implement DriverOps::register() by calling into
> > > > bindings::__pci_register_driver().
> > > >
> > > > We can discuss whether DriverOps is a good name for the trait, but it's not a
> > > > (different) name for something that already exists and already has a name.
> > >
> > > It's a name we don't have in the C code as the design of the driver core
> > > does not need or provide it. It's just the section of 'struct
> > > device_driver' that provides function callbacks, why does it need to be
> > > separate at all?
> >
> > I'm confused by the relationship to `struct device_driver` you seem to imply.
> > How is it related?
> >
> > Again, this is just a trait for subsystems to provide their corresponding
> > register and unregister implementation, e.g. pci_register_driver() and
> > pci_unregister_driver(), such that they can be called from the generic
> > Registration code below.
> >
> > See [1] for an example implementation in PCI.
>
> registering and unregistering drivers belongs in the bus code, NOT in
> the driver code.

Why? We're not (re-)implementing a bus here. Again, those are just abstractions
to call the C functions to register a driver. The corresponding C functions are
e.g. driver_register() or __pci_register_driver(). Those are defined in
drivers/base/driver.c and drivers/pci/pci-driver.c respectively.

Why wouldn't we follow the same scheme in Rust abstractions?

>
> I think lots of the objections I had here will be fixed up when you move
> the bus logic out to it's own file, it does not belong here in a driver
> file (device ids, etc.)
>
> > Please also consider that some structures might be a 1:1 representation of C
> > structures, some C structures are not required at the Rust side at all, and
> > then there might be additional structures and traits that abstract things C has
> > no data structure for.
>
> That's fine, but let's keep the separate of what we have today at the
> very least and not try to lump it all into one file, that makes it
> harder to review and maintain over time.
>
> > > > > 'Registration'? That's a bus/class thing, not a driver thing.
> > > >
> > > > A Registration is an object representation of the driver's registered state.
> > >
> > > And that representation should not ever need to be tracked by the
> > > driver, that's up to the driver core to track that.
> >
> > The driver doesn't need it, the Registration abstraction does need it. Please
> > see my comments below.
>
> Great, put it elsewhere please, it does not belong in driver.rs.

This `Registration` structure is a generic abstraction to call some
$SUBSYSTEM_driver_register() function (e.g. pci_register_driver()). Why would it
belong somewhere else? Again, the corresponding C functions are in some driver.c
file as well.

>
> > > > Having an object representation for that is basically the Rust way to manage the
> > > > lifetime of this state.
> > >
> > > This all should be a static chunk of read-only memory that should never
> > > have a lifetime, why does it need to be managed at all?
> >
> > What I meant here is that if a driver was registered, we need to make sure it's
> > going to be unregistered eventually, e.g. when the module is removed or when
> > something fails after registration and we need to unwind.
> >
> > When the Registration structure goes out of scope, which would happen in both
> > the cases above, it will automatically unregister the driver, due to the
> > automatic call to `drop()`.
>
> That's fine, but again, this all should just be static code, not
> dynamic.

I agree. As already mentioned in another thread, this will be static in v2.

I got the code for that in place already. :)

>
> > > > Once the Registration is dropped, the driver is
> > > > unregistered. For instance, a PCI driver Registration can be bound to a module,
> > > > such that the PCI driver is unregistered when the module is unloaded (the
> > > > Registration is dropped in the module_exit() callback).
> > >
> > > Ok, that's fine, but note that your module_exit() function will never be
> > > called if your module_init() callback fails, so why do you need to track
> > > this state? Again, C drivers never need to track this state, why is
> > > rust requiring more logic here for something that is NOT a dynamic chunk
> > > of memory (or should not be a dynamic chunk of memory, let's get it
> > > correct from the start and not require us to change things later on to
> > > make it more secure).
> >
> > That's fine, if module_init() would fail, the Registration would be dropped as
> > well.
> >
> > As for why doesn't C need this is a good example of what I wrote above. Because
> > it is useful for Rust, but not for C.
> >
> > In Rust we get drop() automatically called when a structure is destroyed. This
> > means that if we let drivers put the Registration structure (e.g. representing
> > that a PCI driver was registered) into its `Module` representation structure
> > (already upstream) then this Registration is automatically destroyed once the
> > module representation is destroyed (which happens on module_exit()). This leads
> > to `drop()` of the `Registration` structure being called, which unregisteres the
> > (e.g. PCI) driver.
> >
> > This way the driver does not need to take care of unregistering the PCI driver
> > explicitly. The driver can also place multiple registrations into the `Module`
> > structure. All of them would be unregistered automatically in module_exit().
>
> Ok, I think we are agreeing here, except that you do not need a "am I
> registered" flag, as the existance of the "object" defines if it is
> registered or not (i.e. if it exists and the "destructor" is called,
> it's been registered, otherwise it hasn't been and the check is
> pointless.)

The static implementation does not need this anymore, since there is no separate
register() function anymore that we need to protect.

>
> > > Again, 'class' means something different here in the driver model, so be
> > > careful with terms, language matters, especially when many of our
> > > developers do not have English as their native language.
> > >
> > > > > > +/// The registration of a driver.
> > > > > > +pub struct Registration<T: DriverOps> {
> > > > > > + is_registered: bool,
> > > > >
> > > > > Why does a driver need to know if it is registered or not? Only the
> > > > > driver core cares about that, please do not expose that, it's racy and
> > > > > should not be relied on.
> > > >
> > > > We need it to ensure we do not try to register the same thing twice
> > >
> > > Your logic in your code is wrong if you attempt to register it twice,
> > > AND the driver core will return an error if you do so, so a driver
> > > should not need to care about this.
> >
> > We want it to be safe, if the driver logic is wrong and registers it twice, we
> > don't want it to blow up.
>
> How could that happen?
>
> > The driver core takes care, but I think there are subsystems that do
> > initializations that could make things blow up when registering the driver
> > twice.
>
> Nope, should not be needed, see above. Rust should make this _easier_
> not harder, than C code here :)

Agree, and as mentioned above, with v2 Rust drivers can't register the same
thing twice anymore.

>
> > > > , some subsystems might just catch fire otherwise.
> > >
> > > Which ones?
> >
> > Let's take the one we provide abstractons for, PCI.
> >
> > In __pci_register_driver() we call spin_lock_init() and INIT_LIST_HEAD() before
> > driver_register() could bail out [1].
> >
> > What if this driver is already registered and in use and we're randomly altering
> > the list pointers or call spin_lock_init() on a spin lock that's currently being
> > held?
>
> I don't understand, why would you ever call "register driver" BEFORE the
> driver was properly set up to actually be registered?
>
> PCI works properly here, you don't register unless everything is set up.
> Which is why it doesn't have a "hey, am I registered or not?" type flag,
> it's not needed.

That's not what I meant, but I think we can drop this specific part of the
discussion anyways, since with v2 we can't hit this anymore. :)

>
> > >
> > > > > > + }
> > > > > > + }
> > > > > > +}
> > > > > > +
> > > > > > +/// Conversion from a device id to a raw device id.
> > > > > > +///
> > > > > > +/// This is meant to be implemented by buses/subsystems so that they can use [`IdTable`] to
> > > > > > +/// guarantee (at compile-time) zero-termination of device id tables provided by drivers.
> > > > > > +///
> > > > > > +/// Originally, RawDeviceId was implemented as a const trait. However, this unstable feature is
> > > > > > +/// broken/gone in 1.73. To work around this, turn IdArray::new() into a macro such that it can use
> > > > > > +/// concrete types (which can still have const associated functions) instead of a trait.
> > > > > > +///
> > > > > > +/// # Safety
> > > > > > +///
> > > > > > +/// Implementers must ensure that:
> > > > > > +/// - [`RawDeviceId::ZERO`] is actually a zeroed-out version of the raw device id.
> > > > > > +/// - [`RawDeviceId::to_rawid`] stores `offset` in the context/data field of the raw device id so
> > > > > > +/// that buses can recover the pointer to the data.
> > > > > > +pub unsafe trait RawDeviceId {
> > > > > > + /// The raw type that holds the device id.
> > > > > > + ///
> > > > > > + /// Id tables created from [`Self`] are going to hold this type in its zero-terminated array.
> > > > > > + type RawType: Copy;
> > > > > > +
> > > > > > + /// A zeroed-out representation of the raw device id.
> > > > > > + ///
> > > > > > + /// Id tables created from [`Self`] use [`Self::ZERO`] as the sentinel to indicate the end of
> > > > > > + /// the table.
> > > > > > + const ZERO: Self::RawType;
> > > > >
> > > > > All busses have their own way of creating "ids" and that is limited to
> > > > > the bus code itself, why is any of this in the rust side? What needs
> > > > > this? A bus will create the id for the devices it manages, and can use
> > > > > it as part of the name it gives the device (but not required), so all of
> > > > > this belongs to the bus, NOT to a driver, or a device.
> > > >
> > > > This is just a generalized interface which can be used by subsystems to
> > > > implement the subsystem / bus specific ID type.
> > >
> > > Please move this all to a different file as it has nothing to do with
> > > the driver core bindings with the include/device/driver.h api.
> >
> > I don't think driver.rs in an unreasonable place for a generic device ID
> > representation that is required by every driver structure.
>
> It has nothing to do with drivers on their own, it's a bus attribute,
> please put that in bus.rs at the least. Or it's own file if you need
> lots of code for simple arrays like this :)

It's not a lot of code actually, probably less than 100 lines, there is a lot of
documentation / examples though. :)

The corresponding C structure definitions are in
include/linux/mod_devicetable.h. Maybe we can move it to a separte device_id.rs
if you prefer that?

>
> > But I'm also not overly resistant to move it out. What do you think would be a
> > good name?
> >
> > Please consider that this name will also be the namespace for this trait.
> > Currently you can reference it with `kernel::driver::RawDeviceId`. If you move
> > it to foo.rs, it'd be `kernel::foo::RawDeviceId`.
>
> I don't see why this isn't just unique to each bus type, it is not a
> driver attribute, but a bus-specific-driver type. Please put it there
> and don't attempt to make it "generic".

If we don't make it generic we end up with a lot of duplicate code in every
subsystem that has some kind of device ID, e.g. OF, PCI, etc. Why would we want
to do that? I think moving the generic abstraction into a separate device_id.rs
seems to be the better option.

>
> thanks,
>
> greg k-h
>