Re: [RFC PATCH 02/11] rust: add driver abstraction
From: Dave Airlie
Date: Tue May 21 2024 - 01:43:26 EST
On Tue, 21 May 2024 at 04:14, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, May 20, 2024 at 07:25:39PM +0200, Danilo Krummrich wrote:
> > From: Wedson Almeida Filho <wedsonaf@xxxxxxxxx>
> >
> > This defines general functionality related to registering drivers with
> > their respective subsystems, and registering modules that implement
> > drivers.
> >
> > Co-developed-by: Asahi Lina <lina@xxxxxxxxxxxxx>
> > Signed-off-by: Asahi Lina <lina@xxxxxxxxxxxxx>
> > Co-developed-by: Andreas Hindborg <a.hindborg@xxxxxxxxxxx>
> > Signed-off-by: Andreas Hindborg <a.hindborg@xxxxxxxxxxx>
> > Signed-off-by: Wedson Almeida Filho <wedsonaf@xxxxxxxxx>
> > Signed-off-by: Danilo Krummrich <dakr@xxxxxxxxxx>
> > ---
> > rust/kernel/driver.rs | 492 +++++++++++++++++++++++++++++++++++
> > rust/kernel/lib.rs | 4 +-
> > rust/macros/module.rs | 2 +-
> > samples/rust/rust_minimal.rs | 2 +-
> > samples/rust/rust_print.rs | 2 +-
> > 5 files changed, 498 insertions(+), 4 deletions(-)
> > create mode 100644 rust/kernel/driver.rs
> >
> > diff --git a/rust/kernel/driver.rs b/rust/kernel/driver.rs
> > new file mode 100644
> > index 000000000000..e0cfc36d47ff
> > --- /dev/null
> > +++ b/rust/kernel/driver.rs
> > @@ -0,0 +1,492 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +//! Generic support for drivers of different buses (e.g., PCI, Platform, Amba, etc.).
> > +//!
> > +//! Each bus/subsystem is expected to implement [`DriverOps`], which allows drivers to register
> > +//! using the [`Registration`] class.
>
> Why are you creating new "names" here? "DriverOps" is part of a 'struct
> device_driver' why are you separating it out here? And what is
> 'Registration'? That's a bus/class thing, not a driver thing.
>
> And be very careful of the use of the word 'class' here, remember, there
> is 'struct class' as part of the driver model :)
>
> > +use crate::{
> > + alloc::{box_ext::BoxExt, flags::*},
> > + error::code::*,
> > + error::Result,
> > + str::CStr,
> > + sync::Arc,
> > + ThisModule,
> > +};
> > +use alloc::boxed::Box;
> > +use core::{cell::UnsafeCell, marker::PhantomData, ops::Deref, pin::Pin};
> > +
> > +/// A subsystem (e.g., PCI, Platform, Amba, etc.) that allows drivers to be written for it.
> > +pub trait DriverOps {
>
> Again, why is this not called DeviceDriver?
This is not the same as the C device_driver, it might mildly align in
concept with it, but I'm not sure it shares enough to align it name
wise with the C one.
> > + /// The type that holds information about the registration. This is typically a struct defined
> > + /// by the C portion of the kernel.
> > + type RegType: Default;
> > +
> > + /// Registers a driver.
> > + ///
> > + /// # Safety
> > + ///
> > + /// `reg` must point to valid, initialised, and writable memory. It may be modified by this
> > + /// function to hold registration state.
> > + ///
> > + /// On success, `reg` must remain pinned and valid until the matching call to
> > + /// [`DriverOps::unregister`].
> > + unsafe fn register(
> > + reg: *mut Self::RegType,
> > + name: &'static CStr,
> > + module: &'static ThisModule,
> > + ) -> Result;
> > +
> > + /// Unregisters a driver previously registered with [`DriverOps::register`].
> > + ///
> > + /// # Safety
> > + ///
> > + /// `reg` must point to valid writable memory, initialised by a previous successful call to
> > + /// [`DriverOps::register`].
> > + unsafe fn unregister(reg: *mut Self::RegType);
> > +}
> > +
> > +/// The registration of a driver.
> > +pub struct Registration<T: DriverOps> {
> > + is_registered: bool,
>
> Why does a driver need to know if it is registered or not? Only the
> driver core cares about that, please do not expose that, it's racy and
> should not be relied on.
>From the C side this does look unusual because on the C side something
like struct pci_driver is statically allocated everywhere.
In this rust abstraction, these are allocated dynamically, so instead
of just it always being safe to just call register/unregister
with static memory, a flag is kept around to say if the unregister
should happen at all, as the memory may have
been allocated but never registered. This is all the Registration is
for, it's tracking the bus _driver structure allocation, and
whether the bus register/unregister have been called since the object
was allocated.
I'm not sure it makes sense (or if you can at all), have a static like
pci_driver object here to match how C does things.
> > +impl<T: DriverOps> Drop for Registration<T> {
> > + fn drop(&mut self) {
> > + if self.is_registered {
> > + // SAFETY: This path only runs if a previous call to `T::register` completed
> > + // successfully.
> > + unsafe { T::unregister(self.concrete_reg.get()) };
>
> Can't the rust code ensure that this isn't run if register didn't
> succeed? Having a boolean feels really wrong here (can't that race?)
There might be a way of using Option<> here but I don't think it adds
anything over and above using an explicit bool.
> > +///
> > +/// This is meant to be implemented by buses/subsystems so that they can use [`IdTable`] to
> > +/// guarantee (at compile-time) zero-termination of device id tables provided by drivers.
> > +///
> > +/// Originally, RawDeviceId was implemented as a const trait. However, this unstable feature is
> > +/// broken/gone in 1.73. To work around this, turn IdArray::new() into a macro such that it can use
> > +/// concrete types (which can still have const associated functions) instead of a trait.
> > +///
> > +/// # Safety
> > +///
> > +/// Implementers must ensure that:
> > +/// - [`RawDeviceId::ZERO`] is actually a zeroed-out version of the raw device id.
> > +/// - [`RawDeviceId::to_rawid`] stores `offset` in the context/data field of the raw device id so
> > +/// that buses can recover the pointer to the data.
> > +pub unsafe trait RawDeviceId {
> > + /// The raw type that holds the device id.
> > + ///
> > + /// Id tables created from [`Self`] are going to hold this type in its zero-terminated array.
> > + type RawType: Copy;
> > +
> > + /// A zeroed-out representation of the raw device id.
> > + ///
> > + /// Id tables created from [`Self`] use [`Self::ZERO`] as the sentinel to indicate the end of
> > + /// the table.
> > + const ZERO: Self::RawType;
>
> All busses have their own way of creating "ids" and that is limited to
> the bus code itself, why is any of this in the rust side? What needs
> this? A bus will create the id for the devices it manages, and can use
> it as part of the name it gives the device (but not required), so all of
> this belongs to the bus, NOT to a driver, or a device.
Consider this a base class (Trait) for bus specific IDs.
> > +}
> > +
> > +// Creates a new ID array. This is a macro so it can take as a parameter the concrete ID type in
> > +// order to call to_rawid() on it, and still remain const. This is necessary until a new
> > +// const_trait_impl implementation lands, since the existing implementation was removed in Rust
> > +// 1.73.
>
> Again, what are these id lists for? Busses work on individual ids that
> they create dynamically.
>
> You aren't thinking this is an id that could be used to match devices to
> drivers, are you? That's VERY bus specific, and also specified already
> in .c code for those busses and passed to userspace. That doesn't
> belong here.
>
> If this isn't that list, what exactly is this?
Rust traits have to be specified somewhere, a RawDeviceId is the basis
for other DeviceId Traits to be implemented on top off, those then
talk to the C side ones.
All the pci and nova code is posted as well for you to work this out,
like you asked for. The use cases are all there to be reviewed, and
should help answer these sorts of questions.
>
> > +#[macro_export]
> > +#[doc(hidden)]
> > +macro_rules! _new_id_array {
> > + (($($args:tt)*), $id_type:ty) => {{
> > + /// Creates a new instance of the array.
> > + ///
> > + /// The contents are derived from the given identifiers and context information.
> > + const fn new< U, const N: usize>(ids: [$id_type; N], infos: [Option<U>; N])
> > + -> $crate::driver::IdArray<$id_type, U, N>
> > + where
> > + $id_type: $crate::driver::RawDeviceId + Copy,
> > + <$id_type as $crate::driver::RawDeviceId>::RawType: Copy + Clone,
> > + {
> > + let mut raw_ids =
> > + [<$id_type as $crate::driver::RawDeviceId>::ZERO; N];
> > + let mut i = 0usize;
> > + while i < N {
> > + let offset: isize = $crate::driver::IdArray::<$id_type, U, N>::get_offset(i);
> > + raw_ids[i] = ids[i].to_rawid(offset);
> > + i += 1;
> > + }
> > +
> > + // SAFETY: We are passing valid arguments computed with the correct offsets.
> > + unsafe {
> > + $crate::driver::IdArray::<$id_type, U, N>::new(raw_ids, infos)
> > + }
> > + }
> > +
> > + new($($args)*)
> > + }}
> > +}
> > +
> > +/// A device id table.
> > +///
> > +/// The table is guaranteed to be zero-terminated and to be followed by an array of context data of
> > +/// type `Option<U>`.
> > +pub struct IdTable<'a, T: RawDeviceId, U> {
> > + first: &'a T::RawType,
> > + _p: PhantomData<&'a U>,
> > +}
>
> All busses have different ways of matching drivers to devices, and they
> might be called a "device id table" and it might not. The driver core
> doesn't care, and neither should this rust code. That's all
> bus-specific and unique to each and every one of them. None of this
> should be needed here at all.
This is just a base Trait, for the bus specifics to implement.
Consider it a bunch of C macros or a bunch of C++ vfuncs, whatever
mental model helps more.
> Shouldn't this go in some common header somewhere? Why is this here?
Again to reiterate what Danilo said, rust has no header files. Things
that in C we would put in header files (macros, struct definitions,
inlines) go into rust code and other rust code references it.
Now there might be some sense in splitting things further into
separate rust files if this concept is going to confuse people, and
what is "core" device code, and what is "template" device code. But
I'm not sure how much sense that makes really.
Maybe the device table stuff should be in a separate file?
> > +/// Custom code within device removal.
>
> You better define the heck out of "device removal" as specified last
> time this all came up. From what I can see here, this is totally wrong
> and confusing and will be a mess.
>
> Do it right, name it properly.
>
> I'm not reviewingn beyond here, sorry. It's the merge window and I
> shouldn't have even looked at this until next week anyway.
>
> But I was hoping that the whole long rant I gave last time would be
> addressed at least a little bit. I don't see that it has :(
I won't comment too much on the specifics here, but you've twice got
to this stage, said something is wrong, change it, and given no
actionable feedback on what is wrong, or what to change.
I've looked at this code for a few hours today with the device docs
and core code and can't spot what you are seeing that is wrong here,
which means I don't expect anyone else is going to unless you can help
educate us more.
Dave.