Re: [PATCH v2 1/2] rust: add abstraction for struct device

From: Danilo Krummrich
Date: Wed Jun 12 2024 - 16:57:03 EST


On Wed, Jun 12, 2024 at 07:13:31PM +0200, Greg KH wrote:
> On Wed, Jun 12, 2024 at 06:18:38PM +0200, Danilo Krummrich wrote:
> > On Wed, Jun 12, 2024 at 05:50:42PM +0200, Greg KH wrote:
> > > On Wed, Jun 12, 2024 at 05:35:21PM +0200, Danilo Krummrich wrote:
> > > > On Wed, Jun 12, 2024 at 05:02:52PM +0200, Greg KH wrote:
> > > > > On Wed, Jun 12, 2024 at 04:51:42PM +0200, Danilo Krummrich wrote:
> > > > > > On 6/11/24 18:13, Boqun Feng wrote:
> > > > > > > On Tue, Jun 11, 2024 at 03:29:22PM +0200, Greg KH wrote:
> > > > > > > > On Tue, Jun 11, 2024 at 03:21:31PM +0200, Danilo Krummrich wrote:
> > > > > > > > > ...hence, I agree we should indeed add to the #Invariants and #Safety section
> > > > > > > > > that `->release` must be callable from any thread.
> > > > > > > > >
> > > > > > > > > However, this is just theory, do we actually have cases where `device::release`
> > > > > > >
> > > > > > > @Danilo, right, it's only theorical, but it's good to call it out since
> > > > > > > it's the requirement for a safe Rust abstraction.
> > > > > >
> > > > > > Similar to my previous reply, if we want to call this out as safety requirement
> > > > > > in `Device::from_raw`, we probably want to add it to the documentation of the C
> > > > > > `struct device`, such that we can argue that this is an invariant of C's
> > > > > > `struct device`.
> > > > > >
> > > > > > Otherwise we'd have to write something like:
> > > > > >
> > > > > > "It must also be ensured that the `->release` function of a `struct device` can
> > > > > > be called from any non-atomic context. While not being officially documented this
> > > > > > is guaranteed by the invariant of `struct device`."
> > > > >
> > > > > In the 20+ years of the driver model being part of the kernel, I don't
> > > > > think this has come up yet, so maybe you can call the release function
> > > > > in irq context. I don't know, I was just guessing :)
> > > >
> > > > Ah, I see. I thought you know and it's defined, but just not documented.
> > > >
> > > > This means it's simply undefined what we expect to happen when the last
> > > > reference of a device is dropped from atomic context.
> > > >
> > > > Now, I understand (and would even expect) that practically this has never been
> > > > an issue. You'd need two circumstances, release() actually does something that
> > > > is not allowed in atomic context plus the last device reference is dropped from
> > > > atomic context - rather unlikely.
> > > >
> > > > >
> > > > > So let's not go adding constraints that we just do not have please.
> > > > > Same goes for the C code, so the rust code is no different here.
> > > >
> > > > I agree we shouldn't add random constraints, but for writing safe code we also
> > > > have to rely on defined behavior.
> > >
> > > As the rust code is relying on C code that could change at any point in
> > > time, how can that ever be "safe"? :)
> >
> > That's the same as with any other API. If the logic of an API is changed the
> > users (e.g a Rust abstraction) of the API have to be adjusted.
>
> Agreed, just like any other in-kernel code, so there shouldn't be
> anything special here.
>
> > > Sorry, this type of definition annoys me.
> > >
> > > > I see two options:
> > > >
> > > > (1) We globally (for struct device) define from which context release() is
> > > > allowed to be called.
> > >
> > > If you want, feel free to do that work please. And then find out how to
> > > enforce it in the driver core.
> >
> > If we *would* define non-atomic context only, we could enforce it with
> > might_sleep() for instance.
>
> might_sleep() isn't always correct from what I remember.
>
> > If we *would* define any context, there is nothing to enforce, but we'd need to
> > validate that no implementer of release() voids that.
>
> Trying to validate that might be hard, again, I don't think it's worth
> it.
>
> > The former is a constaint you don't want to add, the latter a lot of work. What
> > if we at least define that implementers of release() must *minimally* make sure
> > that it can be call from any non-atomic context.
> >
> > That'd be something we can rely on in Rust.
>
> Determining if you are, or are not, in atomic context is almost
> impossible in C, I don't know how you are going to do that at build time
> in Rust. Good luck!

We can't always enforce things, but we can still follow rules even if we can't
enforce them with tools or language features.

In particular, nothing prevents C code to define from which context things are
allowed to be called. When I was working on drm_gpuvm [1] this was one of the
major aspects I had to get right and documented; keyword dma_fence signalling
critical sections.

[1] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_gpuvm.c

>
> > Oh, I fully agree with that. Let me try to explain a bit what this is about:
> >
> > In Rust we have the `Send` and `Sync` marker traits. If a type (e.g. `Device`)
> > implements `Send` it means that it's safe to pass an instance of this type
> > between threads. Which is clearly something we want to do with a `Device`.
> >
> > If I don't implement `Sync` for `Device` the compiler will prevent me from
> > sending it between threads, e.g. by disallowing me to put an instance of
> > `Device` into another data structure that is potentially passed between threads.
> >
> > If I implement `Sync` I have to add a safety comment on why it is safe to pass
> > `Device` between threads. And here we have what Boqun pointed out: `Device` can
> > only be passed between threads when we're allowed to drop the last reference
> > from any thread. In the case of the kernel this can be any non-atomic context,
> > any context or any other subset. But I have to write something here that is
> > a defined rule and can be relied on.
>
> You really have two things here, a matrix of:
> can transfer between threads
> can call in irq context
> that are independent and not related to each other at all.
>
> Looks like Rust has built in support for the first. And nothing for the
> second as that is a very kernel-specific thing.

The language documentation defines `Send` as "can be transferred between
threads", likely because it's written from a userspace perspective. But in
the kernel context it actually means can be transferred between any context,
thread, IRQ, etc.

If this isn't true, then we have to add a comment what is allowed (e.g. any
non-atomic context) and what's not allowed.

>
> So let's not confuse the two please. `Send` and `Sync` should be fine
> for a device pointer to be passed around, as long as the reference is
> incremented, as that's what all of the kernel C code does today. Let's
> not worry about irq context at all, that's independent and can be
> handled at a later time, if at all, with a different "marking" as it's
> independent of the current two things.

That'd be great, but as mentioned above, we only have `Send`, but nothing like
`SendIrq`, hence `Send` really means any context.

Given your proposal, to just say it's fine to pass between (actual) threads and
ignore IRQ context for now, we have to implement `Send`, but document that IRQ
context is not covered.

We can either do that in the Rust abstraction as safety requirement, or we can,
as proposed previously, add a comment to the C `struct device` documentation
that implementers of release() must *at least* make sure that it can be called
from any non-atomic context. We can then refer to that.

- Danilo

>
> thanks,
>
> greg k-h
>