Re: [PATCH v7 08/16] rust: add devres abstraction

From: Danilo Krummrich
Date: Wed Jan 15 2025 - 01:43:20 EST


On Tue, Jan 14, 2025 at 05:33:45PM +0100, Danilo Krummrich wrote:
> > > +impl<T> Drop for Devres<T> {
> > > + fn drop(&mut self) {
> > > + // Revoke the data, such that it gets dropped already and the actual resource is freed.
> > > + //
> > > + // `DevresInner` has to stay alive until the devres callback has been called. This is
> > > + // necessary since we don't know when `Devres` is dropped and calling
> > > + // `devm_remove_action()` instead could race with `devres_release_all()`.
> >
> > IIUC, the outcome of that race is the `WARN` if
> > devres_release_all takes the spinlock first and has already remvoed the
> > action?
> >
> > Could you do a custom devres_release here that mimick
> > `devm_remove_action` but omit the `WARN`? This way it allows the memory
> > behind DevresInner to be freed early without keeping it allocated until
> > the end of device lifetime.
>
> Better late than never, I now remember what's the *actual* race I was preventing
> here:
>
> | Task 0 | Task 1
> --|----------------------------------------------------------------------------
> 1 | Devres::drop() { | devres_release_all() {
> 2 | DevresInner::remove_action() { | spin_lock_irqsave();
> 3 | | remove_nodes(&todo);
> 4 | | spin_unlock_irqrestore();
> 5 | devm_remove_action_nowarn(); |
> 6 | let _ = Arc::from_raw(); |
> 7 | } |
> 8 | } |
> 9 | | release_nodes(&todo);
> 10| | }
>
> So, if devres_release_all() wins the race it stores the devres action within the
> temporary todo list. Which means that in 9 we enter
> DevresInner::devres_callback() even though DevresInner has been freed already.
>
> Unfortunately, this race can happen with [1], but not with this original version
> of Devres.

Well, actually this can't happen, since obviously we know whether Devres:drop
removed it or whether it will be released by devres_release_all() and we only
free DevresInner conditionally.

So, the code in [1] is fine; I somehow managed to confuse myself.

Sorry for the noise.

- Danilo

>
> I see two ways to fix it:
>
> 1) Just revert [1] and stick with the original version.
>
> 2) Use devm_release_action() instead of devm_remove_action() and don't drop the
> reference in DevresInner::remove_action() (6). This way the reference is
> always dropped from the callback.
>
> With 2) we still have an unnecessary call to revoke() if Devres is dropped
> before the device is detached from the driver, but that's still better than
> keeping DevresInner alive until the device is detached from the driver, which
> the original version did. Hence, I'll go ahead and prepare a patch for this,
> unless there are any concerns.
>
> - Danilo
>
> [1] https://lore.kernel.org/rust-for-linux/20250107122609.8135-2-dakr@xxxxxxxxxx/