Re: [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals

From: Nuno Sá
Date: Tue Feb 27 2024 - 14:14:53 EST


Hi Herve,

On Tue, 2024-02-27 at 18:54 +0100, Herve Codina wrote:
> Hi Nuno,
>
> On Tue, 27 Feb 2024 17:55:07 +0100
> Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
>
> > On Tue, 2024-02-27 at 16:24 +0100, Herve Codina wrote:
> > > Hi Saravana, Luca, Nuno,
> > >
> > > On Tue, 20 Feb 2024 16:37:05 -0800
> > > Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > >
> > > ...
> > >  
> > > > >
> > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > > > > index a9a292d6d59b..5c5f808b163e 100644
> > > > > --- a/drivers/of/overlay.c
> > > > > +++ b/drivers/of/overlay.c
> > > > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id)
> > > > >                 goto out;
> > > > >         }
> > > > >
> > > > > +       /*
> > > > > +        * Wait for any ongoing device link removals before removing some
> > > > > of
> > > > > +        * nodes
> > > > > +        */
> > > > > +       device_link_wait_removal();
> > > > > +   
> > > >
> > > > Nuno in his patch[1] had this "wait" happen inside
> > > > __of_changeset_entry_destroy(). Which seems to be necessary to not hit
> > > > the issue that Luca reported[2] in this patch series. Is there any
> > > > problem with doing that?
> > > >
> > > > Luca for some reason did a unlock/lock(of_mutex) in his test patch and
> > > > I don't think that's necessary. 
> > >
> > > I think the unlock/lock in Luca's case and so in Nuno's case is needed.
> > >
> > > I do the device_link_wait_removal() wihout having the of_mutex locked.
> > >
> > > Now, suppose I do the device_link_wait_removal() call with the of_mutex locked.
> > > The following flow is allowed and a deadlock is present.
> > >
> > > of_overlay_remove()
> > >   lock(of_mutex)
> > >      device_link_wait_removal()
> > >
> > > And, from the workqueue jobs execution:
> > >   ...
> > >     device_put()
> > >       some_driver->remove()
> > >         of_overlay_remove() <--- The job will never end.
> > >                                  It is waiting for of_mutex.
> > >                                  Deadlock
> > >  
> >
> > We may need some input from Saravana (and others) on this. I might be missing
> > something but can a put_device() lead into a driver remove callback? Driver code
> > is
> > not device code and put_device() leads to device_release() which will either call
> > the
> > device ->release(), ->type->release() or the class ->dev_release() And, IMO,
> > calling
> > of_overlay_remove() or something like that (like something that would lead to
> > unbinding a device from it's driver) in a device release callback would be at the
> > very least very questionable. Typically, what you see in there is of_node_put()
> > and
> > things like kfree() of the device itself or any other data.
>
> I think that calling of_overlay_remove() in a device release callback makes
> sense. The overlay is used to declare sub-nodes from the device node. It
> does not add/remove the device node itself but sub-nodes.
>

I think we are speaking about two different things... device release is not the same
as the driver remove callback. I admit the pci case seems to be a beast of it's own
and I just spent some time (given your links) on it so I can't surely be sure about
what I'm about to say... But, AFAICT, I did not saw any overlay or changeset being
removed from a kobj_type release callback.

> The use case is the use of DT overlays to describe PCI devices.
> https://lore.kernel.org/all/1692120000-46900-1-git-send-email-lizhi.hou@xxxxxxx/
> https://lore.kernel.org/lkml/20220427094502.456111-1-clement.leger@xxxxxxxxxxx/
> --- 8< ---
> The lan966x SoCs can be used in two different ways:
>
>  - It can run Linux by itself, on ARM64 cores included in the SoC. This
>    use-case of the lan966x is currently being upstreamed, using a
>    traditional Device Tree representation of the lan996x HW blocks [1]
>    A number of drivers for the different IPs of the SoC have already
>    been merged in upstream Linux.
>
>  - It can be used as a PCIe endpoint, connected to a separate platform
>    that acts as the PCIe root complex. In this case, all the devices
>    that are embedded on this SoC are exposed through PCIe BARs and the
>    ARM64 cores of the SoC are not used. Since this is a PCIe card, it
>    can be plugged on any platform, of any architecture supporting PCIe.
> --- 8< ---
>
> This quite long story led to DT overlay support for PCI devices and so the
> unittest I mentioned:
>   https://elixir.bootlin.com/linux/v6.8-rc6/source/drivers/of/unittest.c#L3946
>
>
> So, I have a PCI driver that bind to the lan966x PCI board.
> This driver loads an overlay at probe() and unload it at remove().
> Also, this driver can be module. A simple rmmod leads to the remove() call.
>

Hmm, and I think that would not be an issue... Note that the code that runs in
device_link_release_fn() is doing put_device() which ends ups on the kobj_type
release callback and so far I could not see any evidence of such a callback being
responsible of calling device_remove() on another device. That would be weird (I
think) since I would expect such a call to happen in a kind of unregister function.

> This driver is not yet upstream because I haven't yet fixed all the issues I
> encountered that's why of now, I can point only the unittest related to overlay
> support for PCI.
>
> >
> > The driver remove callback should be called when unbinding the device from it's
> > drivers and devlinks should also be removed after device_unbind_cleanup() (i.e,
> > after
> > the driver remove callback).
> >
> > Having said the above, the driver core has lots of subtleties so, again, I can be
> > missing something. But at this point I'm still not seeing any deadlock..
> >
>
> I gave a wrong example.
> Based on Luca's sequence he gave in
>   https://lore.kernel.org/all/20231220181627.341e8789@booty/

Regarding Luca's comments, my first approach was actually to just make the devlink
removal synchronously... I'm still not sure what would be the issue of doing that
(other than potentially waiting some time for the srcu synchronization). About the
unlock, I'm just not sure what could happen if someone else (other than us) sneaks in
and grabs the of_mutex while we are in the middle of removing an overlay...

>
> We can have the following:
>
> --- 8< ---
> int of_overlay_remove(int *ovcs_id)
> {
>     ...
>
>     device_link_wait_removal(); // proposed by this patch series
>
>     mutex_lock(&of_mutex);
>
>     ...
>
>     ret = __of_changeset_revert_notify(&ovcs->cset);
>     // this ends up calling (excerpt from a long stack trace):
>     // -> of_i2c_notify
>     // -> device_remove
>     // -> devm_regulator_release
>     // -> device_link_remove
>     // -> devlink_dev_release, which queues work for
>     //      device_link_release_fn, which in turn calls:
>     //      -> device_put
>     //      -> device_release
>     //      -> {platform,regulator,...}_dev*_release
>     //      -> of_node_put() [**]
>
>     ...
>
>     free_overlay_changeset(ovcs);
>     // calls:
>     // -> of_changeset_destroy
>     // -> __of_changeset_entry_destroy
>     // -> pr_err("ERROR: memory leak, expected refcount 1 instead of %d...
>     // The error appears or not, based on when the workqueue runs
>
> err_unlock:
>     mutex_unlock(&of_mutex);
>
>     ...
> }
> --- 8< ---
>
> I think, on your side, you can have something similar.
> I was wrong (sorry for my mistake). the problem is not device_put() but
> device_remove().

But I'm not seeing how device_remove() can deadlock since I'm not sure we can go from
device_link_release_fn() to device_remove(). If there's such a path, then I'll agree
on the deadlock.

>
> In my deadlog example, s/device_put()/device_remove()/
>

Exactly... and that is why my first question was to wonder about put_device() being
able to call any overlay removal code. So, do you know if it's really possible for a
device release callback to end up calling device_remove()? Because, then I could see
the deadlock as device_remove() can end up unbinding the device from it's driver and
hence calling drv->remove().

- Nuno Sá