Re: [PATCH 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals

From: Nuno Sá
Date: Tue Feb 27 2024 - 14:28:55 EST


On Tue, 2024-02-27 at 20:13 +0100, Rafael J. Wysocki wrote:
> On Tue, Feb 27, 2024 at 8:08 PM Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
> >
> > Hi Herve,
> >
> > On Tue, 2024-02-27 at 18:54 +0100, Herve Codina wrote:
> > > Hi Nuno,
> > >
> > > On Tue, 27 Feb 2024 17:55:07 +0100
> > > Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
> > >
> > > > On Tue, 2024-02-27 at 16:24 +0100, Herve Codina wrote:
> > > > > Hi Saravana, Luca, Nuno,
> > > > >
> > > > > On Tue, 20 Feb 2024 16:37:05 -0800
> > > > > Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > > >
> > > > > ...
> > > > >
> > > > > > >
> > > > > > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > > > > > > index a9a292d6d59b..5c5f808b163e 100644
> > > > > > > --- a/drivers/of/overlay.c
> > > > > > > +++ b/drivers/of/overlay.c
> > > > > > > @@ -1202,6 +1202,12 @@ int of_overlay_remove(int *ovcs_id)
> > > > > > >                 goto out;
> > > > > > >         }
> > > > > > >
> > > > > > > +       /*
> > > > > > > +        * Wait for any ongoing device link removals before removing
> > > > > > > some
> > > > > > > of
> > > > > > > +        * nodes
> > > > > > > +        */
> > > > > > > +       device_link_wait_removal();
> > > > > > > +
> > > > > >
> > > > > > Nuno in his patch[1] had this "wait" happen inside
> > > > > > __of_changeset_entry_destroy(). Which seems to be necessary to not hit
> > > > > > the issue that Luca reported[2] in this patch series. Is there any
> > > > > > problem with doing that?
> > > > > >
> > > > > > Luca for some reason did a unlock/lock(of_mutex) in his test patch and
> > > > > > I don't think that's necessary.
> > > > >
> > > > > I think the unlock/lock in Luca's case and so in Nuno's case is needed.
> > > > >
> > > > > I do the device_link_wait_removal() wihout having the of_mutex locked.
> > > > >
> > > > > Now, suppose I do the device_link_wait_removal() call with the of_mutex
> > > > > locked.
> > > > > The following flow is allowed and a deadlock is present.
> > > > >
> > > > > of_overlay_remove()
> > > > >   lock(of_mutex)
> > > > >      device_link_wait_removal()
> > > > >
> > > > > And, from the workqueue jobs execution:
> > > > >   ...
> > > > >     device_put()
> > > > >       some_driver->remove()
> > > > >         of_overlay_remove() <--- The job will never end.
> > > > >                                  It is waiting for of_mutex.
> > > > >                                  Deadlock
> > > > >
> > > >
> > > > We may need some input from Saravana (and others) on this. I might be missing
> > > > something but can a put_device() lead into a driver remove callback? Driver
> > > > code
> > > > is
> > > > not device code and put_device() leads to device_release() which will either
> > > > call
> > > > the
> > > > device ->release(), ->type->release() or the class ->dev_release(). And, IMO,
> > > > calling
> > > > of_overlay_remove() or something like that (like something that would lead to
> > > > unbinding a device from it's driver) in a device release callback would be at
> > > > the
> > > > very least very questionable. Typically, what you see in there is
> > > > of_node_put()
> > > > and
> > > > things like kfree() of the device itself or any other data.
> > >
> > > I think that calling of_overlay_remove() in a device release callback makes
> > > sense. The overlay is used to declare sub-nodes from the device node. It
> > > does not add/remove the device node itself but sub-nodes.
> > >
> >
> > I think we are speaking about two different things... device release is not the
> > same
> > as the driver remove callback. I admit the pci case seems to be a beast of it's
> > own
> > and I just spent some time (given your links) on it so I can't surely be sure
> > about
> > what I'm about to say... But, AFAICT, I did not saw any overlay or changeset
> > being
> > removed from a kobj_type release callback.
> >
> > > The use case is the use of DT overlays to describe PCI devices.
> > > https://lore.kernel.org/all/1692120000-46900-1-git-send-email-lizhi.hou@xxxxxxx/
> > > https://lore.kernel.org/lkml/20220427094502.456111-1-clement.leger@xxxxxxxxxxx/
> > > --- 8< ---
> > > The lan966x SoCs can be used in two different ways:
> > >
> > >  - It can run Linux by itself, on ARM64 cores included in the SoC. This
> > >    use-case of the lan966x is currently being upstreamed, using a
> > >    traditional Device Tree representation of the lan996x HW blocks [1]
> > >    A number of drivers for the different IPs of the SoC have already
> > >    been merged in upstream Linux.
> > >
> > >  - It can be used as a PCIe endpoint, connected to a separate platform
> > >    that acts as the PCIe root complex. In this case, all the devices
> > >    that are embedded on this SoC are exposed through PCIe BARs and the
> > >    ARM64 cores of the SoC are not used. Since this is a PCIe card, it
> > >    can be plugged on any platform, of any architecture supporting PCIe.
> > > --- 8< ---
> > >
> > > This quite long story led to DT overlay support for PCI devices and so the
> > > unittest I mentioned:
> > >   https://elixir.bootlin.com/linux/v6.8-rc6/source/drivers/of/unittest.c#L3946
> > >
> > >
> > > So, I have a PCI driver that bind to the lan966x PCI board.
> > > This driver loads an overlay at probe() and unload it at remove().
> > > Also, this driver can be module. A simple rmmod leads to the remove() call.
> > >
> >
> > Hmm, and I think that would not be an issue... Note that the code that runs in
> > device_link_release_fn() is doing put_device() which ends ups on the kobj_type
> > release callback and so far I could not see any evidence of such a callback being
> > responsible of calling device_remove() on another device. That would be weird (I
> > think) since I would expect such a call to happen in a kind of unregister
> > function.
> >
> > > This driver is not yet upstream because I haven't yet fixed all the issues I
> > > encountered that's why of now, I can point only the unittest related to overlay
> > > support for PCI.
> > >
> > > >
> > > > The driver remove callback should be called when unbinding the device from
> > > > it's
> > > > drivers and devlinks should also be removed after device_unbind_cleanup()
> > > > (i.e,
> > > > after
> > > > the driver remove callback).
> > > >
> > > > Having said the above, the driver core has lots of subtleties so, again, I
> > > > can be
> > > > missing something. But at this point I'm still not seeing any deadlock...
> > > >
> > >
> > > I gave a wrong example.
> > > Based on Luca's sequence he gave in
> > >   https://lore.kernel.org/all/20231220181627.341e8789@booty/
> >
> > Regarding Luca's comments, my first approach was actually to just make the
> > devlink
> > removal synchronously... I'm still not sure what would be the issue of doing that
> > (other than potentially waiting some time for the srcu synchronization).
>
> It would allow forward progress to be made, but it would add potential
> delay for everybody, which is only really needed in the DT overlay
> case.  Sounds like something to avoid to me.

I mean, we could surely detect (or have a way to do it) if the devlink is being
removed as part of an overlay removal and only call device_link_release_fn()
synchronously in that case. It would minimize the effects I guess.

But yeah, we still need to make sure there's an actual deadlock... I'm still not
seeing it but Herve may very well be correct about it.

>
> > About the unlock, I'm just not sure what could happen if someone else (other than
> > us) sneaks in
> > and grabs the of_mutex while we are in the middle of removing an overlay...
>
> If that is possible at all, I'd call it a bug.

I think, in theory, it could happen as it looks to be a fairly coarse grained lock
for OF:

https://elixir.bootlin.com/linux/latest/source/drivers/of/of_private.h#L40

- Nuno Sá