Re: [PATCH v3 09/12] of: property: Simplify of_link_to_phandle()

From: Geert Uytterhoeven
Date: Mon Feb 13 2023 - 08:11:11 EST


Hi Saravana,

On Wed, Feb 8, 2023 at 9:35 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> On Tue, Feb 7, 2023 at 11:57 PM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > On Wed, Feb 8, 2023 at 8:32 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > On Tue, Feb 7, 2023 at 6:08 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > > On Tue, Feb 7, 2023 at 12:57 PM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> > > > > On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > > > > The driver core now:
> > > > > > - Has the parent device of a supplier pick up the consumers if the
> > > > > > supplier never has a device created for it.
> > > > > > - Ignores a supplier if the supplier has no parent device and will never
> > > > > > be probed by a driver
> > > > > >
> > > > > > And already prevents creating a device link with the consumer as a
> > > > > > supplier of a parent.
> > > > > >
> > > > > > So, we no longer need to find the "compatible" node of the supplier or
> > > > > > do any other checks in of_link_to_phandle(). We simply need to make sure
> > > > > > that the supplier is available in DT.
> > > > > >
> > > > > > Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
> > > > >
> > > > > Thanks for your patch!
> > > > >
> > > > > This patch introduces a regression when dynamically loading DT overlays.
> > > > > Unfortunately this happens when using the out-of-tree OF configfs,
> > > > > which is not supported upstream. Still, there may be (obscure)
> > > > > in-tree users.
> > > > >
> > > > > When loading a DT overlay[1] to enable an SPI controller, and
> > > > > instantiate a connected SPI EEPROM:
> >
> > [...]
> >
> > > > > The SPI controller and the SPI EEPROM are no longer instantiated.
> >
> > > > Sigh... I spent way too long trying to figure out if I caused a memory
> > > > leak. I should have scrolled down further! Doesn't look like that part
> > > > is related to anything I did.
> > > >
> > > > There are some flags set to avoid re-parsing fwnodes multiple times.
> > > > My guess is that the issue you are seeing has to do with how many of
> > > > the in memory structs are reused vs not when an overlay is
> > > > applied/removed and some of these flags might not be getting cleared
> > > > and this is having a bigger impact with this patch (because the fwnode
> > > > links are no longer anchored on "compatible" nodes).
> > > >
> > > > With/without this patch (let's keep the series) can you look at how
> > > > the following things change between each step you do above (add,
> > > > remove, retry):
> > > > 1) List of directories under /sys/class/devlink
> > > > 2) Enable the debug logs inside __fwnode_link_add(),
> > > > __fwnode_link_del(), device_link_add()
> > > >
> > > > My guess is that the final solution would entail clearing
> > > > FWNODE_FLAG_LINKS_ADDED for some fwnodes.
> > >
> > > You replied just as I was about to hit send. So sending this anyway...
> > >
> > > Ok, I took a closer look and I think it's a bit of a mess. The fact
> > > that it even worked for you without this patch is a bit of a
> > > coincidence.
> > >
> > > Let's just take platform devices that are created by
> > > drivers/of/platform.c as an example.
> > >
> > > The main problem is that when you add/remove properties to a DT node
> > > of an existing platform device, nothing is really done about it at the
> > > device level. We don't even unbind and rebind the driver so the driver
> > > could make use of the new properties. We don't remove and add back the
> > > device so whoever might use the new property will use it. And if you
> > > are adding a new node, it'll only trigger any platform device level
> > > impact if it's a new node of a "simple-bus" (or similar bus) device.
> > >
> > > Problem 1:
> > > So if you add a new child node to an existing probed device that adds
> > > its children explicitly (as in, the parent is not a "simple-bus" like
> > > device), nothing will happen. The newly added child device node won't
> > > get converted into a platform device, nor will the parent device
> > > notice it. So in your case of adding msiof0_pins, it's just that when
> > > the consumer gets the pins, the driver doesn't get involved much and
> > > it's the pinctrl framework that reads the DT and figures it out.
> > >
> > > With this patch, the fwnode links point to the actual resource and the
> > > actual parent device inherits them if they don't get converted to a
> > > struct device. But since we are adding this msiof0_pins after the
> > > parent device has probed, the fwnode link isn't inherited by the
> > > parent pinctrl device.
> > >
> > > Problem 2:
> > > So if you add a property to an already bound device, nothing is done
> > > by the driver. In your overlay example, if you move the status="okay"
> > > line to be the first property you change in the msiof0 spi device,
> > > you'll probably see that fw_devlink is no longer the one blocking the
> > > probe. This is because the platform device will get added as soon as
> > > the status flips from disabled to enabled and at that point fw_devlink
> > > will think it has no suppliers and won't do any probe deferring. And
> > > then as the new properties get added nothing will happen at the device
> > > or fw_devlink level. If the msiof0's spi driver fails immediately with
> > > NOT -EPROBE_DEFER when platform device is added because it couldn't
> > > find any pinctrl property, then msiof0 will never probe (unless you
> > > remove and add the driver). If it had failed with -EPROBE_DEFER, then
> > > it might probe again if something else triggers a deferred probe
> > > attempt. Clearly, things working/not working based on the order of
> > > properties in DT is not a good implementation.
> > >
> > > Problem 3:
> > > What if you enable a previously disabled supplier? There's no way to
> > > handle that from a fw_devlink level without re-parsing the entire
> > > device tree because existing devices might be consumers now.
> > >
> > > Anyway, long story short, it's sorta worked due to coincidence and
> > > it's quite messy to get it to work correctly.
> >
> > Several subsystems register notifiers to be informed of the events
> > above. E.g. drivers/spi/spi.c:
> >
> > if (IS_ENABLED(CONFIG_OF_DYNAMIC))
> > WARN_ON(of_reconfig_notifier_register(&spi_of_notifier));
> > if (IS_ENABLED(CONFIG_ACPI))
> > WARN_ON(acpi_reconfig_notifier_register(&spi_acpi_notifier));
> >
> > So my issue might be triggered using ACPI, too.
>
> Yeah, I did notice this before my email. Here's an ugly hack (at end
> of email) to test my theory about Problem 1. I didn't compile test it
> (because I should go to bed now), but you get the idea. Can you give
> this a shot? It should fix your specific case. Basically for all
> overlays (I hope the function is only used for overlays) we assume all
> nodes are NOT devices until they actually get added as a device. Don't
> review the code, it's not meant to be :)
>
> -Saravana
>
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -226,6 +226,7 @@ static void __of_attach_node(struct device_node *np)
> np->sibling = np->parent->child;
> np->parent->child = np;
> of_node_clear_flag(np, OF_DETACHED);
> + np->fwnode.flags |= FWNODE_FLAG_NOT_DEVICE;
> }
>
> /**
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 81c8c227ab6b..7299cd668e51 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -732,6 +732,7 @@ static int of_platform_notify(struct notifier_block *nb,
> if (of_node_check_flag(rd->dn, OF_POPULATED))
> return NOTIFY_OK;
>
> + rd->dn->fwnode.flags &= ~FWNODE_FLAG_NOT_DEVICE;
> /* pdev_parent may be NULL when no bus platform device */
> pdev_parent = of_find_device_by_node(rd->dn->parent);
> pdev = of_platform_device_create(rd->dn, NULL,
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index 15f174f4e056..1de55561b25d 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -4436,6 +4436,7 @@ static int of_spi_notify(struct notifier_block *nb, unsigned long action,
> return NOTIFY_OK;
> }
>
> + rd->dn->fwnode.flags &= ~FWNODE_FLAG_NOT_DEVICE;
> spi = of_register_spi_device(ctlr, rd->dn);
> put_device(&ctlr->dev);

Thanks, these changes fix my SPI EEPROM in a DT overlay.
A similar change should be applied to the i2c bus core (and to other
users of of_reconfig_notifier_register()?).

For reference, the same debug output and /sys/class/devlink
changes with this fix applied can be found below.

Note that there are still a few remaining issues, for which I do not
know the full impact:
  - The platform:e6060000.pinctrl--platform:keys link is dropped when
    the overlay is added, and is not recreated on overlay remove,
  - There is no change in /sys/class/devlink when adding the overlay
    again after an add/remove cycle.
    Shouldn't removing a DT overlay restore /sys/class/devlink to
    the exact same state as before adding the DT overlay?
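
For anyone who wants to repeat the comparison, the "+"/"-" lists in this
mail can be produced by snapshotting /sys/class/devlink before and after
each step. Below is a minimal sketch of one way to do that from user
space. The helper names (snapshot_links, diff_links, format_diff) are
made up for illustration and are not part of any kernel tooling; the
only assumption about the system is that /sys/class/devlink is a
directory whose entries are named after the device links.

```python
import os


def snapshot_links(class_dir="/sys/class/devlink"):
    """Return the set of entry names in class_dir.

    An empty set is returned if the directory does not exist, e.g. on a
    kernel without device links. (Hypothetical helper, for illustration.)
    """
    try:
        return set(os.listdir(class_dir))
    except FileNotFoundError:
        return set()


def diff_links(before, after):
    """Return (added, removed) link names as two sorted lists."""
    return sorted(after - before), sorted(before - after)


def format_diff(before, after):
    """Format the difference as '+name' / '-name' lines, added first."""
    added, removed = diff_links(before, after)
    return ["+" + name for name in added] + ["-" + name for name in removed]


if __name__ == "__main__":
    # Take one snapshot, apply or remove the overlay by hand, then press
    # Enter to take the second snapshot and print the difference.
    before = snapshot_links()
    input("Apply or remove the overlay, then press Enter... ")
    for line in format_diff(before, snapshot_links()):
        print(line)
```

If removing an overlay restored everything, diffing a snapshot taken
before the add against one taken after the remove would print nothing.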

With extra FWNODE_FLAG_NOT_DEVICE handling:

- Adding overlay:

spi@e6e90000 Linked as a fwnode consumer to interrupt-controller@f1010000
spi@e6e90000 Linked as a fwnode consumer to clock-controller@e6150000
spi@e6e90000 Linked as a fwnode consumer to system-controller@e6180000
spi@e6e90000 Linked as a fwnode consumer to msiof0
spi@e6e90000 Linked as a fwnode consumer to gpio@e6055000
platform e6e90000.spi: Linked as a consumer to e6055000.gpio
spi@e6e90000 Dropping the fwnode link to gpio@e6055000
platform e6e90000.spi: Linked as a consumer to e6060000.pinctrl
spi@e6e90000 Dropping the fwnode link to msiof0
spi@e6e90000 Dropping the fwnode link to system-controller@e6180000
platform e6e90000.spi: Linked as a consumer to e6150000.clock-controller
spi@e6e90000 Dropping the fwnode link to clock-controller@e6150000
platform e6e90000.spi: Linked as a consumer to soc
spi@e6e90000 Dropping the fwnode link to interrupt-controller@f1010000

+platform:e6055000.gpio--platform:e6e90000.spi -> ../../devices/virtual/devlink/platform:e6055000.gpio--platform:e6e90000.spi
+platform:e6060000.pinctrl--platform:e6e90000.spi -> ../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:e6e90000.spi
+platform:e6150000.clock-controller--platform:e6e90000.spi -> ../../devices/virtual/devlink/platform:e6150000.clock-controller--platform:e6e90000.spi
+platform:soc--platform:e6e90000.spi -> ../../devices/virtual/devlink/platform:soc--platform:e6e90000.spi
-platform:e6060000.pinctrl--platform:keys -> ../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:keys

SPI EEPROM works

- Removing overlay:

platform keys: Linked as a sync state only consumer to e6055000.gpio

-platform:e6055000.gpio--platform:e6e90000.spi -> ../../devices/virtual/devlink/platform:e6055000.gpio--platform:e6e90000.spi
-platform:e6060000.pinctrl--platform:e6e90000.spi -> ../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:e6e90000.spi
-platform:e6150000.clock-controller--platform:e6e90000.spi -> ../../devices/virtual/devlink/platform:e6150000.clock-controller--platform:e6e90000.spi
-platform:soc--platform:e6e90000.spi -> ../../devices/virtual/devlink/platform:soc--platform:e6e90000.spi

platform:e6060000.pinctrl--platform:keys link is not recreated?!?!?

- Adding overlay again:

No debug output
No change in /sys/class/devlink?!?!?
SPI EEPROM works

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds