Re: [RESEND PATCH] driver core: Clear FWNODE_FLAG_LINKS_ADDED in device_links_purge()

From: Saravana Kannan
Date: Mon Oct 02 2023 - 17:41:33 EST


On Tue, Sep 26, 2023 at 7:31 PM Zhenhua Huang <quic_zhenhuah@xxxxxxxxxxx> wrote:
>
> Flag FWNODE_FLAG_LINKS_ADDED stops fwnode links creation. Current kernel
> only adds it once after fwnode links creation in fw_devlink_parse_fwnode().
> After that even device links being purged, the flag will not be cleared.
>
> Fwnode links are converted to device links and will not be added back
> forever in normal case. Essentially if a device is registered and
> unregisted (also deleted) before it is probed (due to missing fwlink
> dependencies, abort in device_links_check_suppliers), the fwlink is not
> setup next when device is newly created again. This means the probe gets
> called without meeting all dependencies.

I know exactly what problem you are explaining, but it's low in my
list of TODOs because in almost all cases, it's a poorly written
driver.

> It usuallly happens in the case of a glue driver. Of_platform_populate()
> allows us to populate subnodes. We may do it in ancestor node probing
> function, then check subnode's probing status because there may be chances
> that suppliers of subnode are not ready. We may further need to do
> of_platform_depopulate(which purges device links) and in some time
> of_platform_populate() again. Such case we miss fwnode links(so that device
> links) during second time of populating subnodes.

Why is the device driver for the parent device adding the child
devices before it knows it can finish probing? Adding the child
devices should be the last thing you do as part of your probe.

> Fix it by Clearing FWNODE_FLAG_LINKS_ADDED flag in purging device link
> func, indicates both fwnode links and device links are absent.
>

NACK for this patch.

While the intent is good, the change is very wrong because it's
clearing the flags too soon and can cause a lot of busy work.

Also, this issue hasn't been fixed yet because fixing this will cause
issues in some other corner cases related to network devices. I need a
bit more elaborate set of fixes before I can fix this issue you are
raising. But as I said before, in 90% of the cases it's bad driver
behavior.

Thanks,
Saravana


> Signed-off-by: Zhenhua Huang <quic_zhenhuah@xxxxxxxxxxx>
> ---
> drivers/base/core.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index b7d7f41..2a1975d 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1630,6 +1630,10 @@ static void device_links_purge(struct device *dev)
> __device_link_del(&link->kref);
> }
>
> + /* Clear flags in fwnode. Give a chance to create fwnode link again */
> + if (dev->fwnode)
> + dev->fwnode->flags &= ~FWNODE_FLAG_LINKS_ADDED;
> +
> device_links_write_unlock();
> }
>
> --
> 2.7.4
>