Re: [PATCH v2 13/17] driver core: Use device's fwnode to check if it is waiting for suppliers

From: Saravana Kannan
Date: Mon Jun 27 2022 - 18:31:11 EST


On Mon, Jun 27, 2022 at 4:42 AM Abel Vesa <abel.vesa@xxxxxxxxxx> wrote:
>
> On 20-11-20 18:02:28, Saravana Kannan wrote:
> > To check if a device is still waiting for its supplier devices to be
> > added, we used to check if the devices is in a global
> > waiting_for_suppliers list. Since the global list will be deleted in
> > subsequent patches, this patch stops using this check.
> >
> > Instead, this patch uses a more device specific check. It checks if the
> > device's fwnode has any fwnode links that haven't been converted to
> > device links yet.
> >
> > Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
> > ---
> > drivers/base/core.c | 18 ++++++++----------
> > 1 file changed, 8 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 395dece1c83a..1873cecb0cc4 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -51,6 +51,7 @@ static DEFINE_MUTEX(wfs_lock);
> > static LIST_HEAD(deferred_sync);
> > static unsigned int defer_sync_state_count = 1;
> > static DEFINE_MUTEX(fwnode_link_lock);
> > +static bool fw_devlink_is_permissive(void);
> >
> > /**
> > * fwnode_link_add - Create a link between two fwnode_handles.
> > @@ -995,13 +996,13 @@ int device_links_check_suppliers(struct device *dev)
> > * Device waiting for supplier to become available is not allowed to
> > * probe.
> > */
> > - mutex_lock(&wfs_lock);
> > - if (!list_empty(&dev->links.needs_suppliers) &&
> > - dev->links.need_for_probe) {
> > - mutex_unlock(&wfs_lock);
> > + mutex_lock(&fwnode_link_lock);
> > + if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
> > + !fw_devlink_is_permissive()) {
> > + mutex_unlock(&fwnode_link_lock);
>
> Hi Saravana,
>
> First of, sorry for going back to this.

No worries at all. If there's an issue with fw_devlink, I want to have it fixed.

> There is a scenario where this check will not work and probably should
> work. It goes like this:
>
> A clock controller is not allowed to probe because it uses a clock from a child device of a
> consumer, like so:
>
> dispcc: clock-controller@af00000 {
> clocks = <&dsi0_phy 0>;
> };
>
> mdss: mdss@ae00000 {
> clocks = <&dispcc DISP_CC_MDSS_MDP_CLK>;
>
> dsi0_phy: dsi-phy@ae94400 {
> clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
> };
> };
>
> This is a real scenario actually, but I stripped it down to the essentials.

I'm well aware of this scenario and explicitly wrote code to address this :)

See this comment in fw_devlink_create_devlink()

/*
* If we can't find the supplier device from its fwnode, it might be
* due to a cyclic dependency between fwnodes. Some of these cycles can
* be broken by applying logic. Check for these types of cycles and
* break them so that devices in the cycle probe properly.
*
* If the supplier's parent is dependent on the consumer, then the
* consumer and supplier have a cyclic dependency. Since fw_devlink
* can't tell which of the inferred dependencies are incorrect, don't
* enforce probe ordering between any of the devices in this cyclic
* dependency. Do this by relaxing all the fw_devlink device links in
* this cycle and by treating the fwnode link between the consumer and
* the supplier as an invalid dependency.
*/

Applying this comment to your example, dispcc is the "consumer",
dsi0_phy is the "supplier" and mdss is the "supplier's parent".

And because we can't guarantee the order of addition of these top
level devices is why I also have this piece of recursive call inside
__fw_devlink_link_to_suppliers():

/*
* If a device link was successfully created to a supplier, we
* now need to try and link the supplier to all its suppliers.
*
* This is needed to detect and delete false dependencies in
* fwnode links that haven't been converted to a device link
* yet. See comments in fw_devlink_create_devlink() for more
* details on the false dependency.
*
* Without deleting these false dependencies, some devices will
* never probe because they'll keep waiting for their false
* dependency fwnode links to be converted to device links.
*/
sup_dev = get_dev_from_fwnode(sup);
__fw_devlink_link_to_suppliers(sup_dev, sup_dev->fwnode);
put_device(sup_dev);

So when mdss gets added, we'll link it to dispcc and then check if
dispcc has any suppliers it needs to link to. And that's when the
logic will catch the cycle and fix it.

Can you tell me why this wouldn't unblock the probing of dispcc? Are
you actually hitting this on a device? If so, can you please check why
this logic isn't sufficient to catch and undo the cycle?

Thanks,
Saravana

> So, the dsi0_phy will be "device_add'ed" (through of_platform_populate) by the mdss probe.
> The mdss will probe defer waiting for the DISP_CC_MDSS_MDP_CLK, while
> the dispcc will probe defer waiting for the dsi0_phy (supplier).
>
> Basically, this 'supplier availability check' does not work when a supplier might
> be populated by a consumer of the device that is currently trying to probe.
>
>
> Abel
>
>
> > return -EPROBE_DEFER;
> > }
> > - mutex_unlock(&wfs_lock);
> > + mutex_unlock(&fwnode_link_lock);
> >
> > device_links_write_lock();
> >
> > @@ -1167,10 +1168,7 @@ static ssize_t waiting_for_supplier_show(struct device *dev,
> > bool val;
> >
> > device_lock(dev);
> > - mutex_lock(&wfs_lock);
> > - val = !list_empty(&dev->links.needs_suppliers)
> > - && dev->links.need_for_probe;
> > - mutex_unlock(&wfs_lock);
> > + val = !list_empty(&dev->fwnode->suppliers);
> > device_unlock(dev);
> > return sysfs_emit(buf, "%u\n", val);
> > }
> > @@ -2202,7 +2200,7 @@ static int device_add_attrs(struct device *dev)
> > goto err_remove_dev_groups;
> > }
> >
> > - if (fw_devlink_flags && !fw_devlink_is_permissive()) {
> > + if (fw_devlink_flags && !fw_devlink_is_permissive() && dev->fwnode) {
> > error = device_create_file(dev, &dev_attr_waiting_for_supplier);
> > if (error)
> > goto err_remove_dev_online;
> > --
> > 2.29.2.454.gaff20da3a2-goog
> >
> >