Re: [PATCH v2 2/3] driver core: fw_devlink: Handle missing drivers for optional suppliers

From: Saravana Kannan
Date: Tue Feb 02 2021 - 14:51:25 EST


On Tue, Feb 2, 2021 at 6:34 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> On Tue, Feb 2, 2021 at 5:33 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> >
> > After a deferred probe attempt has exhaused all the devices that can be
> > bound, any device that remains unbound has one/both of these conditions
> > true:
> >
> > (1) It is waiting on its supplier to bind
> > (2) It does not have a matching driver
> >
> > So, to make fw_devlink=on more forgiving of missing drivers for optional
> > suppliers, after we've done a full deferred probe attempt, this patch
> > deletes all device links created by fw_devlink where the supplier hasn't
> > probed yet and the supplier itself is not waiting on any of its
> > suppliers. This allows consumers to probe during another deferred probe
> > attempt if they were waiting on optional suppliers.
> >
> > When modules are enabled, we can't differentiate between a driver
> > that'll never be registered vs a driver that'll be registered soon by
> > loading a module. So, this patch doesn't do anything for the case where
> > modules are enabled.
> >
> > Fixes: e590474768f1 ("driver core: Set fw_devlink=on by default")
> > Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
> > ---
> > drivers/base/base.h | 2 +
> > drivers/base/core.c | 104 ++++++++++++++++++++++++++++++++++++--------
> > drivers/base/dd.c | 5 +++
> > 3 files changed, 94 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/base/base.h b/drivers/base/base.h
> > index f5600a83124f..34befe9475cb 100644
> > --- a/drivers/base/base.h
> > +++ b/drivers/base/base.h
> > @@ -186,6 +186,8 @@ extern void device_links_no_driver(struct device *dev);
> > extern bool device_links_busy(struct device *dev);
> > extern void device_links_unbind_consumers(struct device *dev);
> >
> > +bool fw_devlink_deferred_probe_retry(void);
> > +
> > /* device pm support */
> > void device_pm_move_to_tail(struct device *dev);
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index c95b1daabac7..5e53fc6a21ea 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -50,6 +50,7 @@ static LIST_HEAD(deferred_sync);
> > static unsigned int defer_sync_state_count = 1;
> > static DEFINE_MUTEX(fwnode_link_lock);
> > static bool fw_devlink_is_permissive(void);
> > +static bool fw_devlink_def_probe_retry;
> >
> > /**
> > * fwnode_link_add - Create a link between two fwnode_handles.
> > @@ -881,6 +882,13 @@ static void device_link_put_kref(struct device_link *link)
> > WARN(1, "Unable to drop a managed device link reference\n");
> > }
> >
> > +static void device_link_drop_managed(struct device_link *link)
> > +{
> > + link->flags &= ~DL_FLAG_MANAGED;
> > + WRITE_ONCE(link->status, DL_STATE_NONE);
> > + kref_put(&link->kref, __device_link_del);
> > +}
> > +
> > /**
> > * device_link_del - Delete a stateless link between two devices.
> > * @link: Device link to delete.
> > @@ -943,6 +951,29 @@ static void device_links_missing_supplier(struct device *dev)
> > }
> > }
> >
> > +/**
> > + * device_links_probe_blocked_by - Return first supplier blocking probe
> > + * @dev: Consumer device.
> > + *
> > + * Checks if the probe of @dev is blocked by a supplier without a driver. If
> > + * yes, return that supplier dev. Otherwise, return NULL.
> > + */
> > +static struct device *device_links_probe_blocked_by(struct device *dev)
> > +{
> > + struct device_link *link;
> > +
> > + list_for_each_entry(link, &dev->links.suppliers, c_node) {
> > + if (!(link->flags & DL_FLAG_MANAGED) ||
> > + link->flags & DL_FLAG_SYNC_STATE_ONLY)
> > + continue;
> > +
> > + if (link->status != DL_STATE_AVAILABLE)
> > + return link->supplier;
> > + }
> > +
> > + return NULL;
>
> This is slightly confusing, because you don't actually use the
> returned device pointer, but simply check it against NULL.
>
> AFAICS this function can return bool and I'd call it
> device_links_probe_blocked().

Yeah, I was writing it this way because I had other future uses for
this. But yeah, I'll just make it a bool and change it later when I
need to.

>
> > +}
> > +
> > /**
> > * device_links_check_suppliers - Check presence of supplier drivers.
> > * @dev: Consumer device.
> > @@ -961,7 +992,7 @@ static void device_links_missing_supplier(struct device *dev)
> > */
> > int device_links_check_suppliers(struct device *dev)
> > {
> > - struct device_link *link;
> > + struct device_link *link, *tmp;
> > int ret = 0;
> >
> > /*
> > @@ -982,19 +1013,47 @@ int device_links_check_suppliers(struct device *dev)
> >
> > device_links_write_lock();
> >
> > - list_for_each_entry(link, &dev->links.suppliers, c_node) {
> > + list_for_each_entry_safe(link, tmp, &dev->links.suppliers, c_node) {
> > if (!(link->flags & DL_FLAG_MANAGED))
> > continue;
> >
> > - if (link->status != DL_STATE_AVAILABLE &&
> > - !(link->flags & DL_FLAG_SYNC_STATE_ONLY)) {
> > - device_links_missing_supplier(dev);
> > - dev_dbg(dev, "probe deferral - supplier %s not ready\n",
> > - dev_name(link->supplier));
> > - ret = -EPROBE_DEFER;
> > - break;
> > +
> > + if (link->status == DL_STATE_AVAILABLE ||
> > + link->flags & DL_FLAG_SYNC_STATE_ONLY) {
> > + WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE);
> > + continue;
> > + }
> > +
> > + /*
> > + * After a deferred probe attempt has exhaused all the devices
> > + * that can be bound, any device that remains unbound has
> > + * one/both of these conditions true:
> > + *
> > + * (1) It is waiting on its supplier to bind
> > + * (2) It does not have a matching driver
> > + *
> > + * If this device is waiting on a supplier to bind to a driver,
> > + * we make sure condition (1) above is not true for the
> > + * supplier. In which case, condition (2) has to be true for
> > + * the supplier. That is, the supplier doesn't have a matching
> > + * driver.
> > + *
> > + * When we find such a supplier, we delete the device link if
> > + * it was created by fw_devlink. This it to allow the consumer
> > + * to probe in case the supplier is an optional.
> > + */
> > + if (fw_devlink_def_probe_retry &&
>
> I would put a IS_ENABLED(CONFIG_MODULES) check here to let the
> compiler optimize out the code depending on it and make it clear that
> this is a NOP if there are modules.

Will do.

>
> > + link->flags & DL_FLAG_INFERRED &&
> > + !device_links_probe_blocked_by(link->supplier)) {
> > + device_link_drop_managed(link);
> > + continue;
> > }
> > - WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE);
> > +
> > + device_links_missing_supplier(dev);
> > + dev_dbg(dev, "probe deferral - supplier %s not ready\n",
> > + dev_name(link->supplier));
> > + ret = -EPROBE_DEFER;
> > + break;
> > }
> > dev->links.status = DL_DEV_PROBING;
> >
> > @@ -1132,13 +1191,6 @@ static void __device_links_supplier_defer_sync(struct device *sup)
> > list_add_tail(&sup->links.defer_sync, &deferred_sync);
> > }
> >
> > -static void device_link_drop_managed(struct device_link *link)
> > -{
> > - link->flags &= ~DL_FLAG_MANAGED;
> > - WRITE_ONCE(link->status, DL_STATE_NONE);
> > - kref_put(&link->kref, __device_link_del);
> > -}
> > -
> > static ssize_t waiting_for_supplier_show(struct device *dev,
> > struct device_attribute *attr,
> > char *buf)
> > @@ -1597,6 +1649,24 @@ static int fw_devlink_relax_cycle(struct device *con, void *sup)
> > return ret;
> > }
> >
> > +/** fw_devlink_deferred_probe_retry - Set up fw_devlink for probe retries
>
> Kerneldoc format mistake.

Ack

>
> > + *
> > + * This function requests fw_devlink to set itself up for a deferred probe
> > + * retry. This allows fw_devlink to ignore device links it created to
> > + * suppliers that'll never probe. This is necessary in case some of the
> > + * suppliers are optional and their consumers can probe without them.
> > + *
> > + * Returns true if deferred probe retry is likely to make any difference.
> > + */
> > +bool fw_devlink_deferred_probe_retry(void)
> > +{
> > + if (IS_ENABLED(CONFIG_MODULES))
> > + return false;
>
> To make the above more visible, I'd fold this function into the caller.

I had written it this way because I'm thinking of adding a timeout
heuristic for MODULES in here. I can move it to the caller if you feel
strongly about it.

>
> > +
> > + fw_devlink_def_probe_retry = true;
> > + return fw_devlink_get_flags() && !fw_devlink_is_permissive();
> > +}
> > +
> > /**
> > * fw_devlink_create_devlink - Create a device link from a consumer to fwnode
> > * @con - Consumer device for the device link
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index 9179825ff646..11325df2327f 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -317,6 +317,11 @@ static int deferred_probe_initcall(void)
> > driver_deferred_probe_trigger();
> > /* Sort as many dependencies as possible before exiting initcalls */
> > flush_work(&deferred_probe_work);
> > +
> > + if (fw_devlink_deferred_probe_retry()) {
> > + driver_deferred_probe_trigger();
> > + flush_work(&deferred_probe_work);
> > + }
> > initcalls_done = true;
> >
> > /*
> > --
>
> Overall, the "let's do nothing if modules are not enabled" approach is
> a bit disappointing, because what if somebody builds all of the
> drivers needed for boot in and enables modules anyway, for example to
> allow USB drivers to be probed dynamically?

Yeah, I'm disappointed too :( But I'm trying to get it to work for
!MODULES so that we can enable fw_devlink=on by default at least for
!MODULES to make sure drivers don't introduce more issues going
forward. And then I plan to continue working on making it work
correctly for MODULES case too.

Getting fw_devlink=on to work perfectly for MODULES and !MODULES is
not a problem at all. But it needs fixing a bunch of drivers (mostly
simple fixes like setting the right flag, handling deferred probes
correctly, etc), but I'm hitting a catch-22 here. I can't find the
drivers without setting fw_devlink=on by default. But if I did that,
it's going to break a bunch of boards.

What's your thought on leaving fw_devlink=on by default on 5.12 and
fixing drivers as issues are reported? If that's a no, do you have any
other ideas on how to deal with this catch-22?

Thanks,
Saravana