Re: [PATCH v1 0/5] Enable fw_devlink=on by default

From: Rafael J. Wysocki
Date: Thu Feb 11 2021 - 10:29:12 EST


On Thu, Feb 11, 2021 at 1:02 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
>
> On Thu, Jan 28, 2021 at 7:03 AM Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
> >
> >
> > On 14/01/2021 16:56, Jon Hunter wrote:
> > >
> > > On 14/01/2021 16:47, Saravana Kannan wrote:
> > >
> > > ...
> > >
> > >>> Yes this is the warning shown here [0] and this is coming from
> > >>> the 'Generic PHY stmmac-0:00' device.
> > >>
> > >> Can you print the supplier and consumer device when this warning is
> > >> happening and let me know? That'd help too. I'm guessing the phy is
> > >> the consumer.
> > >
> > >
> > > Sorry I should have included that. I added a print to dump this on
> > > another build but failed to include here.
> > >
> > > WARNING KERN Generic PHY stmmac-0:00: supplier 2200000.gpio (status 1)
> > >
> > > The status is the link->status and looks like the supplier is the
> > > gpio controller. I have verified that the gpio controller is probed
> > > before this successfully.
> > >
> > >> So the warning itself isn't a problem -- it's not breaking anything or
> > >> leaking memory or anything like that. But the device link is jumping
> > >> states in an incorrect manner. With enough context of this code (why
> > >> the device_bind_driver() is being called directly instead of going
> > >> through the normal probe path), it should be easy to fix (I'll just
> > >> need to fix up the device link state).
> > >
> > > Correct, the board seems to boot fine, we just get this warning.
> >
> >
> > Have you had chance to look at this further?
>
> Hi Jon,
>
> I finally got around to looking into this. Here's the email[1] that
> describes why it's done this way.
>
> [1] - https://lore.kernel.org/lkml/YCRjmpKjK0pxKTCP@xxxxxxx/
>
> >
> > The following does appear to avoid the warning, but I am not sure if
> > this is the correct thing to do ...
> >
> > index 9179825ff646..095aba84f7c2 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -456,6 +456,10 @@ int device_bind_driver(struct device *dev)
> > {
> > int ret;
> >
> > + ret = device_links_check_suppliers(dev);
> > + if (ret)
> > + return ret;
> > +
> > ret = driver_sysfs_add(dev);
> > if (!ret)
> > driver_bound(dev);
>
> So digging deeper into the usage of device_bind_driver and looking at
> [1], it doesn't look like returning an error here is a good option.
> When device_bind_driver() is called, the driver's probe function isn't
> even called. So, there's no way for the driver to even defer probing
> based on any of the suppliers. So, we have a couple of options:
>
> 1. Delete all the links to suppliers that haven't bound.

Or maybe convert them to stateless links? Would that be doable at all?

> We'll still leave the links to active suppliers alone in case it helps with
> suspend/resume correctness.
> 2. Fix the warning to not warn on suppliers that haven't probed if the
> device's driver has no probe function. But this will also need fixing
> up the cleanup part when device_release_driver() is called. Also, I'm
> not sure if device_bind_driver() is ever called when the driver
> actually has a probe() function.
>
> Rafael,
>
> Option 1 above is pretty straightforward.

I would prefer this ->

> Option 2 would look something like what's at the end of this email +
> caveat about whether the probe check is sufficient.

-> because "fix the warning" really means that we haven't got the
device link state machine right and getting it right may imply a major
redesign.

Overall, I'd prefer to take a step back and allow things to stabilize
for a while to let people catch up with this.

> Do you have a preference between Option 1 vs 2? Or do you have some
> other option in mind?
>
> Thanks,
> Saravana
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 5481b6940a02..8102b3c48bbc 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1247,7 +1247,8 @@ void device_links_driver_bound(struct device *dev)
> */
> device_link_drop_managed(link);
> } else {
> - WARN_ON(link->status != DL_STATE_CONSUMER_PROBE);
> + WARN_ON(link->status != DL_STATE_CONSUMER_PROBE &&
> + dev->driver->probe);
> WRITE_ONCE(link->status, DL_STATE_ACTIVE);
> }
>
> @@ -1302,7 +1303,8 @@ static void __device_links_no_driver(struct device *dev)
> if (link->supplier->links.status == DL_DEV_DRIVER_BOUND) {
> WRITE_ONCE(link->status, DL_STATE_AVAILABLE);
> } else {
> - WARN_ON(!(link->flags & DL_FLAG_SYNC_STATE_ONLY));
> + WARN_ON(!(link->flags & DL_FLAG_SYNC_STATE_ONLY) &&
> + dev->driver->probe);
> WRITE_ONCE(link->status, DL_STATE_DORMANT);
> }
> }