Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD

From: Saravana Kannan
Date: Tue Sep 21 2021 - 16:07:58 EST


Sorry, I've been busy with LPC and some other stuff and couldn't respond earlier.

On Tue, Sep 21, 2021 at 12:50 PM Andrew Lunn <andrew@xxxxxxx> wrote:
>
> > It works at a device level, so it doesn't know about resources. The
> > only information it has is of the "this device may depend on that
> > other device" type and it uses that information to figure out a usable
> > probe ordering for drivers.
>
> And that simplification is the problem. A phandle does not point to a
> device, it points to a resource of a device. It should really be doing
> what the driver would do, follow the phandle to the resource and see
> if it exists yet. If it does not exist then yes it can defer the
> probe. If the resource does exist, allow the driver to probe.
>
> > Also if the probe has already started, it may still return
> > -EPROBE_DEFER at any time in theory
>
> Sure it can, and does. And any driver which is not broken will
> unregister its resources on the error path. And that causes users of
> the resources to release them. It all nicely unravels, and then tries
> again later. This all works, it is what these drivers do.

One of the points of fw_devlink=on is to avoid the pointless deferred
probes that'd happen in this situation. So saying "let this happen"
when fw_devlink=on kinda defeats the point of it. See further below.

>
> > However, making children wait for their parents to complete probing is
> > generally artificial, especially in the cases when the children are
> > registered by the parent's driver. So waiting should be an exception
> > in these cases, not a rule.

Rafael,

There are cases where the children try to probe too quickly (before
the parent has had time to set up all the resources it's setting up)
and the child defers the probe. Even Andrew had an example of that
with some ethernet driver where the deferred probe is attempted
multiple times, wasting time, before it eventually succeeds.

Considering there's no guarantee that a device_add() will result in
the device being bound immediately, why shouldn't we make the child
device wait until the parent has completely probed and we know all the
resources from the parent are guaranteed to be available? Why can't we
treat drivers that assume a device will get bound as soon as it's
added as the exception (because we don't guarantee that anyway)?

Also, this assumption that the child will be bound successfully upon
addition forces the parent/child drivers to play initcall chicken --
the child's driver has to be registered before the parent's driver. We
should be getting away from those by fixing the parent driver that's
making these assumptions (I'll be glad to help with that). We need to
be moving towards reducing pointless deferred probes and initcall
ordering requirements instead of saying "this bad assumption used to
work, so allow me to continue doing that".

-Saravana

> So are you suggesting that fw_devlink core needs to change, recognise
> the dependency on a parent, and allow the probe? Works for me. Gets us
> back to before fw_devlink.