Re: [PATCH v20 2/5] usb: dwc3: core: Host wake up support from system suspend
From: Pavan Kondeti
Date: Fri Jun 24 2022 - 04:59:10 EST
On Thu, Jun 23, 2022 at 11:38:06AM -0700, Matthias Kaehlcke wrote:
> On Mon, Jun 20, 2022 at 02:24:15PM +0530, Pavan Kondeti wrote:
> > +Felipe, Bjorn
> >
> > On Thu, Jun 16, 2022 at 10:15:49AM -0700, Matthias Kaehlcke wrote:
> > > On Thu, Jun 16, 2022 at 02:41:10PM +0530, Pavan Kondeti wrote:
> > > > Hi Matthias/Krishna,
> > > >
> > > > On Tue, Jun 14, 2022 at 10:53:35AM -0700, Matthias Kaehlcke wrote:
> > > > > On Mon, Jun 13, 2022 at 11:08:32AM -0700, Matthias Kaehlcke wrote:
> > > > > > On Mon, Jun 06, 2022 at 01:45:51PM -0700, Matthias Kaehlcke wrote:
> > > > > > > On Thu, Jun 02, 2022 at 12:35:42PM -0700, Matthias Kaehlcke wrote:
> > > > > > > > Hi Krishna,
> > > > > > > >
> > > > > > > > with this version I see xHCI errors on my SC7180 based system, like
> > > > > > > > these:
> > > > > > > >
> > > > > > > > [ 65.352605] xhci-hcd xhci-hcd.13.auto: xHC error in resume, USBSTS 0x401, Reinit
> > > > > > > >
> > > > > > > > [ 101.307155] xhci-hcd xhci-hcd.13.auto: WARN: xHC CMD_RUN timeout
> > > > > > > >
> > > > > > > > After resume a downstream hub isn't enumerated again.
> > > > > > > >
> > > > > > > > So far I didn't see those with v13, but I aso saw the first error with
> > > > > > > > v16.
> > > > > > >
> > > > > > > It also happens with v13, but only when a wakeup capable vUSB <= 2
> > > > > > > device is plugged in. Initially I used a wakeup capable USB3 to
> > > > > > > Ethernet adapter to trigger the wakeup case, however older versions
> > > > > > > of this series that use usb_wakeup_enabled_descendants() to check
> > > > > > > for wakeup capable devices didn't actually check for vUSB > 2
> > > > > > > devices.
> > > > > > >
> > > > > > > So the case were the controller/PHYs is powered down works, but
> > > > > > > the controller is unhappy when the runtime PM path is used during
> > > > > > > system suspend.
> > > > > >
> > > > > > The issue isn't seen on all systems using dwc3-qcom and the problem starts
> > > > > > during probe(). The expected probe sequence is something like this:
> > > > > >
> > > > > > dwc3_qcom_probe
> > > > > > dwc3_qcom_of_register_core
> > > > > > dwc3_probe
> > > > > >
> > > > > > if (device_can_wakeup(&qcom->dwc3->dev))
> > > > > > ...
> > > > > >
> > > > > > The important part is that device_can_wakeup() is called after dwc3_probe()
> > > > > > has completed. That's what I see on a QC SC7280 system, where wakeup is
> > > > > > generally working with these patches.
> > > > > >
> > > > > > However on a QC SC7180 system dwc3_probe() is deferred and only executed after
> > > > > > dwc3_qcom_probe(). As a result the device_can_wakeup() call returns false.
> > > > > > With that the controller/driver ends up in an unhappy state after system
> > > > > > suspend.
> > > > > >
> > > > > > Probing is deferred on SC7180 because device_links_check_suppliers() finds
> > > > > > that '88e3000.phy' isn't ready yet.
> > > > >
> > > > > It seems device links could be used to make sure the dwc3 core is present:
> > > > >
> > > > > Another example for an inconsistent state would be a device link that
> > > > > represents a driver presence dependency, yet is added from the consumer’s
> > > > > ->probe callback while the supplier hasn’t probed yet: Had the driver core
> > > > > known about the device link earlier, it wouldn’t have probed the consumer
> > > > > in the first place. The onus is thus on the consumer to check presence of
> > > > > the supplier after adding the link, and defer probing on non-presence.
> > > > >
> > > > > https://www.kernel.org/doc/html/v5.18/driver-api/device_link.html#usage
> > > > >
> > > > >
> > > > > You could add something like this to dwc3_qcom_of_register_core():
> > > > >
> > > > >
> > > > > device_link_add(dev, &qcom->dwc3->dev,
> > > > > DL_FLAG_AUTOREMOVE_CONSUMER | DL_FLAG_AUTOPROBE_CONSUMER);
> > > > >
> > > > > if (qcom->dwc3->dev.links.status != DL_DEV_DRIVER_BOUND)
> > > > > ret = -EPROBE_DEFER;
> > > > >
> > > > >
> > > > I am not very sure how the device_link_add() API works. we are the parent and
> > > > creating a depdency on child probe. That does not sound correct to me.
> > >
> > > The functional dependency is effectively there, the driver already assumes that
> > > the dwc3 core was probed when of_platform_populate() returns.
> > >
> > > The device link itself doesn't create the dependency on the probe(), the check
> > > of the link status below does.
> > >
> > > Another option would be to add a link to the PHYs to the dwc3-qcom node in
> > > the device tree, but I don't think that would be a better solution (and I
> > > expect Rob would oppose this).
> > >
> > > I'm open to other solutions, so far the device link is the cleanest that came
> > > to my mind.
> > >
> > > I think the root issue is the driver architecture, with two interdependent
> > > drivers for the same IP block, instead of a single framework driver with a
> > > common part (dwc3 core) and vendor specific hooks/data.
> > >
> > > > Any ways, I have another question.
> > > >
> > > > When dwc3_qcom_of_register_core() returns error back to dwc3_qcom_probe(), we
> > > > goto depopulate label which calls of_platform_depopulate() which destroy the
> > > > child devices that are populated. how does that ensure that child probe is
> > > > completed by the time, our probe is called again. The child device it self is
> > > > gone. Is this working because when our probe is called next time, the child
> > > > probe depenencies are resolved?
> > >
> > > Good point! It doesn't really ensure that the child is probed (actually it
> > > won't be probed and DL_FLAG_AUTOPROBE_CONSUMER doesn't make sense here), it
> > > could happen that dwc3_qcom_probe() is deferred multiple times, but eventually
> > > the PHYs should be ready and dwc3_probe() be invoked through
> > > of_platform_populate().
> >
> > This is a generic problem i.e if a parent can only proceed after the child
> > devices are bounded (i.e probed successfully), how to ensure this behavior
> > from the parent's probe? Since we can't block the parent probe (async probe is
> > not the default behavior), we have to identify the condition that the children
> > are deferring probe, so that parent also can do that.
> >
> > Can we add a API in drivers core to tell if a device probe is deferred or
> > not? This can be done by testing list_empty(&dev->p->deferred_probe) under
> > deferred_probe_mutex mutex. The parent can return EPROBE_DEFER based on this
> > API return value.
>
> That could be an option.
>
> > Another alternative would be explicitly checking if the child device suppliers
> > are ready or not before adding child device. That would require decoupling
> > of_platform_populate() to creating devices and adding devices.
>
> It might require a new API since there are plenty of users of
> of_platform_populate() that rely on the current behavior.
Agree. A new API is needed. we also have to consider multiple children
scenario, where one child is ready but others are not etc.
>
> > Note that this problem is not just limited to suppliers not ready. if the
> > dwc3-qcom is made asynchronous probe, then its child also probed
> > asynchronously and there is no guarantee that child would be probed by the
> > time of_platform_populate() is returned. The bus notifier might come handy
> > in this case. The parent can register for this notifier and waiting for
> > the children device's BUS_NOTIFY_BOUND_DRIVER/BUS_NOTIFY_DRIVER_NOT_BOUND
> > notifications. This would also work in our case, if we move to
> > of_platform_populate() outside the probe().
>
> If I understand correctly the outcome would be a probe() in two stages. The
> first does as much as it can do without the dwc3 core and leaves the device
> in a state where it isn't really functional, and the second stage does the
> rest when BUS_NOTIFY_BOUND_DRIVER is received for the dwc3 core device.
>
> A concern could be the need for additional conditions in some code paths to
> deal with the half-initialized device.
>
> Why would of_platform_populate() be moved outside of probe()?
>
> To avoid the half-initialized device probe() could block until
> BUS_NOTIFY_BOUND_DRIVER is received. Probably that should be done with a
> timeout to avoid blocking forever in case of a problem with probing the
> dwc3 core.
Right, we have to split the probe() into two parts. The second stage which
runs asynchronously should wait for the dwc3 core probe to happen. if at all
dwc3 core probe is falied (in which case also we get a notification) for any
reason other than EPROBE_DEFER, we have to undo the first stage.
Thansk,
Pavan