Re: [PATCH v4 1/2] dt-bindings: usb: Add binding for discrete onboard USB hubs

From: Matthias Kaehlcke
Date: Wed Oct 07 2020 - 17:42:30 EST


On Wed, Oct 07, 2020 at 04:17:32PM -0400, Alan Stern wrote:
> On Wed, Oct 07, 2020 at 12:42:29PM -0700, Matthias Kaehlcke wrote:
> > On Wed, Oct 07, 2020 at 03:25:42PM -0400, Alan Stern wrote:
> > > On Wed, Oct 07, 2020 at 10:28:47AM -0700, Matthias Kaehlcke wrote:
> > > > On Wed, Oct 07, 2020 at 12:38:38PM -0400, Alan Stern wrote:
> > > > > On Wed, Oct 07, 2020 at 09:03:36AM -0700, Matthias Kaehlcke wrote:
> > > > > > Ok, I wasn't sure if the hubs suspend asynchronously from each other. If they
> > > > > > do it should indeed not be a problem to have the "master" wait for its peers.
> > > > >
> > > > > Well, order of suspending is selectable by the user. It can be either
> > > > > asynchronous or reverse order of device registration, which might pose a
> > > > > problem. We don't know in advance which of two peer hubs will be
> > > > > registered first. It might be necessary to introduce some additional
> > > > > explicit synchronization.
> > > >
> > > > I'm not sure we are understanding each other completely. I agree that
> > > > synchronization is needed to have the primary hub wait for its peers, that
> > > > was one of my initial concerns.
> > > >
> > > > Lets use an example to clarify my secondary concern: a hub chip provides a
> > > > USB 3 and a USB 2 hub, lets say the USB 3 hub is the primary.
> > > >
> > > > Here is some pseudo-code for the suspend function:
> > > >
> > > > hub_suspend(hub)
> > > > ...
> > > >
> > > > if (hub->primary) {
> > > > device_pm_wait_for_dev(hub->peer)
> > > >
> > > > // check for connected devices and turn regulator off
> > > > }
> > > >
> > > > ...
> > > > }
> > > >
> > > > What I meant with 'asynchronous suspend' in this context:
> > > >
> > > > Can hub_suspend() of the peer hub be executed (asynchronously) while the
> > > > primary is blocked on device_pm_wait_for_dev(),
> > >
> > > Yes, that's exactly what would happen with async suspend.
> > >
> > > > or would the primary wait
> > > > forever if the peer hub isn't suspended yet?
> > >
> > > That wouldn't happen. device_pm_wait_for_dev is smart; it will return
> > > immediately if neither device uses async suspend. But in that case you
> > > could end up removing power from the peer hub before it had suspended.
> > >
> > > That's why I said you might need to add additional synchronization. The
> > > suspend routines for the two hubs could each check to see whether the
> > > other device had suspended yet, and the last one would handle the power
> > > regulator. The additional synchronization is for the case where the two
> > > checks end up being concurrent.
> >
> > That was exactly my initial concern and one of the reasons I favor(ed) a
> > platform instead of a USB driver:
>
> Clearly there's a tradeoff.
>
> > > otherwise all hubs need to know their peers and check in suspend if they
> > > are the last hub standing, only then the power can be switched off.
> >
> > To which you replied:
> >
> > > you just need to make the "master" hub wait for its peer to suspend, which
> > > is easy to do.
> >
> > However that apparently only works if async suspend is enabled, and we
> > can't rely on that.
>
> Yes, I had forgotten about the possibility of synchronous suspend. My
> mistake.
>
> > With the peers checking on each other you lose effectively the notion
> > of a primary.
>
> Well, you can still want to put the sysfs power-control attribute file
> into just one of the hubs' directories, and that one would be considered
> the primary. But I agree, it's a weak notion.
>
> > Going back to the binding:
> >
> > &usb_1_dwc3 {
> > hub_2_0: hub@1 {
> > compatible = "usbbda,5411";
> > reg = <1>;
> > };
> >
> > hub_3_0: hub@2 {
> > compatible = "usbbda,411";
> > reg = <2>;
> > vdd-supply = <&pp3300_hub>;
> > companion-hubs = <&hub_2_0>;
> > };
> > };
> >
> > How does 'hub_2_0' know that its peer is hub_3_0 and that it has a regulator
> > (and potentially other resources)?
>
> The peering relation goes both ways, so it should be included in the
> hub_2_0 description too. Given that, the driver could check hub_2_0's
> peer's DT description for the appropriate resources.

That mitigates the issue somewhat, however we still have to convince Rob that
both references are needed.

> > All this mess can be avoided by having a single instance in control of the
> > resources which is guaranteed to suspend after the USB devices.
>
> Yes. At the cost of registering, adding a driver for, and making users
> aware of a fictitious platform device.

Registration is trivial and the driver code will be needed anyway, I'm
pretty convinced that a separate platform driver will be simpler than
plumbing things into the hub driver, with the additional checks of who is
suspended or not, etc. If other resources like resets are involved there
could be further possible race conditions at probe time. Another issue is
the sysfs attribute. We said to attach it to the primary hub. What happens
when the primary hub goes away? I guess we could force unbinding the peers
as we did in the driver under discussion to avoid confusion/inconsistencies,
but it's another tradeoff.

My view of the pros and cons of extending the hub driver vs. having a platform
driver:

- pros
- sysfs attribute is attached to a USB hub device
- no need to register a platform device (trivial)
- potentially more USB awareness (not clear if needed)

- cons
- possible races involving resources between peer hubs during initialization
- increased complexity from keeping track of peers, checking suspend order
and avoiding races
- peers are forced to unbind when primary goes away
- need DT links to peers for all USB hubs, not only in the primary
- pollution of the generic hub code with device specific stuff instead
of keeping it in a self contained driver
- sysfs attribute is attached to only one of the hubs, which is better than
having it on both, but not necessarily better than attaching it to the
platform device with the 'control logic'

So yes, there are tradeoffs, IMO balance isn't as clear as your comment
suggests.