Re: [PATCH v3] usb: typec: ucsi: Fix busy loop on ASUS VivoBooks

From: Heikki Krogerus
Date: Wed Sep 04 2024 - 10:57:09 EST


On Wed, Sep 04, 2024 at 03:58:05PM +0200, Christian A. Ehrhardt wrote:
>
> Hi Heikki,
>
> On Wed, Sep 04, 2024 at 03:07:45PM +0300, Heikki Krogerus wrote:
> > On Tue, Sep 03, 2024 at 08:19:17PM +0200, Christian A. Ehrhardt wrote:
> > > If the busy indicator is set, all other fields in CCI should be
> > > clear according to the spec. However, some UCSI implementations do
> > > not follow this rule and report bogus data in CCI along with the
> > > busy indicator. Ignore the contents of CCI if the busy indicator is
> > > set.
> > >
> > > If a command timeout is hit it is possible that the EVENT_PENDING
> > > bit is cleared while connector work is still scheduled which can
> > > cause the EVENT_PENDING bit to go out of sync with scheduled connector
> > > work. Check and set the EVENT_PENDING bit on entry to
> > > ucsi_handle_connector_change() to fix this.
> > >
> > > Reported-by: Anurag Bijea <icaliberdev@xxxxxxxxx>
> > > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219108
> > > Bisected-by: Christian Heusel <christian@xxxxxxxxx>
> > > Tested-by: Anurag Bijea <icaliberdev@xxxxxxxxx>
> > > Fixes: de52aca4d9d5 ("usb: typec: ucsi: Never send a lone connector change ack")
> > > Cc: stable@xxxxxxxxxxxxxxx
> > > Signed-off-by: Christian A. Ehrhardt <lk@xxxxxxx>
> > > ---
> > > drivers/usb/typec/ucsi/ucsi.c | 8 ++++++++
> > > 1 file changed, 8 insertions(+)
> > >
> > > diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
> > > index 4039851551c1..540cb1d2822c 100644
> > > --- a/drivers/usb/typec/ucsi/ucsi.c
> > > +++ b/drivers/usb/typec/ucsi/ucsi.c
> > > @@ -38,6 +38,10 @@
> > >
> > > void ucsi_notify_common(struct ucsi *ucsi, u32 cci)
> > > {
> > > + /* Ignore bogus data in CCI if busy indicator is set. */
> > > + if (cci & UCSI_CCI_BUSY)
> > > + return;
> >
> > I started testing this and it looks like the commands never get
> > cancelled when the BUSY bit is set. I don't think this patch is the
> > problem, though. I think the BUSY handling broke earlier, probable in
> > 5e9c1662a89b ("usb: typec: ucsi: rework command execution functions").
> >
> > I need to look at this a bit more carefully, but in the meantime, can
> > you try this:
> >
> > if (cci & UCSI_CCI_BUSY) {
> > complete(&ucsi->complete);
> > return;
> > }
>
> I really don't think this is the correct thing to do and it will
> likely make things worse.

That was the behaviour before all that command execution refactoring
this summer. I'm not saying that it's right, but that's how it was.

> A notification with the UCSI_CCI_BUSY bit does _not_ mean that
> the controller is busy doing other things and cannot complete the
> command.
>
> Instead it is an indication that the controller _is_ working to
> complete our command but will take somewhat longer:
>
> Citing:
> | Note: If a command takes longer than MIN_TIME_TO_RESPOND_WITH_BUSY ms
> | for the PPM (excluding PPM to OPM communication latency) to complete,
> | then the PPM shall respond to the command by setting the CCI Busy
> | Indicator and notify the OPM.
> | Subsequently, when the PPM actually completes the command, the
> | PPM shall notify the OPM of the outcome of the command via an
> | asynchronous notification associated with that command.
>
> Unless I misunderstand what you are trying to do your change would
> cause us to needlessly abort/cancel every command that takes more than
> MIN_TIME_TO_RESPOND_WITH_BUSY to complete.
>
> What am I missing?

The decision to Cancel was made to work around buggy EC firmwares that
reported BUSY, and then never completed the command. So without that
Cancel hack, the PPM was stuck on those systems.

I don't know what we should do about that hack. We probable could just
ignore those old systems, and then add quirks for them as needed. But
I also don't really like what you are proposing in this patch, that we
basically ignore the BUSY bit completely.

Right now I was hoping that we return the behaviour of the driver to
a point where everything worked as before, and after that start
improving the driver. That's why I was hoping to hear does the problem
that you are seeing go away with that approach.

With which command do you guys get the busy notification?

In any case, I don't think all those ucsi_*_common() functions give us
enough room to move here. I feel that the command execution needs to
be refactored somehow again.

thanks,

--
heikki