Re: [PATCH v3 1/3] usb: ucsi: Add missing ppm_lock

From: Christian A. Ehrhardt
Date: Wed Jan 24 2024 - 07:12:11 EST



Hi Heikki,

Thanks for looking into this.

On Wed, Jan 24, 2024 at 10:09:04AM +0200, Heikki Krogerus wrote:
> On Sun, Jan 21, 2024 at 09:41:21PM +0100, Christian A. Ehrhardt wrote:
> > Calling ->sync_write must be done while holding the PPM lock as
> > the mailbox logic does not support concurrent commands.
> >
> > At least since the addition of partner task this means that
> > ucsi_acknowledge_connector_change should be called with the
> > PPM lock held as it calls ->sync_write.
> >
> > Thus protect the only call to ucsi_acknowledge_connector_change
> > with the PPM. All other calls to ->sync_write already happen
> > under the PPM lock.
> >
> > Fixes: b9aa02ca39a4 ("usb: typec: ucsi: Add polling mechanism for partner tasks like alt mode checking")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Signed-off-by: Christian A. Ehrhardt <lk@xxxxxxx>
> > ---
> > drivers/usb/typec/ucsi/ucsi.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
> > index 61b64558f96c..8f9dff993b3d 100644
> > --- a/drivers/usb/typec/ucsi/ucsi.c
> > +++ b/drivers/usb/typec/ucsi/ucsi.c
> > @@ -935,7 +935,9 @@ static void ucsi_handle_connector_change(struct work_struct *work)
> >
> > clear_bit(EVENT_PENDING, &con->ucsi->flags);
> >
> > + mutex_lock(&ucsi->ppm_lock);
> > ret = ucsi_acknowledge_connector_change(ucsi);
> > + mutex_unlock(&ucsi->ppm_lock);
> > if (ret)
> > dev_err(ucsi->dev, "%s: ACK failed (%d)", __func__, ret);
>
> Is this really actually fixing some issue? The function has already
> taken the connector lock, so what exactly could happen without this?

I've definitely _seen_ issues with the quirk from PATCH 3/3 if the
lock is missing. I'm pretty sure from looking at the code that races
with other connectors can happen without the quirk, too.

The PPM lock protects the Mailbox and the ACK_PENDING/COMMAND_PENDING
bits and I could observe cases where ucsi_acpi_sync_write() was entered
with the COMMAND_PENDING bit already set. One possible sequence is this:

ucsi_connector_change() for connector #1
=> schedules partner tasks
=> Acks the connector change
=> Releases locks
ucsi_connecotr_change() for connector #2
=> acquire con lock for #2
=> Start to process connector state change
=> Processing reaches the point right before
ucsi_acknowledge_connector_change()
Connector #1 workqueue starts to process one of the partner tasks
=> Acquire con lock for connector #1
=> Acquire ppm lock
=> Enter ucsi_exec_command()
=> ->sync_write() starts to use the mailbox and sets
COMMAND_PENDING
=> ->sync_write blocks waiting for the command completion
with wait_for_completion_timeout().
ucsi_connector_change() for connector #2 continues
=> execute ucsi_acknowledge_connector_change() and start using
the mailbox that is still in use.
=> BOOM

There is a simpler an much more likely sequence with the dell quirk active.

regards Christian