Re: [PATCH] Prevent race condition between USB authorisation and USB discovery events

From: Andrew Worsley
Date: Mon Dec 17 2018 - 19:35:07 EST


On Tue, 18 Dec 2018 at 03:21, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, 17 Dec 2018, Andrew Worsley wrote:
>
> > A sysfs driven USB authorisation change can trigger a usb_set_configuration
> > while a hub_event worker thread is running. This can result in a USB device
> > being disabled just after it was configured and bringing down all the
> > devices and impacting hardware and user processes that were established on
> > top of this these interfaces. In some cases the USB disable never completed
> > and the whole system hung.
>
> Can you be more specific about this? Disabling a USB device shouldn't
> cause these kinds of problems, regardless of whether or not the device
> was just configured.

I can't say too much about the hardware - but there was an SPI bus and
an i2c bus built
on top of USB devices on a bus. It appears the SPI bus shutdown was
the item that was prone
to problems when being torn down.

I understand it would be better if the dependant systems shutdown
cleanly, which it
did about 75% of the time but then it was rather messy sequence. So
the current system
is far from ideal. The crux of the issue is the usb_disable_device()
(line 1874 of drivers/usb/core/message.c)
call in usb_set_configuration() is applied to a configured device and
triggers the
tear down of all the devices built on that USB device and is very disruptive.

> > At my work I had an occasional hang due to this race condition. Roughly 1
> > in 50 boots had the race occurrence and 1 in 4 of those resulted in a hang.
> > This patch fixed the problem and I had no problems (spurious disables
> > or hangs) in 750+ boots.
>
> usb_authorize_device, usb_deauthorize_device, and hub_event all acquire
> the device mutex. Why should adding another mutex make any difference?

I believe the new usb_authorize_mutex prevents the cascade of USB
device bus scans,
discoveries and probes from colliding with those caused a change in
USB bus authorisation.
The individual device / hub locks are not across the whole system,
only an individual
component hub/device so any logic that gives an orderly USB device
scanning and probing is
undermined. The idea of the mutex is to keep the USB device discovery
work when a device is
added/removed/powered up is separated from those that are caused by
authorisation changes.

I don't pretend to understand all the logic and sequencing of the
current code which works works
faultlessly when these threads are serialised by the mutex (750+
times) in boot ups which
build up the system and shutdowns which bring down the system.
Likewise if I endlessly run
the authorisation off / on by itself it is faultless over hundreds of
iterations.

> In fact there's an actual disadvantage: Making hub_event acquire a
> global mutex will prevent us from handling multiple hubs concurrently.
> Although we don't do this now, we might want to in the future.
>
> Alan Stern

This is true - I am not trying to serialise the hub_event() threads -
but to prevent the authorisation changes
from happening during the hub_event threads. So perhaps there would be
a way of allowing multiple
concurrent hub_events from happening but only one authorisation thread
at a time. Assuming multiple
hub_event()s are deemed ok. Something like a lock which allows
multiple readers but only a single exclusive
writer lock?

Andrew