Re: [PATCH 2/2] usbnet: Fix a race between usbnet_stop() and the BH

From: Eugene Shatokhin
Date: Fri Aug 28 2015 - 04:10:18 EST

25.08.2015 00:01, Bjørn Mork wrote:
Eugene Shatokhin <eugene.shatokhin@xxxxxxxxxx> writes:

The race may happen when a device (e.g. YOTA 4G LTE Modem) is
unplugged while the system is downloading a large file from the Net.

Hardware breakpoints and Kprobes with delays were used to confirm that
the race does actually happen.

The race is on skb_queue ('next' pointer) between usbnet_stop()
and rx_complete(), which, in turn, calls usbnet_bh().

Here is a part of the call stack with the code where the changes to the
queue happen. The line numbers are for the kernel 4.1.0:

*0 __skb_unlink (skbuff.h:1517)
	prev->next = next;
*1 defer_bh (usbnet.c:430)
	spin_lock_irqsave(&list->lock, flags);
	old_state = entry->state;
	entry->state = state;
	__skb_unlink(skb, list);
	__skb_queue_tail(&dev->done, skb);
	if (dev->done.qlen == 1)
		tasklet_schedule(&dev->bh);
	spin_unlock_irqrestore(&dev->done.lock, flags);
*2 rx_complete (usbnet.c:640)
	state = defer_bh(dev, skb, &dev->rxq, state);

At the same time, the following code repeatedly checks if the queue is
empty and reads these values concurrently with the above changes:

*0 usbnet_terminate_urbs (usbnet.c:765)
	/* maybe wait for deletions to finish. */
	while (!skb_queue_empty(&dev->rxq)
		&& !skb_queue_empty(&dev->txq)
		&& !skb_queue_empty(&dev->done)) {
			netif_dbg(dev, ifdown, dev->net,
				  "waited for %d urb completions\n", temp);
*1 usbnet_stop (usbnet.c:806)
	if (!(info->flags & FLAG_AVOID_UNLINK_URBS))

As a result, it is possible, for example, that the skb is removed from
dev->rxq by __skb_unlink() before the check
"!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is
also possible in this case that the skb is added to dev->done queue
after "!skb_queue_empty(&dev->done)" is checked. So
usbnet_terminate_urbs() may stop waiting and return while dev->done
queue still has an item.
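
The stale check described above can be replayed in user space. This is only a minimal sketch under stated assumptions: struct sim_queues and all sim_* helpers are hypothetical stand-ins for the three queue lengths, not the kernel's sk_buff_head API, and the interleaving is fixed by hand instead of produced by real concurrency. It evaluates both the loop condition as quoted (with '&&') and the '||' variant that the reply further down assumes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the three queue lengths; not struct sk_buff_head. */
struct sim_queues {
	size_t rxq;
	size_t txq;
	size_t done;
};

/* Models __skb_unlink(): one skb leaves the given queue. */
static void sim_unlink(size_t *qlen)
{
	assert(*qlen > 0);
	(*qlen)--;
}

/* Loop condition as quoted: keep waiting only while all queues are non-empty. */
static bool sim_wait_cond_and(const struct sim_queues *q)
{
	return q->rxq != 0 && q->txq != 0 && q->done != 0;
}

/* Loop condition with '||': keep waiting while any queue is non-empty. */
static bool sim_wait_cond_or(const struct sim_queues *q)
{
	return q->rxq != 0 || q->txq != 0 || q->done != 0;
}

/*
 * Replays one ordering: the skb has left rxq but has not yet reached done
 * when the waiter samples the queues. Returns true if the waiter decides
 * to stop although done receives an item right afterwards.
 */
static bool sim_waiter_misses_skb(bool use_or)
{
	struct sim_queues q = { .rxq = 1, .txq = 0, .done = 0 };
	bool keep_waiting;

	sim_unlink(&q.rxq);	/* completion side: skb removed from rxq */
	keep_waiting = use_or ? sim_wait_cond_or(&q) : sim_wait_cond_and(&q);
	q.done++;		/* completion side: skb appended to done */

	return !keep_waiting && q.done == 1;
}
```

In this model sim_waiter_misses_skb() returns true for both conditions, so switching to '||' alone does not close the window while the skb is in flight between the two queues.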

Exactly what problem will that result in? The tasklet_kill() will wait
for the processing of the single element done queue, and everything will
be fine. Or?

Given enough time, what prevents defer_bh() from calling tasklet_schedule(&dev->bh) *after* usbnet_stop() calls tasklet_kill()?

Consider the following situation (assuming '&&' are changed to '||' in that while loop in usbnet_terminate_urbs() as they should be):

usbnet_stop()                      defer_bh() with list == dev->rxq
                                   __skb_unlink() removes the last
                                   skb from dev->rxq.
                                   dev->rxq, dev->txq and dev->done
                                   are now empty.
while (!skb_queue_empty()...)
The loop ends because all 3
queues are now empty.

usbnet_terminate_urbs() ends.

usbnet_stop() continues:
del_timer_sync (&dev->delay);
tasklet_kill (&dev->bh);
                                   __skb_queue_tail(&dev->done, skb);
                                   if (dev->done.qlen == 1)
                                       tasklet_schedule(&dev->bh);

The BH is scheduled at this point, which is not what was intended. The race window is small, but still.
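
To make the ordering concrete, here is a rough user-space replay of that timeline. It is a sketch only: struct sim_dev and the sim_* functions are hypothetical names introduced for illustration, the "tasklet" is just a pair of flags, and the steps run sequentially in the order shown above instead of on two CPUs. The wait condition uses '||', as assumed in the scenario.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state; not struct usbnet. */
struct sim_dev {
	size_t rxq_len;		/* models dev->rxq.qlen */
	size_t txq_len;		/* models dev->txq.qlen */
	size_t done_len;	/* models dev->done.qlen */
	bool bh_scheduled;	/* models "dev->bh is pending" */
	bool bh_killed;		/* set once the modeled tasklet_kill() returned */
};

/* First half of defer_bh(): the skb leaves rxq. */
static void sim_defer_bh_unlink(struct sim_dev *dev)
{
	assert(dev->rxq_len > 0);
	dev->rxq_len--;
}

/* Second half of defer_bh(): the skb lands on done; BH scheduled if it is alone. */
static void sim_defer_bh_enqueue(struct sim_dev *dev)
{
	dev->done_len++;
	if (dev->done_len == 1)
		dev->bh_scheduled = true;
}

/* Wait condition of usbnet_terminate_urbs() with '||'. */
static bool sim_must_wait(const struct sim_dev *dev)
{
	return dev->rxq_len || dev->txq_len || dev->done_len;
}

/* Models tasklet_kill(): nothing is pending any more when it returns. */
static void sim_tasklet_kill(struct sim_dev *dev)
{
	dev->bh_scheduled = false;
	dev->bh_killed = true;
}

/* Replays the timeline; true means the BH got scheduled after the kill. */
static bool sim_replay_race(void)
{
	struct sim_dev dev = { .rxq_len = 1 };

	sim_defer_bh_unlink(&dev);	/* right column: __skb_unlink() */
	if (sim_must_wait(&dev))	/* left column: the while loop */
		return false;		/* would keep waiting in this replay */
	sim_tasklet_kill(&dev);		/* left column: tasklet_kill() */
	sim_defer_bh_enqueue(&dev);	/* right column: queue tail + schedule */

	return dev.bh_killed && dev.bh_scheduled;
}
```

For this ordering sim_replay_race() returns true: the modeled BH becomes pending after the modeled tasklet_kill() has already returned.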


