Re: [v6] usb: ohci: Proper handling of ed_rm_list to handle race condition between usb_kill_urb() and finish_unlinks()

From: Jonathan Liu
Date: Sat Mar 24 2018 - 22:03:49 EST


Hi,

On 25 March 2018 at 12:21, Jonathan Liu <net147@xxxxxxxxx> wrote:
> Hi,
>
> On 8 February 2018 at 14:55, Jeffy Chen <jeffy.chen@xxxxxxxxxxxxxx> wrote:
>> From: AMAN DEEP <aman.deep@xxxxxxxxxxx>
>>
>> There is a race condition between finish_unlinks->finish_urb() function
>> and usb_kill_urb() in ohci controller case. The finish_urb calls
>> spin_unlock(&ohci->lock) before usb_hcd_giveback_urb() function call,
>> then if during this time, usb_kill_urb is called for another endpoint,
>> then new ed will be added to ed_rm_list at beginning for unlink, and
>> ed_rm_list will point to newly added.
>>
>> When finish_urb() is completed in finish_unlinks() and ed->td_list
>> becomes empty as in below code (in finish_unlinks() function):
>>
>> if (list_empty(&ed->td_list)) {
>> *last = ed->ed_next;
>> ed->ed_next = NULL;
>> } else if (ohci->rh_state == OHCI_RH_RUNNING) {
>> *last = ed->ed_next;
>> ed->ed_next = NULL;
>> ed_schedule(ohci, ed);
>> }
>>
>> The *last = ed->ed_next will make ed_rm_list to point to ed->ed_next
>> and previously added ed by usb_kill_urb will be left unreferenced by
>> ed_rm_list. This causes usb_kill_urb() hang forever waiting for
>> finish_unlink to remove added ed from ed_rm_list.
>>
>> The main reason for hang in this race condtion is addition and removal
>> of ed from ed_rm_list in the beginning during usb_kill_urb and later
>> last* is modified in finish_unlinks().
>>
>> As suggested by Alan Stern, the solution for proper handling of
>> ohci->ed_rm_list is to remove ed from the ed_rm_list before finishing
>> any URBs. Then at the end, we can add ed back to the list if necessary.
>>
>> This properly handle the updated ohci->ed_rm_list in usb_kill_urb().
>>
>> Fixes:977dcfdc6031("USB:OHCI:don't lose track of EDs when a controller dies")
>> Acked-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
>> CC: <stable@xxxxxxxxxxxxxxx>
>> Signed-off-by: Aman Deep <aman.deep@xxxxxxxxxxx>
>> Signed-off-by: Jeffy Chen <jeffy.chen@xxxxxxxxxxxxxx>
>> ---
>>
>> Changes in v6:
>> This is a resend of Aman Deep's v5 patch [0], which solved the hang we
>> hit [1]. (Thanks Aman :)
>>
>> The v5 has some format issues, so i slightly adjust the commit message.
>>
>> [0] https://www.spinics.net/lists/linux-usb/msg129010.html
>> [1] https://bugs.chromium.org/p/chromium/issues/detail?id=803749
>>
>> drivers/usb/host/ohci-q.c | 17 ++++++++++-------
>> 1 file changed, 10 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/usb/host/ohci-q.c b/drivers/usb/host/ohci-q.c
>> index b2ec8c399363..4ccb85a67bb3 100644
>> --- a/drivers/usb/host/ohci-q.c
>> +++ b/drivers/usb/host/ohci-q.c
>> @@ -1019,6 +1019,8 @@ static void finish_unlinks(struct ohci_hcd *ohci)
>> * have modified this list. normally it's just prepending
>> * entries (which we'd ignore), but paranoia won't hurt.
>> */
>> + *last = ed->ed_next;
>> + ed->ed_next = NULL;
>> modified = 0;
>>
>> /* unlink urbs as requested, but rescan the list after
>> @@ -1077,21 +1079,22 @@ static void finish_unlinks(struct ohci_hcd *ohci)
>> goto rescan_this;
>>
>> /*
>> - * If no TDs are queued, take ED off the ed_rm_list.
>> + * If no TDs are queued, ED is now idle.
>> * Otherwise, if the HC is running, reschedule.
>> - * If not, leave it on the list for further dequeues.
>> + * If the HC isn't running, add ED back to the
>> + * start of the list for later processing.
>> */
>> if (list_empty(&ed->td_list)) {
>> - *last = ed->ed_next;
>> - ed->ed_next = NULL;
>> ed->state = ED_IDLE;
>> list_del(&ed->in_use_list);
>> } else if (ohci->rh_state == OHCI_RH_RUNNING) {
>> - *last = ed->ed_next;
>> - ed->ed_next = NULL;
>> ed_schedule(ohci, ed);
>> } else {
>> - last = &ed->ed_next;
>> + ed->ed_next = ohci->ed_rm_list;
>> + ohci->ed_rm_list = ed;
>> + /* Don't loop on the same ED */
>> + if (last == &ohci->ed_rm_list)
>> + last = &ed->ed_next;
>> }
>>
>> if (modified)
>
> I am experiencing a USB function call hang from userspace with OCHI
> (full speed USB device) after updating from Linux 4.14.15 to 4.14.24
> and noticed this commit.
>
> Here is the Linux 4.14.24 kernel stack trace (extracted from SysRq+w
> and amended with addr2line):
> [<c05cbd64>] (__schedule) from [<c05cc148>] (schedule+0x50/0xb4)
> kernel/sched/core.c:2792
> [<c05cc148>] (schedule) from [<c042789c>]
> (usb_kill_urb.part.3+0x78/0xa8) include/asm-generic/preempt.h:59
> [<c042789c>] (usb_kill_urb.part.3) from [<c0433a3c>]
> (usbdev_ioctl+0x1288/0x1cf0) drivers/usb/core/urb.c:690
> [<c0433a3c>] (usbdev_ioctl) from [<c0217db8>]
> (do_vfs_ioctl+0x9c/0x8ec) drivers/usb/core/devio.c:1835
> [<c0217db8>] (do_vfs_ioctl) from [<c021863c>] (SyS_ioctl+0x34/0x5c)
> fs/ioctl.c:47
> [<c021863c>] (SyS_ioctl) from [<c0107820>] (ret_fast_syscall+0x0/0x54)
> include/linux/file.h:39
>
> Afterwards the kernel is unresponsive to disconnect/connect of the
> full speed USB device but I can connect/disconnect a high speed USB
> device to the same port and communicate to it without problem since it
> uses EHCI (OHCI is companion controller). If I try to connect the full
> speed USB device again it is still unresponsive. The userspace
> application is still hanging after all this.
>
> Could this commit be causing the issue?

Note that I have been running with OHCI_QUIRK_QEMU added to
ochi->flags in ohci_init of drivers/usb/host/ohci-hcd.c for both
4.14.15 and 4.14.24 kernels due to a race condition that was later
addressed by https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.14.23&id=85c3d26bd754160f6be90d8b078d70a1b35820e7,
so io_watchdog_func of drivers/usb/host/ohci-hcd.c is not being run.

Thanks.

Regards,
Jonathan