Re: [v6] usb: ohci: Proper handling of ed_rm_list to handle race condition between usb_kill_urb() and finish_unlinks()

From: Jonathan Liu
Date: Sat Mar 24 2018 - 21:23:07 EST


Hi,

On 8 February 2018 at 14:55, Jeffy Chen <jeffy.chen@xxxxxxxxxxxxxx> wrote:
> From: AMAN DEEP <aman.deep@xxxxxxxxxxx>
>
> There is a race condition between finish_unlinks()->finish_urb() and
> usb_kill_urb() in the OHCI controller case. finish_urb() calls
> spin_unlock(&ohci->lock) before calling usb_hcd_giveback_urb(). If
> usb_kill_urb() is called for another endpoint during this window, a
> new ED is added to the head of ed_rm_list for unlinking, and
> ed_rm_list then points to the newly added ED.
>
> When finish_urb() is completed in finish_unlinks() and ed->td_list
> becomes empty as in below code (in finish_unlinks() function):
>
> if (list_empty(&ed->td_list)) {
> *last = ed->ed_next;
> ed->ed_next = NULL;
> } else if (ohci->rh_state == OHCI_RH_RUNNING) {
> *last = ed->ed_next;
> ed->ed_next = NULL;
> ed_schedule(ohci, ed);
> }
>
> The *last = ed->ed_next assignment makes ed_rm_list point to
> ed->ed_next, so the ED previously added by usb_kill_urb() is no longer
> referenced by ed_rm_list. This causes usb_kill_urb() to hang forever,
> waiting for finish_unlinks() to remove the added ED from ed_rm_list.
>
> The main reason for the hang in this race condition is that
> usb_kill_urb() adds the ED at the head of ed_rm_list while an ED is
> being removed there, and finish_unlinks() later overwrites the head
> through *last.
>
> As suggested by Alan Stern, the solution for proper handling of
> ohci->ed_rm_list is to remove ed from the ed_rm_list before finishing
> any URBs. Then at the end, we can add ed back to the list if necessary.
>
> This properly handles the updated ohci->ed_rm_list in usb_kill_urb().
>
> Fixes: 977dcfdc6031 ("USB: OHCI: don't lose track of EDs when a controller dies")
> Acked-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
> CC: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Aman Deep <aman.deep@xxxxxxxxxxx>
> Signed-off-by: Jeffy Chen <jeffy.chen@xxxxxxxxxxxxxx>
> ---
>
> Changes in v6:
> This is a resend of Aman Deep's v5 patch [0], which solved the hang we
> hit [1]. (Thanks Aman :)
>
> The v5 had some formatting issues, so I slightly adjusted the commit message.
>
> [0] https://www.spinics.net/lists/linux-usb/msg129010.html
> [1] https://bugs.chromium.org/p/chromium/issues/detail?id=803749
>
> drivers/usb/host/ohci-q.c | 17 ++++++++++-------
> 1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/usb/host/ohci-q.c b/drivers/usb/host/ohci-q.c
> index b2ec8c399363..4ccb85a67bb3 100644
> --- a/drivers/usb/host/ohci-q.c
> +++ b/drivers/usb/host/ohci-q.c
> @@ -1019,6 +1019,8 @@ static void finish_unlinks(struct ohci_hcd *ohci)
> * have modified this list. normally it's just prepending
> * entries (which we'd ignore), but paranoia won't hurt.
> */
> + *last = ed->ed_next;
> + ed->ed_next = NULL;
> modified = 0;
>
> /* unlink urbs as requested, but rescan the list after
> @@ -1077,21 +1079,22 @@ static void finish_unlinks(struct ohci_hcd *ohci)
> goto rescan_this;
>
> /*
> - * If no TDs are queued, take ED off the ed_rm_list.
> + * If no TDs are queued, ED is now idle.
> * Otherwise, if the HC is running, reschedule.
> - * If not, leave it on the list for further dequeues.
> + * If the HC isn't running, add ED back to the
> + * start of the list for later processing.
> */
> if (list_empty(&ed->td_list)) {
> - *last = ed->ed_next;
> - ed->ed_next = NULL;
> ed->state = ED_IDLE;
> list_del(&ed->in_use_list);
> } else if (ohci->rh_state == OHCI_RH_RUNNING) {
> - *last = ed->ed_next;
> - ed->ed_next = NULL;
> ed_schedule(ohci, ed);
> } else {
> - last = &ed->ed_next;
> + ed->ed_next = ohci->ed_rm_list;
> + ohci->ed_rm_list = ed;
> + /* Don't loop on the same ED */
> + if (last == &ohci->ed_rm_list)
> + last = &ed->ed_next;
> }
>
> if (modified)

I am seeing a USB call from userspace hang with OHCI (full-speed USB
device) after updating from Linux 4.14.15 to 4.14.24, and I noticed
this commit.

Here is the Linux 4.14.24 kernel stack trace (extracted from SysRq+w
and amended with addr2line):
[<c05cbd64>] (__schedule) from [<c05cc148>] (schedule+0x50/0xb4)
kernel/sched/core.c:2792
[<c05cc148>] (schedule) from [<c042789c>]
(usb_kill_urb.part.3+0x78/0xa8) include/asm-generic/preempt.h:59
[<c042789c>] (usb_kill_urb.part.3) from [<c0433a3c>]
(usbdev_ioctl+0x1288/0x1cf0) drivers/usb/core/urb.c:690
[<c0433a3c>] (usbdev_ioctl) from [<c0217db8>]
(do_vfs_ioctl+0x9c/0x8ec) drivers/usb/core/devio.c:1835
[<c0217db8>] (do_vfs_ioctl) from [<c021863c>] (SyS_ioctl+0x34/0x5c)
fs/ioctl.c:47
[<c021863c>] (SyS_ioctl) from [<c0107820>] (ret_fast_syscall+0x0/0x54)
include/linux/file.h:39

Afterwards the kernel does not respond to disconnecting and
reconnecting the full-speed USB device, but I can connect and
disconnect a high-speed USB device on the same port and communicate
with it without problems, since it uses EHCI (OHCI is the companion
controller). If I connect the full-speed USB device again, it is still
unresponsive. The userspace application remains hung throughout.

Could this commit be causing the issue?

Thanks.

Regards,
Jonathan