RE: [RFC/PATCH] usb/xhci: avoid kernel panic on xhci_suspend()

From: Tang, Jianqiang
Date: Tue Jan 07 2014 - 22:49:22 EST


Hi,
1) I hit this issue once while simply booting our Linux platform (kernel 3.10) with the XHCI driver; the kernel panicked.

It was also reported once by another internal team.

There are no special reproduction steps, and I do not think any special hardware is needed.

It is a random issue that happens whenever the timing condition is met.

2) This issue was introduced by this commit:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=596d789a211d134dc5f94d1e5957248c204ef850

which sets the autosuspend delay of all hubs to 0.

This causes a race condition during XHCI driver initialization:

After the USB2 hcd and the USB2 root hub finish initialization, the USB2 root hub is functional and autosuspends immediately, which triggers the XHCI runtime suspend flow.

At the same time, the XHCI driver continues to initialize the USB3 hcd and assigns it to xhci->shared_hcd only after that initialization finishes.

Since xhci_suspend() uses xhci->shared_hcd, there is a race: when XHCI runtime suspend is called, xhci->shared_hcd may still be NULL.

I think this patch is a valid fix: until XHCI finishes its whole initialization, a runtime suspend triggered by the USB2 root hub is meaningless and does not need to be handled.
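
To illustrate the idea, here is a minimal user-space sketch of the check, not the actual patch. The struct and function names (usb_hcd_sketch, xhci_hcd_sketch, xhci_suspend_sketch) and the hw_accessible field are hypothetical stand-ins for the kernel structures, and returning 0 when shared_hcd is NULL is my assumption about the simplest way to ignore the early suspend request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct usb_hcd. */
struct usb_hcd_sketch {
	int hw_accessible;	/* 1 while the controller may be touched */
};

/* Hypothetical, simplified stand-in for struct xhci_hcd. */
struct xhci_hcd_sketch {
	struct usb_hcd_sketch *main_hcd;	/* USB2 hcd, set up first */
	struct usb_hcd_sketch *shared_hcd;	/* USB3 hcd, NULL until set up */
};

/*
 * Sketch of a suspend path that tolerates being called before the
 * USB3 hcd exists. Returns 0 both when it suspends and when it skips.
 */
int xhci_suspend_sketch(struct xhci_hcd_sketch *xhci)
{
	/* Assumption: an early runtime suspend is simply ignored. */
	if (!xhci->shared_hcd)
		return 0;

	/* Without the check above, this line would dereference NULL. */
	xhci->main_hcd->hw_accessible = 0;
	xhci->shared_hcd->hw_accessible = 0;
	return 0;
}

/* Walks through the race window and the normal case. */
void xhci_suspend_sketch_selfcheck(void)
{
	struct usb_hcd_sketch usb2 = { 1 }, usb3 = { 1 };
	struct xhci_hcd_sketch xhci = { &usb2, NULL };

	/* USB2 root hub autosuspends before the USB3 hcd is assigned. */
	assert(xhci_suspend_sketch(&xhci) == 0);
	assert(usb2.hw_accessible == 1);

	/* After initialization completes, suspend proceeds as usual. */
	xhci.shared_hcd = &usb3;
	assert(xhci_suspend_sketch(&xhci) == 0);
	assert(usb2.hw_accessible == 0 && usb3.hw_accessible == 0);
}
```

In this sketch the first call models the race window (USB2 root hub suspended, USB3 hcd not yet assigned) and the second call models a normal runtime suspend after initialization is complete.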


Thanks!

-----Original Message-----
From: Greg KH [mailto:gregkh@xxxxxxxxxxxxxxxxxxx]
Sent: Wednesday, January 08, 2014 9:47 AM
To: David Cohen
Cc: stern@xxxxxxxxxxxxxxxxxxx; sarah.a.sharp@xxxxxxxxxxxxxxx; linux-usb@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Tang, Jianqiang
Subject: Re: [RFC/PATCH] usb/xhci: avoid kernel panic on xhci_suspend()

On Tue, Jan 07, 2014 at 05:44:26PM -0800, David Cohen wrote:
> From: jianqian <jianqiang.tang@xxxxxxxxx>
>
> There is a possible kernel panic in xhci_suspend().
> Because the kernel changed the hub autosuspend_delay to 0s, after the usb1
> root hub finishes initialization it triggers runtime_suspend, which in
> turn triggers xhci runtime suspend. If xhci->shared_hcd is still being
> initialized at that time, xhci_suspend() can hit a NULL pointer
> dereference and panic.
>
> This patch checks if xhci->shared_hcd is null to avoid panic.
>
> Signed-off-by: jianqian <jianqiang.tang@xxxxxxxxx>
> Signed-off-by: David Cohen <david.a.cohen@xxxxxxxxxxxxxxx>
> ---
>
> This is the kernel panic. The bug was discovered on the current LTS kernel
> 3.10, as shown in the logs, but the problem does not seem to be fixed so far.
> Maybe we should consider applying it to kernels >= 3.10?

How do you trigger this? I've never seen anyone report this problem before, is there something different in the hardware you are using that enables this to be triggered easier?

thanks,

greg k-h