Re: Missing USB XHCI and EHCI reset for kexec

From: Thadeu Lima de Souza Cascardo
Date: Tue Apr 15 2014 - 14:34:10 EST


On Tue, Apr 15, 2014 at 05:00:28PM +0200, stefani@xxxxxxxxxxx wrote:
>
> Quoting Thadeu Lima de Souza Cascardo <cascardo@xxxxxxxxxxxxxxxxxx>:
>
> >On Tue, Apr 15, 2014 at 12:04:17PM +0200, stefani@xxxxxxxxxxx wrote:
> >>
> >>Quoting Thadeu Lima de Souza Cascardo <cascardo@xxxxxxxxxxxxxxxxxx>:
> >>
> >>>On Mon, Apr 14, 2014 at 05:44:58PM +0200, stefani@xxxxxxxxxxx wrote:
> >>>>
> >>>>Quoting Benjamin Herrenschmidt <benh@xxxxxxxxxxx>:
> >>>>
> >>>>>I don't know about EHCI specifically but this is a known issue with
> >>>>>XHCI, I observe similar issues on other powerpc platforms (servers)
> >>>>>and this isn't architecture specific (it looks more like it is specific
> >>>>>to the actual xHC implementation).
> >>>>>
> >>>>>Thadeu Cascardo (on CC) has been the one investigating that on our side,
> >>>>>he might have more to add including patches.
> >>>>>
> >>>>
> >>>>I now have a kernel 3.14 dmesg log of the problem. After a kexec, the
> >>>>kexec'ed 3.14 kernel shows:
> >>>>
> >>>>[ 1.170029] xhci_hcd 0001:03:00.0: xHCI Host Controller
> >>>>[ 1.175306] xhci_hcd 0001:03:00.0: new USB bus registered, assigned bus number 1
> >>>>[ 1.212561] xhci_hcd 0001:03:00.0: Host not halted after 16000 microseconds.
> >>>>[ 1.219621] xhci_hcd 0001:03:00.0: can't setup: -110
> >>>>[ 1.224597] xhci_hcd 0001:03:00.0: USB bus 1 deregistered
> >>>>[ 1.230021] xhci_hcd 0001:03:00.0: init 0001:03:00.0 fail, -110
> >>>>[ 1.235955] xhci_hcd: probe of 0001:03:00.0 failed with error -110
> >>>>
> >>>
> >>>What are your controller's vendor and device IDs? Is that a TI chip?
> >>>
> >>
> >>Yes it is a TI chip, vendor ID 104c and product ID 8241.
> >>
> >>>Can you check if the patch I sent a month ago fixes it? [1] There's the
> >>>whole story there. In fact, you will also need something like the patch
> >>>below. Can you apply only the first one, verify, and, then, the other
> >>>one as well, and report what worked for you?
> >>>
> >>>[1] http://marc.info/?l=linux-usb&m=139483181809062&w=2
> >>>
> >>
> >>I tried the attached patch and it did not help. This is what I
> >>expected, because this is a fix in the shutdown path, which is
> >>never called when doing a forced kexec.
> >
> >Hi, Stefani.
> >
> >Did you try with both patches applied? How do you invoke the forced
> >kexec? Is that a kexec on panic? Does it really need to be forced? With
> >no clean shutdown, platform and drivers would need to issue resets, like
> >you mentioned below, to get the system into a clean state.
> >
>
> Yes, I applied both patches, but without success.
>
> IMHO it is necessary to bring the device into a clean state
> when the driver starts using the HW.
>
> >>
> >>I am running a 3.10.23 kernel. This kernel does a kexec into a
> >>3.14 kernel. Since the 3.10.23 kernel did not perform a clean
> >>shutdown, the state of the XHCI controller is undefined. So when
> >
> >And the clean shutdown requires both of my patches, for TI chips, as far
> >as I know. It looks like the problem is issuing a halt when there are
> >pending URBs.
> >
> >>kernel 3.14 probes XHCI, it finds an XHCI controller which has
> >>not been reset.
> >>
> >
> >The problem is not that a reset hasn't been issued. A PCI function reset
> >should fix most of the problems with a bad device state, when the reset
> >works. However, the problem is that it was not cleanly shut down. URBs
> >should have been canceled and removed from the controller queue, and it
> >should have halted after that.
>
> Again, I think it is the job of the driver to bring the chip into a clean
> state before using it. A driver should never expect the chip to be in its
> reset state.
>
> >
> >>So I think it is necessary to reset the XHCI controller and all
> >>devices on this bus. This is what I do before the kexec, with
> >>"echo 1 > /sys/bus/pci/drivers/xhci_hcd/0001:03:00.0/reset".
> >>
> >
> >One way to look at that is making the PCI code issue resets to all buses
> >before doing any other access. That will make booting slower, and
> >there are a lot of other corner cases where this might not be enough.
> >It's probably more sane to try to get the 3.10.23 kernel to do a clean
> >shutdown, if possible.
> >
>
> With this driver design, the kexec functionality is useless on PowerPC.
> X86 looks a little bit better.
>
> - Stefani
>
>

What are the vendor and device IDs of the controller on your X86
system? This is not a matter of which architecture you are using. The
XHCI controller here simply does not behave as well as the one in your
X86 system, which is likely an Intel one.
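
For reference, here is a minimal sketch of how the IDs can be read from
sysfs, assuming the usual "vendor" and "device" attribute files under
/sys/bus/pci/devices/<address>/. The names read_pci_ids, describe and
KNOWN_VENDORS are made up for illustration, and the vendor table only
covers the two vendors mentioned in this thread.

```python
from pathlib import Path

# Illustrative table only: the two PCI vendors mentioned in this thread.
KNOWN_VENDORS = {0x104C: "Texas Instruments", 0x8086: "Intel"}


def read_pci_ids(address, sysfs_root="/sys/bus/pci/devices"):
    """Return (vendor_id, device_id) as integers for one PCI function."""
    dev_dir = Path(sysfs_root) / address
    # sysfs exposes each ID as a hex string such as "0x104c\n".
    vendor = int((dev_dir / "vendor").read_text().strip(), 16)
    device = int((dev_dir / "device").read_text().strip(), 16)
    return vendor, device


def describe(address, sysfs_root="/sys/bus/pci/devices"):
    """Format the IDs in the usual vvvv:dddd form, with a vendor name if known."""
    vendor, device = read_pci_ids(address, sysfs_root)
    name = KNOWN_VENDORS.get(vendor, "unknown vendor")
    return f"{address}: {vendor:04x}:{device:04x} ({name})"
```

On the PowerPC system above this would be called as
describe("0001:03:00.0"). The address on the X86 system will differ.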

Cascardo.
