Re: [PATCH RFT RFC] usb: xhci: Kill hosts with HCE or HSE on command timeout
From: Desnes Nunes
Date: Sat May 02 2026 - 23:37:35 EST
Hello Michal,
On Sat, May 2, 2026 at 6:55 PM Michal Pecio <michal.pecio@xxxxxxxxx> wrote:
>
> On Sat, 2 May 2026 08:38:34 -0300, Desnes Nunes wrote:
> > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> > > index e5823650850a..3041deb67b57 100644
> > > --- a/drivers/usb/host/xhci-ring.c
> > > +++ b/drivers/usb/host/xhci-ring.c
> > > @@ -1761,13 +1761,15 @@ void xhci_handle_command_timeout(struct work_struct *work)
> > > /* mark this command to be cancelled */
> > > xhci->current_cmd->status = COMP_COMMAND_ABORTED;
> > >
> > > - /* Make sure command ring is running before aborting it */
> > > + /* check for crashed or disconnected chip */
> > > hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
> > > - if (hw_ring_state == ~(u64)0) {
> > > + if (hw_ring_state == ~(u64)0 || usbsts & (STS_FATAL | STS_HCE)) {
> > > + xhci_info(xhci, "kill the damn thing\n");
> > > xhci_hc_died(xhci);
> > > goto time_out_completed;
> > > }
> > >
> > > + /* Make sure command ring is running before aborting it */
> > > if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
> > > (hw_ring_state & CMD_RING_RUNNING)) {
> > > /* Prevent new doorbell, and start command abort */
> >
> > FYI, sorry to be the bearer of bad news, but this also panics the
> > system as soon as I run `echo c > /proc/sysrq-trigger`.
>
> Is this not what's supposed to happen?
>
> Sorry, that complaint is so odd that I thought I'm seeing another case
> of debugging being outsourced to an AI chatbot, which forgot that panic
> is triggered intentionally here. Now I'm just confused.
No, I guess what you actually saw was a poor explanation on my end;
apologies for not describing the outcome properly.
What I was trying to say is that the system simply hung after I
intentionally triggered the panic via sysrq, both times.
Nothing happened after the sysrq panic stack trace was printed.
> > Kdump doesn't run and no vmcore is produced:
> Is the kdump kernel not launched, or does it crash during boot?
> The latter would make sense if there is some problem with the code.
The kdump kernel didn't launch at all, so no vmcore was produced.
> But I don't understand how patching xhci-hcd could possibly have
> any effect on the former. Does this new code execute at all? Does
> "kill the damn thing" ever appear in dmesg?
Both patched kernels booted normally: the first one, which checks HSE
after logging USBSTS in xhci_handle_command_timeout(), and this new one,
which checks the ring state and the HSE and HCE bits.
Since kdump didn't start, the message "kill the damn thing" never got
a chance to appear in the crash kernel's dmesg.
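For reference, the condition the new hunk tests can be sketched as a
standalone user-space snippet. This is only an illustration, not the
patch itself: the helper name host_looks_dead() is hypothetical, and the
bit positions are the USBSTS HSE (bit 2) and HCE (bit 12) values that the
driver's STS_FATAL and STS_HCE macros correspond to, as far as I know.

```c
#include <stdbool.h>
#include <stdint.h>

/* USBSTS bits; names follow the macros used in the quoted diff. */
#define STS_FATAL (1u << 2)   /* HSE: Host System Error */
#define STS_HCE   (1u << 12)  /* HCE: Host Controller Error */

/*
 * Hypothetical helper, not part of the patch: returns true when the
 * timeout handler would declare the host dead instead of trying to
 * abort the command ring.
 */
static bool host_looks_dead(uint64_t hw_ring_state, uint32_t usbsts)
{
	/* An all-ones readback suggests the chip crashed or was removed. */
	if (hw_ring_state == ~(uint64_t)0)
		return true;

	/* Either bit means the controller reported an unrecoverable error. */
	return (usbsts & (STS_FATAL | STS_HCE)) != 0;
}
```

With this, host_looks_dead(0x8, STS_HCE) is true, while a running ring
with a clean USBSTS falls through to the normal command abort path.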
Best Regards,
Desnes