Re: [PATCH RFT RFC] usb: xhci: Kill hosts with HCE or HSE on command timeout

From: Desnes Nunes

Date: Sun May 03 2026 - 12:21:11 EST


Hi Michal,

On Sun, May 3, 2026 at 2:18 AM Michal Pecio <michal.pecio@xxxxxxxxx> wrote:
> Well, that's weird. But it seems you have serial console enabled so
> I guess you should know whether it fails to start or crashes.

Yes, I have been checking all boots and crashes through the serial console.

> It could show on the main kernel before the panic is triggered, if the
> main kernel was patched too. Maybe they are the same kernel binary?

Yes, same patched binary on the main kernel and kdump kernel.

> I'm trying to come up with any conceivable theory how patching xhci-hcd
> could prevent the kdump kernel from loading. Still no idea...

Just found the reason: installing this last kernel filled up my /boot
partition, so the initramfs image was not actually getting copied to
/boot.

After removing a few test kernels, kdump armed normally and collected a
vmcore, and no hangs on the locks in xhci_alloc_dev() or
device_shutdown() appeared.

So I can confirm that this patch, which checks for HSE or HCE, does
indeed fix the bug without having to rely on a
wait_for_completion_timeout():

# grep -i HSE -A5 kexec-dmesg.log
[Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: Command timeout, USBSTS: 0x00000015 HCHalted HSE PCD
[Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: kill the damn thing
[Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: xHCI host controller not responding, assume dead
[Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: HC died; cleaning up
[Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: Error while assigning device slot ID: Command Aborted
[Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: Max number of devices this xHCI host supports is 64.
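
For anyone reading the log above, here is a minimal userspace sketch of
how the USBSTS value 0x00000015 breaks down into the flags printed on
the "Command timeout" line. This is not the code from the patch: the
helper names usbsts_decode() and usbsts_host_dead() are hypothetical,
and the bit positions are taken from the USBSTS register layout in the
xHCI specification as I understand it. The "dead" check only mirrors the
HSE-or-HCE condition described in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* USBSTS bit positions (assumed from the xHCI register layout). */
#define USBSTS_HCH  (1u << 0)   /* HCHalted */
#define USBSTS_HSE  (1u << 2)   /* Host System Error */
#define USBSTS_EINT (1u << 3)   /* Event Interrupt */
#define USBSTS_PCD  (1u << 4)   /* Port Change Detect */
#define USBSTS_CNR  (1u << 11)  /* Controller Not Ready */
#define USBSTS_HCE  (1u << 12)  /* Host Controller Error */

/* Hypothetical name table, ordered by bit position. */
struct usbsts_flag {
	uint32_t mask;
	const char *name;
};

static const struct usbsts_flag usbsts_flags[] = {
	{ USBSTS_HCH,  "HCHalted" },
	{ USBSTS_HSE,  "HSE" },
	{ USBSTS_EINT, "EINT" },
	{ USBSTS_PCD,  "PCD" },
	{ USBSTS_CNR,  "CNR" },
	{ USBSTS_HCE,  "HCE" },
};

/* Write the space-separated names of all set flags into buf. */
char *usbsts_decode(uint32_t sts, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(usbsts_flags) / sizeof(usbsts_flags[0]); i++) {
		size_t used = strlen(buf);
		int n;

		if (!(sts & usbsts_flags[i].mask))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", usbsts_flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output was truncated, stop appending */
	}
	return buf;
}

/* True if the status reports HSE or HCE, the condition discussed here. */
bool usbsts_host_dead(uint32_t sts)
{
	return (sts & (USBSTS_HSE | USBSTS_HCE)) != 0;
}
```

Calling usbsts_decode(0x00000015, buf, sizeof(buf)) with a 64-byte
buffer yields "HCHalted HSE PCD", which matches the log line, and
usbsts_host_dead(0x00000015) is true because HSE (bit 2) is set.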

Best Regards,

Desnes