Re: [PATCH] usb: xhci: bound wait command completion to avoid kdump deadlock

From: Desnes Nunes

Date: Thu Apr 30 2026 - 13:29:19 EST

Greetings Michal,

For starters, thank you very much for the thorough review.

On Thu, Apr 30, 2026 at 5:49 AM Michal Pecio <michal.pecio@xxxxxxxxx> wrote:
>
> On Wed, 29 Apr 2026 22:48:17 -0300, Desnes Nunes wrote:
> > The following deadlock in the usb subsystem can be triggered during kdump:
> >
> > systemd-udevd[402]: usb3: Worker [419] processing SEQNUM=2194 is taking a long time
> > dracut-initqueue[432]: Timed out while waiting for udev queue to empty.
> > systemd-udevd[402]: usb3: Worker [419] processing SEQNUM=2194 killed
> > systemd-udevd[402]: usb3: Worker [419] terminated by signal 9 (KILL).
> > ...
> > kdump[720]: saving vmcore complete
> > ...
> > systemd-shutdown[1]: Rebooting.
> > INFO: task kworker/0:6:76 blocked for more than 122 seconds.
>
> That's suspiciously long indeed.
>
> > Not tainted 6.12.0-223.2443_2475543665.el10.x86_64 #1
>
> Pretty old kernel, and distribution to boot.
> Have you tried 7.x, does the bug still exist?

Yes, v7.0 reproduces the same crash when xhci tries to access slot_id
at xhci_alloc_dev():
...
[ 72.987601] systemd-udevd[246]: usb3: Worker [255] processing
SEQNUM=2193 is taking a long time
[ 132.237566] dracut-initqueue[277]: Timed out while waiting for udev
queue to empty.
[ 202.988014] systemd-udevd[246]: usb3: Worker [255] processing
SEQNUM=2193 killed
[ 202.998059] systemd-udevd[246]: usb3: Worker [255] terminated by
signal 9 (KILL).
...
[ 206.288378] kdump[569]: saving vmcore complete
...
[ 206.821258] systemd-shutdown[1]: Rebooting.
[ 246.858495] INFO: task kworker/0:1:11 blocked for more than 122 seconds.
[ 246.865319] Not tainted 7.0.0-clean #1
[ 246.869663] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 246.877623] task:kworker/0:1 state:D stack:0 pid:11
tgid:11 ppid:2 task_flags:0x4208160 flags:0x00080000
[ 246.888942] Workqueue: usb_hub_wq hub_event
[ 246.893202] Call Trace:
[ 246.895690] <TASK>
[ 246.897828] __schedule+0x299/0x5c0
[ 246.901378] schedule+0x27/0x80
[ 246.904572] schedule_timeout+0xbd/0x100
[ 246.908565] __wait_for_common+0x97/0x1b0
[ 246.912644] ? __pfx_schedule_timeout+0x10/0x10
[ 246.917252] xhci_alloc_dev+0x9e/0x2b0
[ 246.921068] usb_alloc_dev+0x7a/0x3b0
[ 246.924795] hub_port_connect+0x285/0x960
[ 246.928873] hub_port_connect_change+0x94/0x290
[ 246.933482] port_event+0x4bb/0x840
[ 246.937030] hub_event+0x141/0x460
[ 246.940489] process_one_work+0x196/0x390
[ 246.944569] worker_thread+0x1af/0x320
[ 246.948383] ? __pfx_worker_thread+0x10/0x10
[ 246.952724] kthread+0xe3/0x120
[ 246.955921] ? __pfx_kthread+0x10/0x10
[ 246.959736] ret_from_fork+0x199/0x260
[ 246.963550] ? __pfx_kthread+0x10/0x10
[ 246.967362] ret_from_fork_asm+0x1a/0x30
[ 246.971355] </TASK>
[ 369.738508] INFO: task systemd-shutdow:1 blocked for more than 122 seconds.
[ 369.745593] Not tainted 7.0.0-clean #1
[ 369.749935] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 369.757897] task:systemd-shutdow state:D stack:0 pid:1
tgid:1 ppid:0 task_flags:0x400100 flags:0x00080000
[ 369.769128] Call Trace:
[ 369.771616] <TASK>
[ 369.773752] __schedule+0x299/0x5c0
[ 369.777299] schedule+0x27/0x80
[ 369.780493] schedule_preempt_disabled+0x15/0x30
[ 369.785188] __mutex_lock.constprop.0+0x547/0xac0
[ 369.789974] device_shutdown+0xac/0x1b0
[ 369.793877] kernel_restart+0x3a/0x70
[ 369.797603] __do_sys_reboot+0x147/0x240
[ 369.801595] do_syscall_64+0x11b/0x6a0
[ 369.805407] ? handle_mm_fault+0x110/0x350
[ 369.809574] ? do_user_addr_fault+0x206/0x680
[ 369.814006] ? irqentry_exit+0x7a/0x4d0
[ 369.817907] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 369.823046] RIP: 0033:0x7fe2958da917
[ 369.826684] RSP: 002b:00007ffc5c458618 EFLAGS: 00000206 ORIG_RAX:
00000000000000a9
[ 369.834383] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe2958da917
[ 369.841639] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
[ 369.848893] RBP: 00007ffc5c458790 R08: 0000000000000069 R09: 00000000ffffffff
[ 369.856148] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
[ 369.863402] R13: 0000000000000000 R14: 00007ffc5c4588b8 R15: 0000000000000000
[ 369.870659] </TASK>
[ 369.872888] INFO: task systemd-shutdow:1 is blocked on a mutex
likely owned by task kworker/0:1:11.

# gdb vmlinux
...
Reading symbols from vmlinux...
(gdb) list *(xhci_alloc_dev+0x9e)
0xffffffff81d10fee is in xhci_alloc_dev (drivers/usb/host/xhci.c:4236).
4231 }
4232 xhci_ring_cmd_db(xhci);
4233 spin_unlock_irqrestore(&xhci->lock, flags);
4234
4235 wait_for_completion(command->completion);
4236 slot_id = command->slot_id;
4237
4238 if (!slot_id || command->status != COMP_SUCCESS) {
4239 xhci_err(xhci, "Error while assigning device slot ID: %s\n",
4240 xhci_trb_comp_code_string(command->status));

> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > task:kworker/0:6 state:D stack:0 pid:76 tgid:76 ppid:2 task_flags:0x4208060 flags:0x00004000
> > Workqueue: usb_hub_wq hub_event
> > Call Trace:
> > <TASK>
> > __schedule+0x2a5/0x630
> > schedule+0x27/0x80
> > schedule_timeout+0xbf/0x100
> > __wait_for_common+0x95/0x1b0
> > ? __pfx_schedule_timeout+0x10/0x10
> > xhci_alloc_dev+0x9e/0x290
> > usb_alloc_dev+0x77/0x3a0
> > hub_port_connect+0x293/0x9a0
> > hub_port_connect_change+0x94/0x260
> > port_event+0x4d1/0x7f0
> > hub_event+0x16f/0x480
> > process_one_work+0x174/0x330
> > worker_thread+0x256/0x3a0
> > ? __pfx_worker_thread+0x10/0x10
> > kthread+0xfa/0x240
> > ? __pfx_kthread+0x10/0x10
> > ret_from_fork+0x31/0x50
> > ? __pfx_kthread+0x10/0x10
> > ret_from_fork_asm+0x1a/0x30
> > </TASK>
> > INFO: task systemd-shutdow:1 blocked for more than 122 seconds.
> > Not tainted 6.12.0-223.2443_2475543665.el10.x86_64 #1
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > task:systemd-shutdow state:D stack:0 pid:1 tgid:1 ppid:0 task_flags:0x400100 flags:0x00000002
> > Call Trace:
> > <TASK>
> > __schedule+0x2a5/0x630
> > schedule+0x27/0x80
> > schedule_preempt_disabled+0x15/0x30
> > __mutex_lock.constprop.0+0x497/0x860
> > device_shutdown+0xac/0x190
> > kernel_restart+0x3a/0x70
> > __do_sys_reboot+0x146/0x240
> > do_syscall_64+0x7d/0x160
> > ? devkmsg_write.cold+0x24/0x4a
> > ? update_load_avg+0x7f/0x730
> > ? __dequeue_entity+0x3ec/0x4a0
> > ? update_load_avg+0x7f/0x730
> > ? pick_next_task_fair+0x1e6/0x330
> > ? finish_task_switch.isra.0+0x97/0x2a0
> > ? rseq_get_rseq_cs+0x1d/0x220
> > ? rseq_ip_fixup+0x8d/0x1d0
> > ? arch_exit_to_user_mode_prepare.isra.0+0xa5/0xd0
> > ? syscall_exit_to_user_mode+0x32/0x190
> > ? do_syscall_64+0x89/0x160
> > ? handle_mm_fault+0x110/0x370
> > ? do_user_addr_fault+0x606/0x830
> > ? exc_page_fault+0x7f/0x150
> > entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > RIP: 0033:0x7f32517d9917
> > RSP: 002b:00007ffc018d4fb8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9
> > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f32517d9917
> > RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
> > RBP: 00007ffc018d5130 R08: 0000000000000069 R09: 00000000ffffffff
> > R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> > R13: 0000000000000000 R14: 00007ffc018d5258 R15: 0000000000000000
> > </TASK>
> >
> > During crashkernel's boot, hub_event() takes usb_lock_device(hdev) of the
> > root hub and keeps it for the whole hub processing loop, since it calls
> > hub_port_connect() -> usb_alloc_dev() -> xhci_alloc_dev(). If during kdump
> > another device (e.g., a mis-initialized dGPU) hogs interrupts or DMAs, the
> > TRB_ENABLE_SLOT command will be blocked from completion in time, moving
> > the HC to an unstable condition (e.g., HSE in USBSTS).
>
> What specifically have you seen?

Right after xe and drm set the dGPU (Battlemage G31) during the boot
of the crashkernel, I see the `usb worker taking a long time`
messages.
Afterwards, kdump is able to capture a vmcore completely, the systems
tries to reboot and hangs with the xhci_alloc_dev() and
device_shutdown() stack traces I provided.

> If you have actually observed HSE (how?), maybe xhci-hcd could detect
> it automatically by the same means and clean up immediately.

That's exactly what I was expecting xhci-hcd to do when I first
started looking at this bug, however it does not detect it
automatically.

As for how I saw HSE, while testing the patch before submission, since
I already had the xhci lock, I just added a read of the usbsts before
calling xhci_hc_died(xhci):

...
- wait_for_completion(command->completion);
- slot_id = command->slot_id;
+ if (!wait_for_completion_timeout(command->completion,
+ msecs_to_jiffies(2 *
command->timeout_ms))) {
+ spin_lock_irqsave(&xhci->lock, tflags);
+ usbsts = readl(&xhci->op_regs->status);
+ xhci_err(xhci,
+ "TRB_ENABLE_SLOT: no command completion after %lums, USBSTS:%s\n",
+ 2 * command->timeout_ms,
+ xhci_decode_usbsts(ststr, usbsts));
+ xhci_hc_died(xhci);
+ spin_unlock_irqrestore(&xhci->lock, tflags);
+ }
...

This debug version of the patch printed:

[ 17.481330] xhci_hcd 0000:80:14.0: TRB_ENABLE_SLOT: no command
completion after 10000ms, USBSTS: 0x00000015 HCHalted HSE PCD

In the end, I didn't add the read of usbsts in the version I submitted
because not only xhci_hc_died(xhci) prints errors messages, but now
since slot_id is NULL, the next error messages will be printed; before
xhci_alloc_dev() safely returns 0 and the system reboots smootly.
I can add it to a v2 if it actually turns out to be necessary.

> > After vmcore gets captured, init calls device_shutdown() trying to
> > shut down the hub device, by also trying to take the same lock still
> > held by the hub kworker task.
> >
> > Avoid the deadlock by adding a 2x timeout for command completion
>
> nit: not a deadlock if X waits for Y and Y is just stuck by itself.

Noted!

> > before calling xhci_hc_died(). This gives enough time before marking
> > the host un- stable, dying and calling xhci_cleanup_command_queue();
> > which unblocks the hub worker into releasing the lock, allowing
> > device_shutdown() to proceed.
>
> Many functions which wait for command completion without timeouts.
> Patch this one and you will get stuck in the next.
> This shouldn't be happening in the first place. If a command doesn't
> complete normally in time then xhci_handle_command_timeout() should
> abort it, and if that times out too, then hc_died().

To come up with this timeout logic, I actually "borrowed" the idea
from the original commit described on Fixes c311e391a7efd ("xhci:
rework command timeout and cancellation,")
In the past, before the wait_for_completion rearrangement done in that
commit, this part of xhci_alloc_dev() code used to have calls to
wait_for_completion_interruptible_timeout() waiting for
XHCI_CMD_DEFAULT_TIMEOUT.
Furthermore, just to give another chance I added a bit more time by
putting `2 * command->timeout_ms` in the patch. However, I have tested
with XHCI_CMD_DEFAULT_TIMEOUT=5000ms and this also works.
If XHCI_CMD_DEFAULT_TIMEOUT it thought to be better, if this patch
gets approved, I can send a v2 with XHCI_CMD_DEFAULT_TIMEOUT instead
of 2x that.

> So not sure why this hasn't happened here.
> Is it reproducible? Can you try again with debug logs?

Always reproduceable - just need to issue `echo c >
/proc/sysrq-trigger` on the system.

> echo 'module xhci_hcd +p' >/proc/dynamic_debug/control

Actually, from the beginning of all my debugging I already had
`usbcore.dyndbg=+p xhci_hcd.dyndbg=+p xhci_pci.dyndbg=+p` on the
kernel cmdline, as well as on the crashkernel's
KDUMP_COMMANDLINE_APPEND at /etc/sysconfig/kdump.

On crashkernel's kexec-dmesg of the unpatched kernel I see multiple
doorbell rings stating the HSE:

...
[Thu Apr 30 12:28:22 2026] xhci_hcd 0000:80:14.0: Command timeout,
USBSTS: 0x00000015 HCHalted HSE PCD
[Thu Apr 30 12:28:22 2026] xhci_hcd 0000:80:14.0: Command timeout on
stopped ring
[Thu Apr 30 12:28:22 2026] xhci_hcd 0000:80:14.0: Turn aborted command
000000005921b827 to no-op
[Thu Apr 30 12:28:22 2026] xhci_hcd 0000:80:14.0: // Ding dong!
...

Whereas, on the patched kernel's kexec-dmesg.log only two rings happen.

FYI:
# lspci -k -vvv -s 0000:80:14.0
80:14.0 USB controller: Intel Corporation 800 Series PCH USB 3.1 xHCI
HC (rev 10) (prog-if 30 [XHCI])
Subsystem: Intel Corporation Device 7270
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 154
IOMMU group: 20
Region 0: Memory at 8000200000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [70] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA
PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
Capabilities: [80] MSI: Enable+ Count=8/8 Maskable- 64bit+
Address: 00000000fee00718 Data: 0000
Capabilities: [90] Vendor Specific Information: Len=14 <?>
Capabilities: [b0] Vendor Specific Information: Len=00 <?>
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci

> Regards,
> Michal

Hope to have answered all questions and concerns.
Please let me know what further action I should take for further investigation.

Best Regards,