Re: [RFC PATCH] usb: host: xhci: Release spinlock before xhci_handshake in command ring abort
From: Mathias Nyman
Date: Fri Nov 07 2025 - 08:07:07 EST
On 11/6/25 11:57, Uttkarsh Aggarwal wrote:
On 10/22/2025 6:19 PM, Mathias Nyman wrote:
On 10/22/25 13:00, Uttkarsh Aggarwal wrote:
Currently xhci_handshake is a polling loop that waits for change of state.
If this loop is executed while holding a spinlock with IRQs disabled, it
can block interrupts for up to 5 seconds.
To prevent prolonged IRQ disable durations that may lead to watchdog
timeouts, release the spinlock before invoking xhci_handshake() in
xhci_abort_cmd_ring().
The spinlock is reacquired after the handshake to continue with controller
halt and recovery if needed.
Is this a real issue?
It should be extremely rare that commands need to be aborted, and even
less likely that it takes five seconds to stop the command ring.
Can you take logs and traces of this (if it's reproducible).
I suspect there is some other issue that needs to be fixed.
I agree that keeping the spin lock for up to five seconds isn't a good idea, but
just unlocking before the command ring has stopped opens up new race risks.
We need to ensure that queuing a new command mid ring abortion, before ring stopped,
doesn't cause new issues, or that handling command completion events, including the
"stop command ring" event, before polling for a stopped ring works fine.
Hi Mathias,
Yes, actually when we do usb headset connected, no audio playing, resume the device and do some operations like volume +/-, there will be xhci_handle_command_timeout.
logs:
It’s wdog bark, we can see : xhci_handle_command_timeout kicked in since 68346, but not end till 68356 dog bark.
[68346.109071][T25562] [2025:09:04 23:19:02](2)[25562:kworker/2:3]lwz debug: xhci_handle_command_timeout start
[68346.109081][T25562] [2025:09:04 23:19:02](2)[25562:kworker/2:3]lwz debug: xhci_handle_command_timeout xhci_abort_cmd_ring start
[68346.109083][T25562] [2025:09:04 23:19:02](2)[25562:kworker/2:3]lwz debug: xhci_abort_cmd_ring start
[68346.109084][T25562] [2025:09:04 23:19:02](2)[25562:kworker/2:3]lwz debug: xhci_abort_cmd_ring xhci_trb_virt_to_dma
[68346.109087][T25562] [2025:09:04 23:19:02](2)[25562:kworker/2:3]lwz debug: xhci_abort_cmd_ring xhci_handshake_check_state start
….
[68356.215787][ C1] [2025:09:04 23:19:10](1)msm_watchdog 17600000.qcom,wdt: Causing a QCOM Apps Watchdog bite!
[68356.215790][ C1] [2025:09:04 23:19:10](1)[RB/E]rb_sreason_str_set: sreason_str set wdog_bite
Call Trace:
xhci_configure_endpoint hang in below call stack.
Stack:
__switch_to+0x1a8
schedule+0xacc
schedule_timeout+0x4c
wait_for_common+0xc0
wait_for_completion+0x18
xhci_configure_endpoint+0xe4
xhci_check_bandwidth+0x1a8
usb_hcd_alloc_bandwidth+0x37c
usb_disable_device_endpoints+0xb0
usb_disable_device+0x30
usb_disconnect+0xd0
hub_event+0xc4c
process_scheduled_works+0x208
worker_thread+0x1c0
kthread+0xfc
ret_from_fork+0x10
Finally crash on dog bite.
sdei_wdg_bite_cb[qcom_sdei]+0x110
sdei_event_handler+0x3c
do_sdei_event+0x88
__sdei_handler+0x48
__sdei_asm_handler+0x110
__const_udelay+0x148
xhci_handshake+0x58
xhci_handle_command_timeout+0x248
process_scheduled_works+0x208
worker_thread+0x1c0
kthread+0xfc
ret_from_fork+0x10
Based on the logs and crash stacks I have collected so far, including the watchdog bark and the prolonged execution of xhci_handle_command_timeout,
it seems likely that there's an underlying issue causing the ring to hang during xhci_configure_endpoint.
Thanks,
Uttkarsh,
Thanks Uttkarsh
Best guess is that this is somehow related to xHC suspend/resume as xHC and command processing
are stopped during suspend. xhci_suspend() also wipes the command ring clean of any pending
command TRBs without any warning.
A suspended host could also return 0xffffffff on register read, which in turn would cause
xhci_handshake(reg, CMD_RING_RUNNING, 0, 5sec) to run for full five seconds unsuccessfully
waiting for the "command ring running" bit to clear.
Could you add xhci dynamic debug and tracing and reproduce this?
It could show how we end up in this situation.
mount -t debugfs none /sys/kernel/debug
echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on
< Reproduce issue >
Send output of dmesg
Send content of /sys/kernel/debug/tracing/trace
I'll also write a testpatch with extra debugging related to command ring
Thanks
Mathias