Re: 答复: [PATCH] usb: xhci: Fix sleep in atomic context in xhci_free_streams()
From: Mathias Nyman
Date: Thu Jun 11 2026 - 13:54:09 EST
On 6/11/26 11:33, 胡连勤 wrote:
Hi Michal Pecio:
When a USB device with active stream endpoints is disconnected,
xhci_free_streams() is called from the hub_event workqueue to
free the stream resources while holding xhci->lock with irqs
disabled.
Pedantry: it (currently) holds the lock while freeing, but it isn't
called with the lock held, nor for the explicit purpose of freeing
things with the lock held. Not sure how to parse this sentence?
The above description has a problem; please update the description information:
When a USB device with active stream endpoints is disconnected,
xhci_free_streams() is called from the hub_event workqueue to
free the stream resources. It acquires xhci->lock with irqs
disabled, but then calls xhci_free_stream_info() under the lock.
It calls xhci_free_stream_info(), which invokes
xhci_free_stream_ctx(), which in turn calls dma_free_coherent()
for large stream context arrays.
dma_free_coherent() can sleep (e.g. via vunmap), triggering
a BUG when called from atomic context.
Call trace:
dma_free_attrs+0x174/0x220
xhci_free_stream_info+0xd0/0x11c
xhci_free_streams+0x278/0x37c
usb_free_streams+0x98/0xc0
usb_unbind_interface+0x1b8/0x2f8
device_release_driver_internal+0x1d4/0x2cc
device_release_driver+0x18/0x28
bus_remove_device+0x160/0x1a4
device_del+0x1ec/0x350
usb_disable_device+0x98/0x214
usb_disconnect+0xf0/0x35c
hub_event+0xab4/0x19ec
process_one_work+0x278/0x63c
Fix this by saving the stream_info pointers and clearing the
ep references under the lock, then calling xhci_free_stream_info()
outside the lock where sleeping is allowed.
I wonder if this copy is necessary or if it would suffice to start with
calling xhci_free_stream_info() unlocked (EP_GETTING_NO_STREAMS should
ensure that nobody submits new URBs and hopefully core won't attempt to
re-enable streams concurrently) and then grab the lock only to update
vdev->eps stream_info and ep_state fields.
I considered this approach, but there is
a use-after-free window between freeing stream_info and clearing
the pointer.
The xHCI interrupt handler (xhci_irq()) holds xhci->lock while
processing events, and several event handlers access
ep->stream_info under the lock — e.g.
ring_doorbell_for_active_rings() iterates
stream_info->stream_rings[], and xhci_handle_cmd_set_deq()
accesses stream_info->stream_ctx_array[].
If we free stream_info first (outside the lock), then acquire the
lock to clear the pointer, there is a window where the interrupt
handler can still see the old (dangling) pointer and dereference
freed memory. So we need to clear the reference under the lock first,
then free the memory outside.
Good point, but the need for this reveals another bug.
This means xhci_free_streams() can queue a configure endpoint command
to remove streams, and xhci_irq() can ring the doorbell of a stream at the
same time xHC controller is processing the command to remove the stream
CPU0 CPU1 xHC controller
xhci_free_streams()
spin_lock()
state|= EP_GETTING_NO_STREAMS; process old cmd/xfer
spin_unlock write old event
xhci_configure_endpoint() trigger interrupt
xhci_irq()
spin_lock()
handling old event
ring_db_for_active_rings() process configure_endpoint cmd
Looks like we assume streams are functional by checking if ep->stream_info
exists in many places where we should check ep_state stream flags instead.
We should for example prevent ringing stream doorbell if ep_state has
EP_GETTING_NO_STREAM flag set.
So I think this is a good targeted patch for this specific issue.
We should start fixing the other issues and hopefully end up where
Michal suggested.
Thanks
Mathias