答复: 答复: [PATCH] usb: xhci: Fix sleep in atomic context in xhci_free_streams()
From: 胡连勤
Date: Thu Jun 11 2026 - 22:14:48 EST
Hi Mathias, Micha
> >>> When a USB device with active stream endpoints is disconnected,
> >>> xhci_free_streams() is called from the hub_event workqueue to
> >>> free the stream resources while holding xhci->lock with irqs
> >>> disabled.
> >>
> >> Pedantry: it (currently) holds the lock while freeing, but it isn't
> >> called with the lock held, nor for the explicit purpose of freeing
> >> things with the lock held. Not sure how to parse this sentence?
> >
> > The above description has a problem; please update the description information:
> > When a USB device with active stream endpoints is disconnected,
> > xhci_free_streams() is called from the hub_event workqueue to
> > free the stream resources. It acquires xhci->lock with irqs
> > disabled, but then calls xhci_free_stream_info() under the lock.
> >
> >>> It calls xhci_free_stream_info(), which invokes
> >>> xhci_free_stream_ctx(), which in turn calls dma_free_coherent()
> >>> for large stream context arrays.
> >>>
> >>> dma_free_coherent() can sleep (e.g. via vunmap), triggering
> >>> a BUG when called from atomic context.
> >>>
> >>> Call trace:
> >>> dma_free_attrs+0x174/0x220
> >>> xhci_free_stream_info+0xd0/0x11c
> >>> xhci_free_streams+0x278/0x37c
> >>> usb_free_streams+0x98/0xc0
> >>> usb_unbind_interface+0x1b8/0x2f8
> >>> device_release_driver_internal+0x1d4/0x2cc
> >>> device_release_driver+0x18/0x28
> >>> bus_remove_device+0x160/0x1a4
> >>> device_del+0x1ec/0x350
> >>> usb_disable_device+0x98/0x214
> >>> usb_disconnect+0xf0/0x35c
> >>> hub_event+0xab4/0x19ec
> >>> process_one_work+0x278/0x63c
> >>>
> >>> Fix this by saving the stream_info pointers and clearing the
> >>> ep references under the lock, then calling xhci_free_stream_info()
> >>> outside the lock where sleeping is allowed.
> >>
> >> I wonder if this copy is necessary or if it would suffice to start with
> >> calling xhci_free_stream_info() unlocked (EP_GETTING_NO_STREAMS should
> >> ensure that nobody submits new URBs and hopefully core won't attempt to
> >> re-enable streams concurrently) and then grab the lock only to update
> >> vdev->eps stream_info and ep_state fields.
> >
> > I considered this approach, but there is
> > a use-after-free window between freeing stream_info and clearing
> > the pointer.
> > The xHCI interrupt handler (xhci_irq()) holds xhci->lock while
> > processing events, and several event handlers access
> > ep->stream_info under the lock — e.g.
> > ring_doorbell_for_active_rings() iterates
> > stream_info->stream_rings[], and xhci_handle_cmd_set_deq()
> > accesses stream_info->stream_ctx_array[].
> > If we free stream_info first (outside the lock), then acquire the
> > lock to clear the pointer, there is a window where the interrupt
> > handler can still see the old (dangling) pointer and dereference
> > freed memory. So we need to clear the reference under the lock first,
> > then free the memory outside.
>
> Good point, but the need for this reveals another bug.
>
> This means xhci_free_streams() can queue a configure endpoint command
> to remove streams, and xhci_irq() can ring the doorbell of a stream at the
> same time xHC controller is processing the command to remove the stream
>
>
> CPU0 CPU1 xHC controller
>
> xhci_free_streams()
> spin_lock()
> state|= EP_GETTING_NO_STREAMS; process old cmd/xfer
> spin_unlock write old event
> xhci_configure_endpoint() trigger interrupt
> xhci_irq()
> spin_lock()
> handling old event
> ring_db_for_active_rings() process configure_endpoint cmd
>
> Looks like we assume streams are functional by checking if ep->stream_info
> exists in many places where we should check ep_state stream flags instead.
>
> We should for example prevent ringing stream doorbell if ep_state has
> EP_GETTING_NO_STREAM flag set.
>
> So I think this is a good targeted patch for this specific issue.
> We should start fixing the other issues and hopefully end up where
> Michal suggested.
Thanks for the review and the insight about the deeper issue.
I agree that many places check ep->stream_info instead of
ep_state stream flags, which could lead to the race you
described.
I'll modify the patch description for this patch and release version v2 later.
Thanks
Lianqin