Re: [PATCH] usb: chipidea: udc: reject non-control requests while controller is suspended

From: Alan Stern

Date: Wed Apr 01 2026 - 10:19:58 EST

On Wed, Apr 01, 2026 at 06:45:24AM +0000, Andreea.Popescu@xxxxxxxxxxx wrote:
> On Tue, Mar 31, 2026 at 12:21:45PM +0000, Andreea.Popescu@xxxxxxxxxxx wrote:
> >> When Linux runtime PM autosuspends a ChipIdea UDC that is still
> >> enumerated by the host, the driver gates the PHY clocks and marks
> >> the controller as suspended (ci->in_lpm = 1) but deliberately leaves
> >> gadget.speed unchanged so upper-layer gadget drivers do not see a
> >> spurious disconnect.
> >>
> >> The problem is that those same drivers may continue to call
> >> usb_ep_queue() during the autosuspend window. _hardware_enqueue()
> >> silently adds the request to the endpoint queue and returns 0, but
> >> hw_ep_prime() cannot succeed with gated clocks, so the completion
> >> interrupt never fires. The request — and its backing buffer — is
> >> permanently lost. The caller sees a successful return and never
> >> frees the buffer.
> >Won't the request complete normally after the gadget is resumed, or
> >abnormally after a reset, disconnect, or shutdown? Either way, it
> >wouldn't be lost permanently.
> >
> >Alan Stern
> Thank you very much for the review!
> On "complete normally after resume":
> This would be true only if the runtime-resume path reprimed the
> pending endpoints. It does not. ci_controller_resume() clears
> PORTSC_PHCD and ci->in_lpm, restoring the PHY, but it performs no
> endpoint repriming. The TD that was enqueued during the suspended
> window has its DMA node linked in hwep->qh.queue and the QH's td.next
> is written, but the OP_ENDPTPRIME write inside hw_ep_prime() was a
> no-op against gated clocks. After resume the controller has no
> knowledge of that TD (the ENDPTPRIME/ENDPTSTAT bits are clean), so it
> never processes it. The request is not picked up automatically.
>
> A subsequent request on the same endpoint would be appended to the
> existing TD chain via the "non-empty queue" branch of
> _hardware_enqueue(), which does not issue a fresh prime either; it
> relies on the hardware already being active on that endpoint. Since
> the first prime was lost, that chain never becomes active.

Okay. Then surely the appropriate way to fix the problem is to change
the runtime-resume path so that it _does_ reprime the pending endpoints.

Alan Stern
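
For illustration only, here is a rough user-space model of the behavior
discussed in this thread. It is not chipidea driver code and not a patch:
every name in it (model_udc, model_ep, model_prime, model_enqueue,
model_suspend, model_resume) is hypothetical, and the "primed" flag merely
stands in for the endpoint's ENDPTPRIME/ENDPTSTAT state. Assuming the
behavior is as described above, it sketches how a prime issued while
in_lpm is set gets lost, and how a resume path that walks the endpoints
and reprimes those with pending requests would recover them.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NUM_EP 4	/* hypothetical endpoint count for this model */

/* Hypothetical stand-in for one endpoint's state; not struct ci_hw_ep. */
struct model_ep {
	int pending;	/* requests linked on the software queue */
	bool primed;	/* models the endpoint's ENDPTPRIME/ENDPTSTAT bit */
};

/* Hypothetical stand-in for the controller; not struct ci_hdrc. */
struct model_udc {
	bool in_lpm;	/* same meaning as ci->in_lpm in the thread */
	struct model_ep ep[MODEL_NUM_EP];
};

/* Models the prime register write: it has no effect with gated clocks. */
static void model_prime(struct model_udc *udc, struct model_ep *ep)
{
	if (!udc->in_lpm)
		ep->primed = true;
}

/*
 * Models the enqueue path described above: the request is always linked
 * and 0 is returned, but a prime is attempted only when the queue was
 * empty beforehand.
 */
static int model_enqueue(struct model_udc *udc, size_t n)
{
	struct model_ep *ep = &udc->ep[n];
	bool was_empty = (ep->pending == 0);

	ep->pending++;
	if (was_empty)
		model_prime(udc, ep);
	return 0;
}

/* Models runtime suspend: only the low-power flag is set. */
static void model_suspend(struct model_udc *udc)
{
	udc->in_lpm = true;
}

/*
 * Models runtime resume. With reprime == false it only clears in_lpm,
 * as the thread says the current resume path does. With reprime == true
 * it also primes every endpoint that has pending requests but is not
 * active. Returns the number of endpoints reprimed.
 */
static int model_resume(struct model_udc *udc, bool reprime)
{
	int count = 0;
	size_t i;

	udc->in_lpm = false;
	if (!reprime)
		return 0;

	for (i = 0; i < MODEL_NUM_EP; i++) {
		struct model_ep *ep = &udc->ep[i];

		if (ep->pending > 0 && !ep->primed) {
			model_prime(udc, ep);
			count++;
		}
	}
	return count;
}
```

In this model, a request queued between model_suspend() and
model_resume(udc, false) stays pending with primed == false, and later
requests on the same endpoint do not change that. Calling
model_resume(udc, true) instead primes the stalled endpoint. What a real
reprime in the driver would need (locking, which TD to point the QH at,
ordering against the PHY wakeup) is not covered here.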