Re: [PATCH] usb: chipidea: udc: reject non-control requests while controller is suspended

From: Andreea.Popescu@xxxxxxxxxxx

Date: Thu Apr 02 2026 - 09:25:50 EST

On Wed, Apr 01, 2026 at 10:47:54AM +0000, Andreea.Popescu@xxxxxxxxxxx wrote:
>> On Tue, Mar 31, 2026 at 12:21:45PM +0000, Andreea.Popescu@xxxxxxxxxxx wrote:
>> >> When Linux runtime PM autosuspends a ChipIdea UDC that is still
>> >> enumerated by the host, the driver gates the PHY clocks and marks
>> >> the controller as suspended (ci->in_lpm = 1) but deliberately leaves
>> >> gadget.speed unchanged so upper-layer gadget drivers do not see a
>> >> spurious disconnect.
>> >
>> >It's strange that the chipidea UDC runtime suspends even if it's already
>> >enumerated by the host. AFAIK, the udc driver calls pm_runtime_get_sync()
>> >in ci_hdrc_gadget_connect(is_active = true), so it stays runtime active
>> >all the time unless an explicit pm_runtime_put/_autosuspend() is called
>> >somewhere.
>> >
>> >Would you share more details on how the device controller goes into runtime suspend?
>> >Thanks,
>> >Xu Yang
>> Thank you very much for taking the time to point this out. It made me realize an important distinction. I am using an i.MX board, so I will split my answer in two; you can then decide whether what I am proposing is still worthwhile or simply reject it. Either way, I am most grateful.
>
>Thank you for the information.
>
>> Still applicable to 6.19 mainline:
>> ep_queue() returning 0 for USB_SPEED_UNKNOWN: I believe there may be the following window: _gadget_stop_activity() sets gadget.speed = USB_SPEED_UNKNOWN, but ep_queue() is called from a concurrent context before that completes. Returning 0 there is misleading; it should return -ESHUTDOWN.
>
>I think this will rarely happen, or the window is very small, because when
>the driver sees the speed is USB_SPEED_UNKNOWN, it has most likely already
>seen that ep->enabled is false. _gadget_stop_activity() will call usb_ep_disable(),
>so usb_ep_queue() will return -ESHUTDOWN early.
>
>> i.MX-specific: On i.MX SoCs the chipidea controller sits inside a power domain managed by imx-blk-ctrl or the GPC. When that parent domain is shut down by the platform PM framework, pm_runtime_force_suspend() is called on the chipidea device, bypassing usage_count entirely and invoking ci_runtime_suspend → ci_controller_suspend → ci->in_lpm = true. This happens while VBUS is still present and the gadget is enumerated. This is the actual path I observed; it is platform-specific, not a general chipidea mainline issue. Because of this, please disregard the proposed _ep_queue guard on ci->in_lpm.
>
>Then I guess the PM framework is working abnormally. How could a parent
>domain shut itself down while its active subdomain and users are still using it?

Thank you very much for all the reviews! I would like to withdraw the patch proposal.
I will describe the issue I am seeing and why the proposed change appears to work in my scenario; however, I fully agree that I may be looking in the wrong place and taking the wrong approach.
Topology: PC (Windows, host) <-usb 1-0-> (gadget) i.MX board (Linux) (host) <-usb 1-1-> Board2 (Linux). I am sorry for the generic description, but I am not allowed to disclose more details.
Board2 generates logs, which are routed through the i.MX board to be stored on the PC.
Without the patch, logs on the i.MX:
hub 1-0:1.0: state 7 ports 1 chg 0000 evt 0002
ci_hdrc ci_hdrc.0: GetStatus port:1 status 10001803 8 ACK POWER sig=j CSC CONNECT
usb usb1-port1: status 0101, change 0001, 12 Mb/s
usb usb1-port1: debounce total 100ms stable 100ms status 0x101
ci_hdrc ci_hdrc.0: port 1 reset complete, port enabled
ci_hdrc ci_hdrc.0: GetStatus port:1 status 18001205 12 ACK POWER sig=se0 LPM PE CONNECT
usb 1-1: new high-speed USB device number 3 using ci_hdrc
ci_hdrc ci_hdrc.0: port 1 reset complete, port enabled
ci_hdrc ci_hdrc.0: GetStatus port:1 status 18001205 12 ACK POWER sig=se0 LPM PE CONNECT
usb 1-1: skipped 3 descriptors after interface
usb 1-1: default language 0x0409
usb 1-1: udev 3, busnum 1, minor = 2
usb 1-1: usb_probe_device
usb 1-1: configuration #1 chosen from 1 choice
usb 1-1: adding 1-1:1.0 (config #1, interface 0)
usb_bridge 1-1:1.0: usb_probe_interface
usb_bridge 1-1:1.0: usb_probe_interface - got id
diag_bridge 1-1:1.0: usb_probe_interface
diag_bridge 1-1:1.0: usb_probe_interface - got id
Logs flow at first, but over time they start stalling: they accumulate on the i.MX side until the pool is exhausted, while on the PC they arrive more and more slowly until they stop completely. Around this time, in the middle of a diag log being received on the i.MX to be sent to the PC, I see:
(this is the i.MX USB controller connected to the PC)
ci_hdrc ci_hdrc.1: at ci_runtime_suspend
imx_usb 5b0e0000.usb: at imx_controller_suspend
After this point there is no recovery until a reboot or until I physically disconnect the USB cable from the PC. The problem is that, because Board2 never receives any completion, its USB endpoints are stuck at this point (at least for the diag interface) and no new events are possible, so things only get worse.

With the patch:
The logs are the same until the problem happens. Where they previously would have stalled, I now start seeing:
configfs-gadget gadget: usb_diag_write: cannot queue read request
diag: In diag_usb_write, error writing to usb channel diag_mdm, err: -5.
This rejection is a valid response for Board2: even though logs are lost, the USB endpoints no longer get stuck.
The problem is easy to observe once Windows starts resetting the USB device; afterwards, even though no further resets happen, it stays stuck in this state.
Thank you for your time and the very useful information.

________________________________________
From: Xu Yang <xu.yang_2@xxxxxxx>
Sent: Thursday, April 2, 2026 13:16
To: Popescu, Andreea
Cc: Peter Chen; Greg Kroah-Hartman; linux-usb@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: [PATCH] usb: chipidea: udc: reject non-control requests while controller is suspended

Thanks,
Xu Yang