Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling

From: Mathias Nyman

Date: Fri Feb 27 2026 - 04:43:57 EST

On 2/26/26 20:17, Thinh Nguyen wrote:

On Thu, Feb 26, 2026, Mathias Nyman wrote:

On 2/26/26 11:27, Dayu Jiang wrote:

Hi Greg,

I have updated the changelog text as requested and resubmitted the patch.
https://lore.kernel.org/linux-usb/20260128100746.561626-1-jiangdayu@xxxxxxxxxx/
Please kindly review it and let me know if it is acceptable now.

I'll send it forward, but changed the commit message.
Does this modified version still describe the case accurately:

usb: xhci: Prevent interrupt storm on host controller error (HCE)

The xHCI controller reports a Host Controller Error (HCE) in UAS Storage
Device plug/unplug scenarios on Android devices, which is checked in
xhci_irq() function and causes an interrupt storm (since the interrupt
isn’t cleared), leading to severe system-level faults.

When the xHC controller reports HCE in the interrupt handler, the driver
only logs a warning and assumes xHC activity will stop. The interrupt storm
does, however, continue until the driver manually disables the xHC interrupt
and stops the controller by calling xhci_halt().

The change is made in xhci_irq() function where STS_HCE status is
checked, mirroring the existing error handling pattern used for
STS_FATAL errors.

This only fixes the interrupt storm. Proper HCE recovery requires resetting
and re-initializing the xHC.
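
For illustration only, the control flow described in the commit message can be sketched in user-space C. This is not the actual patch: struct mock_xhc, mock_halt() and mock_irq() are hypothetical stand-ins for the driver's state, xhci_halt() and xhci_irq(). Only the USBSTS bit positions are taken from the xHCI spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* USBSTS bit positions as defined by the xHCI specification. */
#define STS_HALT  (1u << 0)   /* HCHalted */
#define STS_FATAL (1u << 2)   /* Host System Error */
#define STS_EINT  (1u << 3)   /* Event Interrupt */
#define STS_HCE   (1u << 12)  /* Host Controller Error */

/* Hypothetical stand-in for controller state, not the kernel's xhci_hcd. */
struct mock_xhc {
	uint32_t usbsts;          /* simulated USBSTS register */
	bool irq_enabled;         /* simulated interrupt enable */
	bool running;             /* simulated Run/Stop state */
	unsigned int halt_calls;  /* how often the halt path was taken */
};

/* Hypothetical equivalent of xhci_halt(): mask interrupts, stop the xHC. */
void mock_halt(struct mock_xhc *xhc)
{
	xhc->irq_enabled = false;
	xhc->running = false;
	xhc->usbsts |= STS_HALT;
	xhc->halt_calls++;
}

/* Simplified interrupt handler; returns true if the interrupt was handled. */
bool mock_irq(struct mock_xhc *xhc)
{
	uint32_t status = xhc->usbsts;

	/* HCE gets the same treatment as a fatal error: halt and stop. */
	if (status & STS_HCE) {
		mock_halt(xhc);
		return true;
	}
	if (status & STS_FATAL) {
		mock_halt(xhc);
		return true;
	}
	if (!(status & STS_EINT))
		return false;   /* not our interrupt */

	/* Acknowledge the event interrupt (simplified; the real bit is RW1C). */
	xhc->usbsts &= ~STS_EINT;
	return true;
}
```

With the HCE branch in place, the mock controller ends up with interrupts masked and Run/Stop cleared after the first HCE interrupt, which is the condition that ends the storm in the scenario above.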

The controller is halted if there's an error like HCE. It's odd to try
to "halt" it again. I'm not sure how this will affect other controllers.

This is why I changed the commit message from:

"When the xHCI controller reports HCE in the interrupt handler, the driver
currently only logs a warning and continues execution. However, HCE
indicates a critical hardware failure that requires the controller to be
halted. This ensures the controller is in a consistent state and prevents
further operations on failed hardware."

to:

"When the xHC controller reports HCE in the interrupt handler, the driver
only logs a warning and assumes xHC activity will stop. The interrupt storm
does, however, continue until the driver manually disables the xHC interrupt
and stops the controller by calling xhci_halt()."

I can clarify it further by stating that "...assumes xHC activity will stop,
as stated in the xHCI spec. On some xHC controllers an interrupt storm continues
after an HCE, and only ceases after manually..."

The host is messed up at this point, and we are not recovering it. I don't think
there is any harm in a manual halt at this stage.

Even if we don't have the full HCE recovery implemented, did we try to
just do HCRST, which is the first step of the recovery?

The spec states that HCRST might re-trigger the HCE if it's due to a "hard" fault,
and the driver needs to take action to prevent an HCE-HCRST recovery loop.
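
One possible way to avoid such a loop is to bound the number of reset attempts. The following is a rough user-space sketch under that assumption, not a proposal for the driver: MAX_HCE_RESETS, struct hce_recovery, hce_try_recover() and the two demo callbacks are all hypothetical names, and the limit of 3 is arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_HCE_RESETS 3  /* hypothetical limit, not from the spec */

/* Hypothetical bookkeeping for one recovery attempt sequence. */
struct hce_recovery {
	unsigned int resets_done;  /* HCRST attempts issued so far */
	bool gave_up;              /* set once the limit is exhausted */
};

/*
 * Hypothetical callback that issues HCRST and re-initializes the xHC.
 * Returns true if HCE stays clear afterwards, false if it re-asserts.
 */
typedef bool (*hcrst_fn)(void *ctx);

/* Retry HCRST a bounded number of times instead of looping forever. */
bool hce_try_recover(struct hce_recovery *rec, hcrst_fn do_hcrst, void *ctx)
{
	while (rec->resets_done < MAX_HCE_RESETS) {
		rec->resets_done++;
		if (do_hcrst(ctx))
			return true;   /* controller came back without HCE */
	}
	rec->gave_up = true;           /* treat as a "hard" fault, stay halted */
	return false;
}

/* Demo callback: a "hard" fault, HCE re-asserts after every reset. */
bool hcrst_hard_fault(void *ctx)
{
	(void)ctx;
	return false;
}

/* Demo callback: a transient fault that clears after *ctx failed resets. */
bool hcrst_transient(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

In this sketch a transient fault recovers within the limit, while a hard fault leaves gave_up set so the caller can keep the controller halted rather than resetting again.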

HCRST will clear all registers, so we need to reinitialize everything here:
write back the addresses of event rings, command rings, DCBAA, scratchpads,
dequeue pointers, etc.

I support taking this fix to prevent the interrupt storm, an issue seen in real
life. And then solve proper recovery later.

Niklas is actually working on decoupling memory allocation and xHC register
initialization which will help future HCE recovery work.

Thanks
Mathias