Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1

From: Michał Pecio
Date: Wed Apr 10 2024 - 05:46:21 EST


> Driver can cope with these extra events, but if this is common we
> should probably handle it silently and not concern users with that
> ERROR message.

The error message itself is harmless: it means the driver got an
event it doesn't know how to handle and ignored it. Further events are
processed normally (in this specific case).

What's problematic is that the controller is apparently still working
on a TD which the driver considers to be finished already. The driver
can overwrite the TD and reuse its data buffer for other transfers,
while the hardware may still need the original TD for proper operation
and, if we are very unlucky, could attempt DMA to/from the data buffer,
causing data corruption or an information leak to a malicious USB dongle.

For all we know, Paul's buggy chipset may not only be confirming the
transfer twice, but really performing it twice for some stupid reason.


> We are actually at the moment looking at improving handle_tx_event()
> with Niklas (cc), and will take this into consideration.

Given the number of bugs so far, maybe it would make sense to count
transfer ring slots of the last completed TD as still "in use" until
the next TD is known to at least begin executing.
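
To make the idea concrete, here is a rough userspace sketch. It is a toy
model, not xhci code: struct toy_ring, the toy_* helpers and the ring size
are all made up for illustration, and positions are free-running counters
rather than real TRB DMA addresses. It assumes that any event for the next
TD is enough proof that the hardware has left the previous one.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RING_SIZE 8u	/* hypothetical ring size, in TRB slots */

/*
 * Toy model, NOT struct xhci_ring. Positions are free-running counters;
 * a real slot index would be (position % TOY_RING_SIZE).
 *
 *   [reclaim, deq)  last completed TD, kept in quarantine
 *   [deq, enq)      TDs queued but not yet completed
 */
struct toy_ring {
	unsigned int enq;	/* next position the driver will write */
	unsigned int deq;	/* first TRB of the oldest pending TD */
	unsigned int reclaim;	/* oldest position still counted "in use" */
};

/* Quarantined slots are not reported as free. */
static unsigned int toy_free_slots(const struct toy_ring *r)
{
	return TOY_RING_SIZE - (r->enq - r->reclaim);
}

/* Queue a TD of ntrbs TRBs; refuse if it would overwrite held slots. */
static bool toy_queue_td(struct toy_ring *r, unsigned int ntrbs)
{
	if (ntrbs > toy_free_slots(r))
		return false;
	r->enq += ntrbs;
	return true;
}

/*
 * Assumption: an event for the TD at deq shows the hardware has begun
 * executing it, so the previously completed TD can be released.
 */
static void toy_td_started(struct toy_ring *r)
{
	assert(r->enq != r->deq);	/* there must be a pending TD */
	r->reclaim = r->deq;
}

/* The TD at deq finished: release the older TD, quarantine this one. */
static void toy_td_completed(struct toy_ring *r, unsigned int ntrbs)
{
	assert(ntrbs > 0 && ntrbs <= r->enq - r->deq);
	toy_td_started(r);
	r->deq += ntrbs;
}

/* True if an event points into the quarantined (already completed) TD. */
static bool toy_event_is_stale(const struct toy_ring *r, unsigned int pos)
{
	return pos - r->reclaim < r->deq - r->reclaim;
}
```

With something like this, a late second event for a finished TD would land
in the quarantined range, so it could be recognized and dropped quietly, and
its slots could not have been overwritten in the meantime. The cost is that
up to one TD's worth of slots is unavailable at any time.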

Unfortunately, "quarantining" URB data buffers in a similar manner would
be harder AFAIK.

I recently found one more bug of this kind: the Etron EJ168 controller
produces two events for failed single-TRB isochronous IN transfers -
one event indicating the failure, and then a "success". The full extent
of the bug (does it affect OUT or non-isoch, what happens on multi-TRB)
is unknown because the controller is very prone to crashing under my
workloads, which doesn't help debugging.

Regards,
Michal