Re: [PATCH v4 1/3] usb: host: Implement workaround for Erratum A-007463

From: Mathias Nyman
Date: Thu Jan 04 2018 - 08:35:53 EST


On 19.12.2017 12:16, yinbo.zhu@xxxxxxx wrote:
From: yinbo.zhu <yinbo.zhu@xxxxxxx>

When a transaction error (defined in Section 4.10.2.3, "USB
Transaction Error" of the xHCI Specification) occurs on the
USB, the host controller reports this through a transfer
event with the completion code "USB Transaction Error". When
this happens, the endpoint is placed in the Halted state. In
response, software must issue a Reset Endpoint command to
transition the endpoint to the Stopped state. In order to
restart the transfer, the driver can perform either of the
following:
a) Ring the doorbell again, which restarts the transfer from
where it stopped, or
b) Issue a Set TR (Transfer Ring) Dequeue Pointer command
for the endpoint to start the transfer form a different
Transfer Ring pointer Consider the following
scenario:
1. The xHCI driver prepares a control transfer read to one
of the device's control endpoints;
2. During the IN data stage, a transaction error occurs on
the USB, causing a transfer event with the completion
code "USB Transaction Error";
3. The driver issues a Reset Endpoint command;
4. The driver rings the doorbell of the control endpoint to
resume the transfer. In this scenario, the controller
may reverse the direction of the data stage from IN to OUT.

The xhci driver should not ring the doorbell without first issuing a
Set TR Dequeue Pointer command in the USB Transaction error case.

code flow:

handle_tx_event()
case COMP_USB_TRANSACTION_ERROR:
status = -EPROTO;
process_ctrl_td()
switch (trb_comp_code)
default:
if (!xhci_requires_manual_halt_cleanup() // FALSE
break;
/* else fall through */
case COMP_STALL_ERROR:
finish_td()
if (xhci_requires_manual_halt_cleanup()) // TRUE
xhci_cleanup_halted_endpoint(...,EP_HARD_RESET)
xhci_queue_reset_ep()
xhci_cleanup_stalled_ring()
xhci_find_new_dequeue_state()
xhci_queue_new_dequeue_state()

If you can see the driver ringing the doorbell before the
Set TR Dequeue Pointer command then there is some race going
on in the driver

Instead of sending an ACK to the endpoint to poll for read
data, it sends a Data Packet (DP) to the endpoint. It
fetches the data from the data stage Transfer Request
Block (TRB) that is being resumed, even though the data
buffer is setup to receive data and not transmit it.
NOTE
This issue occurs only if the transaction error happens
during an IN data stage. There is no issue if the transaction
error happens during an OUT data stage.

Have you been able to trigger this issue?


Impact: When this issue occurs, the device likely responds in
one of the following ways:
a) The device responds with a STALL because the data stage
has unexpectedly changed directions. The controller then
generates a Stall Error transfer event, to which software
must issue a Reset Endpoint command followed by a Set TR
Dequeue Pointer command pointing to a new Setup TRB to clear
the STALL condition.
b) The device does not respond to the inverted data stage and
the transaction times out. The controller generates another
USB Transaction Error transfer event, to which software
likely performs a USB Reset to the device because it is
unresponsive. It is not expected that any of these recovery
steps will cause instability in the system because this
recovery is part of a standard xHCI driver and could happen
regardless of the defect. Another possible system-level
impact is that the controller attempts to read from the
memory location pointed at by the Data Stage TRB or a Normal
TRB chained to it. associated with this TRB is intended to be
written by the controller, but the controller reads from it
instead. Normally, this does not cause a problem. However, if
the system has some type of memory protection where this
unexpected read is treated as a bus error,
a problem. However, if the system has some type of memory
it may cause the system to become unstable or to crash.

Workaround: If a USB Transaction Error occurs during the IN
data phase of a control transfer, the driver must use the
Set TR Dequeue Pointer command to either restart the data
Phase or restart the entire control transfer from the
Setup phase.

The Set TR Dequeue pointer command should already be the default way
the xhci driver handles this now.


Configs Affected:
LS1021-20-22A-R1.0, LS1021-20-22A-R2.007463

Signed-off-by: yinbo zhu <yinbo.zhu@xxxxxxx>
---


@@ -1951,14 +1953,30 @@ static int finish_td(struct xhci_hcd *xhci, struct xhci_td *td,
if (trb_comp_code == COMP_STALL_ERROR ||
xhci_requires_manual_halt_cleanup(xhci, ep_ctx,
trb_comp_code)) {
- /* Issue a reset endpoint command to clear the host side
- * halt, followed by a set dequeue command to move the
- * dequeue pointer past the TD.
- * The class driver clears the device side halt later.
+ /*erratum A-007463:
+ *After transaction error, controller switches control transfer
+ *data stage from IN to OUT direction.
*/
- xhci_cleanup_halted_endpoint(xhci, slot_id, ep_index,
+ remaining = EVENT_TRB_LEN(le32_to_cpu(event->transfer_len));
+ if (remaining && xhci_requires_manual_halt_cleanup(xhci, ep_ctx,
+ trb_comp_code) &&
+ (xhci->quirks & XHCI_REVERSE_IN_OUT)) {
+ memset(&deq_state, 0, sizeof(deq_state));
+ xhci_find_new_dequeue_state(xhci, slot_id,
+ ep_index, td->urb->stream_id, td, &deq_state);
+ xhci_queue_new_dequeue_state(xhci, slot_id, ep_index,
+ &deq_state);
+ xhci_ring_cmd_db(xhci);

Even if all these changes should be unnecessary the endpoint is still halted
here, and we would need a reset endpoint command here as well, right?

Has this new codepath been tested?

+ } else {
+ /* Issue a reset endpoint command to clear the host side
+ * halt, followed by a set dequeue command to move the
+ * dequeue pointer past the TD.
+ * The class driver clears the device side halt later.
+ */
+ xhci_cleanup_halted_endpoint(xhci, slot_id, ep_index,
ep_ring->stream_id, td, ep_trb,
EP_HARD_RESET);

-Mathias