Re: [PATCH 2/2] usb: host: Implement workaround for Erratum A-009668

From: Mathias Nyman
Date: Thu Sep 14 2017 - 09:24:17 EST


On 11.09.2017 12:43, yinbo.zhu@xxxxxxx wrote:
From: "yinbo.zhu" <yinbo.zhu@xxxxxxx>

Description: This issue is observed in USB 2.0 mode
when the USB 3.0 host controller is connected to a
FS/LS device via a hub.
The host controller issues start-split (SSPLIT) and
complete-split (CSPLIT) tokens to
accomplish a split-transaction. A split-transaction
consists of a SSPLIT token, token/data
packets, CSPLIT token and token/data/handshake packets.
A SSPLIT token is issued by the host controller to the
hub followed by token/data/handshake packets. The hub
then relays the token/data/handshake packets to the FS
/LS device. Sometime later, the host controller issues
a CSPLIT token followed by the same token/data/handshake
packets to the hub to complete the split-transaction.
As an example scenario, when the xHCI driver issues an
Address device command with BSR=0, the host controller
sends SETUP(SET_ADDRESS) tokens on the USB as part of
split-transactions.
If the host controller receives a NYET response from the
hub for the CSPLIT SETUP token, it means that the
split-transaction has not yet been completed or the hub
is not able to handle the split transaction. In such a
case, the host controller keeps retrying the
split-transactions
until such time an ACK response is received from the hub
for the CSPLIT SETUP token. If the split-transactions do
not complete in a time bound manner, the xHCI driver may
issue a Stop Endpoint Command. The host controller does
not service the Stop Endpoint Command and eventually the
xHCI driver times out waiting for the Stop Endpoint Command
to complete.

Normally we start a command timeout timer each time we start
servicing a new command; this gives each command 5 seconds
to finish. If it times out, we stop the command ring and abort the
command.

The stop endpoint command has an additional, separate timeout timer
that is started when the stop endpoint command is queued to the ring,
not when the host starts to service the command.

I see that we could end up in a situation where one device is being
addressed (address device command), and a URB is being canceled for
another device at almost the same time (stop endpoint command queued
right after the address device command).

If the address device command times out, then the host doesn't have
enough time left to service the stop endpoint command
before the stop endpoint timeout timer triggers.

This needs to be fixed, but disabling the entire slot just because
a URB is being canceled for a LS/FS device is not the right way to go.

-Mathias