Re: [syzbot] INFO: rcu detected stall in tx

From: Mathias Nyman
Date: Mon May 24 2021 - 11:24:00 EST


On 20.5.2021 23.30, Thinh Nguyen wrote:
> +Mathias
>
...

> Hm... looks like we have a couple of issues in the uas storage class
> driver and the xhci driver.
>
> We may need to fix that in the uas storage driver because it doesn't
> seem to handle it. (check uas_data_cmplt() in uas.c).
>
> As for the xhci driver, there may be a case where the stream URB never
> gets to complete because the transaction err_count is not properly
> updated. The err_count for transaction error is stored in ep_ring, but
> the xhci driver may not be able to lookup the correct ep_ring based on
> TRB address for streams. There are cases for streams where the event
> TRBs have their TRB pointer field cleared to '0' (xhci spec section
> 4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
> it automatically does a soft-retry. This is seen from one of our
> testings that the driver was repeatedly doing soft-retry until the class
> driver timed out.
>
> Hi Mathias, maybe you have some comment on this? Thanks.

This is true: if the TRB pointer is 0, there is no retry limit for soft retry.
We should add one to prevent a loop. After a few soft resets we can end with a
hard reset to clear the host side endpoint halt.
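
Roughly something like the sketch below. This is only an illustration of the
idea, not a patch: the struct, the function names and the limit of 3 are all
made up here, and the real code would have to keep the counter somewhere that
can be reached without a valid TRB pointer (per endpoint rather than per ring).

```c
/* Hypothetical limit; no value has been decided in this thread. */
#define MAX_SOFT_RETRY 3

enum recovery_action {
	RECOVER_SOFT_RETRY,	/* soft reset the endpoint and retry */
	RECOVER_HARD_RESET,	/* hard reset to clear the host side halt */
};

/* Hypothetical per-endpoint state, not the real xhci structures. */
struct ep_state {
	unsigned int soft_retry_count;
};

/*
 * Decide how to recover from a transaction error when the event TRB
 * pointer is 0 and no ring can be looked up.
 */
static enum recovery_action handle_tx_error(struct ep_state *ep)
{
	if (ep->soft_retry_count < MAX_SOFT_RETRY) {
		ep->soft_retry_count++;
		return RECOVER_SOFT_RETRY;
	}

	/* Limit reached: stop looping and escalate, then start over. */
	ep->soft_retry_count = 0;
	return RECOVER_HARD_RESET;
}

/* A successful transfer clears the error history for the endpoint. */
static void handle_tx_success(struct ep_state *ep)
{
	ep->soft_retry_count = 0;
}
```

With a limit of 3, the fourth consecutive error on the same endpoint would
take the hard reset path instead of another soft retry.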

We don't know which URB was being transferred when the error occurred, so we
can't give it back with a proper error code.
In that sense we still end up waiting for a timeout and for someone to cancel
the URB.

-Mathias