Re: [syzbot] INFO: rcu detected stall in tx

From: Alan Stern
Date: Mon May 24 2021 - 14:55:25 EST


On Mon, May 24, 2021 at 06:18:59PM +0300, Mathias Nyman wrote:
> On 20.5.2021 23.30, Thinh Nguyen wrote:
> > As for the xhci driver, there may be a case where the stream URB never
> > gets to complete because the transaction err_count is not properly
> > updated. The err_count for transaction errors is stored in ep_ring, but
> > the xhci driver may not be able to look up the correct ep_ring based on
> > the TRB address for streams. There are cases for streams where the event
> > TRBs have their TRB pointer field cleared to '0' (xhci spec section
> > 4.12.2). If the xhci driver can't find the ep_ring for a transaction
> > error, it automatically does a soft retry. We have seen this in our
> > testing: the driver repeatedly did soft retries until the class
> > driver timed out.
> >
> > Hi Mathias, maybe you have some comment on this? Thanks.
>
> This is true: if the TRB pointer is 0, there is no retry limit for soft retries.
> We should add one and prevent a loop. After a few soft resets we can end with a
> hard reset to clear the host-side endpoint halt.
>
> We don't know which URB was being transferred during the error, so we can't
> give it back with a proper error code.
> In that sense we still end up waiting for a timeout and for someone to cancel
> the URB.

That's not good. There may not be a timeout; drivers expect transfers
to complete with a failure, not to be retried indefinitely.

However, if you do know which endpoint/stream the error is connected to,
you should be able to get the URB. It will be the first one queued for
that endpoint/stream.
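
A minimal sketch of that lookup, assuming each endpoint/stream keeps its
URBs in a FIFO in submission order. struct fake_urb, struct stream_queue,
queue_urb() and fail_first_urb() are hypothetical stand-ins, not USB core
or xhci APIs. Using -EPROTO as the failure status is also an assumption,
chosen because it is the code listed for low-level protocol errors in the
kernel's USB error-code documentation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct urb. */
struct fake_urb {
	int id;
	int status;             /* 0 while pending, negative errno on failure */
	struct fake_urb *next;
};

/* Hypothetical per-endpoint/stream queue, oldest URB at head. */
struct stream_queue {
	struct fake_urb *head;
	struct fake_urb *tail;
};

/* Append a newly submitted URB to the tail of the queue. */
void queue_urb(struct stream_queue *q, struct fake_urb *u)
{
	assert(q != NULL && u != NULL);
	u->next = NULL;
	u->status = 0;
	if (q->tail)
		q->tail->next = u;
	else
		q->head = u;
	q->tail = u;
}

/*
 * On an error event whose TRB pointer is 0 but whose endpoint/stream is
 * known, the affected transfer is the oldest one still queued. Dequeue it
 * and set its status so the class driver gets a failed completion instead
 * of waiting for a timeout. Returns NULL if nothing is queued.
 */
struct fake_urb *fail_first_urb(struct stream_queue *q, int err)
{
	struct fake_urb *u;

	assert(q != NULL);
	u = q->head;
	if (!u)
		return NULL;
	q->head = u->next;
	if (!q->head)
		q->tail = NULL;
	u->next = NULL;
	u->status = err;
	return u;
}
```

In real code the dequeued URB would then be handed back to the class
driver with that status; here the caller just receives the pointer.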

Alan Stern