Re: [PATCH] usb: xhci-ring: set all cancelled_td's cancel_status to TD_CLEARING_CACHE

From: Mathias Nyman
Date: Fri Aug 13 2021 - 05:07:36 EST


On 13.8.2021 11.44, wat@xxxxxxxxxxxxxx wrote:
> On 2021-08-13 15:25, Ikjoon Jang wrote:
>> Hi,
>>
>> On Fri, Aug 13, 2021 at 10:44 AM Tao Wang <wat@xxxxxxxxxxxxxx> wrote:
>>>
>>> A USB SSD may fail to unmount if it is disconnected during a data transfer.
>>>
>>> It gets stuck in usb_kill_urb() because the URB's use_count never drops
>>> to zero, which means the URB giveback never happens.
>>> xhci_handle_cmd_set_deq() gives back a URB only if the TD's cancel_status
>>> equals TD_CLEARING_CACHE, but xhci_invalidate_cancelled_tds() changes
>>> only the last cancelled TD's cancel_status to TD_CLEARING_CACHE,
>>> so the giveback happens only for the last URB.
>>>
>>> This change sets every cancelled TD's cancel_status to TD_CLEARING_CACHE
>>> rather than just the last one, so all URBs can be given back.
>>>
>>> Signed-off-by: Tao Wang <wat@xxxxxxxxxxxxxx>
>>> ---
>>>  drivers/usb/host/xhci-ring.c | 24 ++++++++++++------------
>>>  1 file changed, 12 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
>>> index 8fea44b..c7dd7c0 100644
>>> --- a/drivers/usb/host/xhci-ring.c
>>> +++ b/drivers/usb/host/xhci-ring.c
>>> @@ -960,19 +960,19 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
>>>                         td_to_noop(xhci, ring, td, false);
>>>                         td->cancel_status = TD_CLEARED;
>>>                 }
>>> -       }
>>> -       if (cached_td) {
>>> -               cached_td->cancel_status = TD_CLEARING_CACHE;
>>> -
>>> -               err = xhci_move_dequeue_past_td(xhci, slot_id, ep->ep_index,
>>> -                                               cached_td->urb->stream_id,
>>> -                                               cached_td);
>>> -               /* Failed to move past cached td, try just setting it noop */
>>> -               if (err) {
>>> -                       td_to_noop(xhci, ring, cached_td, false);
>>> -                       cached_td->cancel_status = TD_CLEARED;
>>> +               if (cached_td) {
>>> +                       cached_td->cancel_status = TD_CLEARING_CACHE;
>>> +
>>> +                       err = xhci_move_dequeue_past_td(xhci, slot_id, ep->ep_index,
>>> +                                                       cached_td->urb->stream_id,
>>> +                                                       cached_td);
>>> +                       /* Failed to move past cached td, try just setting it noop */
>>> +                       if (err) {
>>> +                               td_to_noop(xhci, ring, cached_td, false);
>>> +                               cached_td->cancel_status = TD_CLEARED;
>>> +                       }
>>> +                       cached_td = NULL;
>>>                 }
>>> -               cached_td = NULL;
>>
>> I think we can call xhci_move_dequeue_past_td() just once, for
>> the last halted && cancelled TD in a ring.
>>
>> But that might need to compare two TDs to see which one is
>> the latter, I'm not sure how to do this well. :-/
>>
>> if (!cached_td || cached_td < td)
>>   cached_td = td;
>>
>
> Thanks, I think you are correct that we can call xhci_move_dequeue_past_td()
> just once, for the last halted && cancelled TD in a ring,
> but setting "cached_td->cancel_status = TD_CLEARING_CACHE;" should happen
> for every cancelled TD.
> I am not very familiar with TDs and rings; why do we need to
> compare two TDs to see which one is the latter?

I'm debugging the exact same issue.
For normal endpoints (no streams) it should be enough to set cancel_td->cancel_status = TD_CLEARING_CACHE
in the TD_DIRTY and TD_HALTED cases.

We don't need to move the dq past the last cancelled TD, as the other cancelled TDs are set to no-op, and
the command to move the dq will flush the xHC controller's TD cache and read the no-ops.
(just make sure we call xhci_move_dequeue_past_td() _after_ overwriting the cancelled TDs with no-ops)

Streams get trickier, as each endpoint has several rings, and we might need to move the dq pointer for
many stream rings on that endpoint. This needs more work, as we shouldn't restart the endpoint before
all the move dq commands complete, i.e. the current ep->ep_state &= ~SET_DEQ_PENDING isn't enough.

-Mathias