Re: [PATCH] usb: dwc3: gadget: Use list_replace_init() before traversing lists

From: Felipe Balbi
Date: Thu Jul 29 2021 - 05:33:00 EST



Hi,

Wesley Cheng <wcheng@xxxxxxxxxxxxxx> writes:
>>> The list_for_each_entry_safe() macro saves the current item (n) and
>>> the item after (n+1), so that n can be safely removed without
>>> corrupting the list. However, when traversing the list and removing
>>> items using gadget giveback, the DWC3 lock is briefly released,
>>> allowing other routines to execute. There is a situation where, while
>>> items are being removed from the cancelled_list using
>>> dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable
>>> routine is running in parallel (due to UDC unbind). As the cleanup
>>> routine removes n, and the pullup disable removes n+1, once the
>>> cleanup retakes the DWC3 lock, it references a request who was already
>>> removed/handled. With list debug enabled, this leads to a panic.
>>> Ensure all instances of the macro are replaced where gadget giveback
>>> is used.
>>>
>>> Example call stack:
>>>
>>> Thread#1:
>>> __dwc3_gadget_ep_set_halt() - CLEAR HALT
>>> -> dwc3_gadget_ep_cleanup_cancelled_requests()
>>> ->list_for_each_entry_safe()
>>> ->dwc3_gadget_giveback(n)
>>> ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list]
>>> ->spin_unlock
>>> ->Thread#2 executes
>>> ...
>>> ->dwc3_gadget_giveback(n+1)
>>> ->Already removed!
>>>
>>> Thread#2:
>>> dwc3_gadget_pullup()
>>> ->waiting for dwc3 spin_lock
>>> ...
>>> ->Thread#1 released lock
>>> ->dwc3_stop_active_transfers()
>>> ->dwc3_remove_requests()
>>> ->fetches n+1 item from cancelled_list (n removed by Thread#1)
>>> ->dwc3_gadget_giveback()
>>> ->dwc3_gadget_del_and_unmap_request()- n+1
>>> deleted[cancelled_list]
>>> ->spin_unlock
>>>
>>> Fix this condition by utilizing list_replace_init(), and traversing
>>> through a local copy of the current elements in the endpoint lists.
>>> This will also set the parent list as empty, so if another thread is
>>> also looping through the list, it will be empty on the next iteration.
>>>
>>> Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list")
>>> Signed-off-by: Wesley Cheng <wcheng@xxxxxxxxxxxxxx>
>>>
>>> ---
>>> Previous patchset:
>>> https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@xxxxxxxxxxxxxx/
>>> ---
>>> drivers/usb/dwc3/gadget.c | 18 ++++++++++++++++--
>>> 1 file changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>> index a29a4ca..3ce6ed9 100644
>>> --- a/drivers/usb/dwc3/gadget.c
>>> +++ b/drivers/usb/dwc3/gadget.c
>>> @@ -1926,9 +1926,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>>> {
>>> struct dwc3_request *req;
>>> struct dwc3_request *tmp;
>>> + struct list_head local;
>>> struct dwc3 *dwc = dep->dwc;
>>>
>>> - list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
>>> +restart:
>>> + list_replace_init(&dep->cancelled_list, &local);
>>
>> hmm, if the lock is held and IRQs disabled when this runs, then no other
>> threads will be able to append requests to the list which makes the
>> "restart" label unnecessary, no?
>
> We do still call dwc3_gadget_giveback() which would release the lock
> briefly, so if there was another thread waiting on dwc->lock, it would
> be able to add additional items to that list.
>
>>
>> I wonder if we should release the lock and reenable interrupts after
>> replacing the head. The problem is that
>> dwc3_gadget_ep_cleanup_cancelled_requests() can run from the IRQ
>> handler.
>>
>
> We would also need to consider that some of the APIs being called in
> these situations would also have the assumption that the dwc->lock is
> held, ie dwc3_gadget_giveback()

yeah, good point. I think we're good to integrate this, unless Alan can
shed some light on some particular possible race scenario we may have
missed.

In any case:

Acked-by: Felipe Balbi <balbi@xxxxxxxxxx>

--
balbi