Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command
From: Baolin Wang
Date: Thu Dec 01 2016 - 21:47:00 EST
On 1 December 2016 at 21:28, Mathias Nyman
<mathias.nyman@xxxxxxxxxxxxxxx> wrote:
> On 01.12.2016 06:54, Baolin Wang wrote:
>>
>> On 30 November 2016 at 22:09, Mathias Nyman
>> <mathias.nyman@xxxxxxxxxxxxxxx> wrote:
>>>
>>> On 30.11.2016 11:02, Baolin Wang wrote:
>>>>
>>>>
>>>> If the hardware never responds to the stop endpoint command, the
>>>> URBs will never be completed, and we might hang the USB subsystem.
>>>> The original watchdog timer is used to watch if one stop endpoint
>>>> command is timeout, if timeout, then the watchdog timer will set
>>>> XHCI_STATE_DYING, try to halt the xHCI host, and give back all
>>>> pending URBs.
>>>>
>>>> But now we already have one command timer to control command timeout,
>>>> thus we can also use the command timer to watch the stop endpoint
>>>> command, instead of one duplicate watchdog timer which need to be
>>>> removed.
>>>>
>>>> Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
>>>> this is the last stop endpoint command of one endpoint. Since we
>>>> can make sure we only set one stop endpoint command for one endpoint
>>>> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
>>>> this flag.
>>>>
>>>> We also need to clean up the command queue before trying to halt the
>>>> xHCI host in xhci_stop_endpoint_command_timeout() function.
>>>
>>>
>>>
>>> This isn't a bad idea.
>>>
>>> There are anyway some corner cases and details that need to be
>>> checked, such as suspend (which will clear the command queue), module
>>> unload
>>> and abrupt host removal (like pci hotplug removal of host controller)
>>> we need to make sure we can trust the command timer to always return the
>>> canceled URB
>>
>>
>> Yes, you are right, we need to check these carefully.
>>
>> Suspend process, module unload and abrupt host removal, they all will
>> issue usb_disconnect() firstly before clear the command queue, it will
>> check URBs for every endpoint by
>> usb_disconnect()--->usb_disable_device()--->usb_disable_endpoint(),
>> which will make sure every URBs of endpoints will be cancelled by the
>> stop endpoint command responding or the timeout function of stop
>> endpoint command (xhci_stop_endpoint_command_timeout()) in
>> usb_hcd_flush_endpoint(). From that point, we can make sure the
>> command timer will be useful to handle stop endpoint command timeout.
>> Please correct me if I said something wrong. Thanks.
>>
>
> This relies on current queued command that times out to be the stop endpoint
> command.
>
> If host partially, or completely hangs there might be any number of commands
> in the
> command queue, and we would need to wait for each one of them to timeout,
> finish
> before we finally get to the stop endpoint command, and give back the urb.
>
> I think it would be better to first fix the issues with the current watchdog
> function, get
> those fixes into stable, and then think about moving to the command queue
> timer.
OK, make sense. Thanks.
>
> In short, this patch doesn't currently fix any existing issue, but might
> cause the
> timeout to be more unreliable
>
> -Mathias
--
Baolin.wang
Best Regards