Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command

From: Baolin Wang
Date: Thu Dec 01 2016 - 21:48:39 EST


On 2 December 2016 at 09:17, Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx> wrote:
> Hi,
>
> On 12/01/2016 04:03 PM, Baolin Wang wrote:
>> On 1 December 2016 at 15:44, Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx> wrote:
>>> Hi,
>>>
>>> On 12/01/2016 03:35 PM, Baolin Wang wrote:
>>>> On 1 December 2016 at 14:35, Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> On 12/01/2016 02:04 PM, Baolin Wang wrote:
>>>>>> Hi Baolu,
>>>>>>
>>>>>> On 1 December 2016 at 13:45, Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 11/30/2016 05:02 PM, Baolin Wang wrote:
>>>>>>>> If the hardware never responds to the stop endpoint command, the
>>>>>>>> URBs will never be completed, and we might hang the USB subsystem.
>>>>>>>> The original watchdog timer is used to watch if one stop endpoint
>>>>>>>> command is timeout, if timeout, then the watchdog timer will set
>>>>>>>> XHCI_STATE_DYING, try to halt the xHCI host, and give back all
>>>>>>>> pending URBs.
>>>>>>>>
>>>>>>>> But now we already have one command timer to control command timeout,
>>>>>>>> thus we can also use the command timer to watch the stop endpoint
>>>>>>>> command, instead of one duplicate watchdog timer which need to be
>>>>>>>> removed.
>>>>>>>>
>>>>>>>> Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
>>>>>>>> this is the last stop endpoint command of one endpoint. Since we
>>>>>>>> can make sure we only set one stop endpoint command for one endpoint
>>>>>>>> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
>>>>>>>> this flag.
>>>>>>> I am afraid you can't do this. "stop_cmds_pending" was added
>>>>>>> to fix the problem described in the comments that you want to
>>>>>>> remove. But I didn't find any fix of this problem in your patch.
>>>>>> Now we can not pending another stop endpoint command for the same one
>>>>>> endpoint, since will check 'EP_HALT_PENDING' flag in
>>>>>> xhci_urb_dequeue() function to avoid this. But after some
>>>>>> investigation, I think I missed the stop endpoint command in
>>>>>> xhci_stop_device() which did not check the 'EP_HALT_PENDING' flag,
>>>>>> maybe need to add 'EP_HALT_PENDING' flag checking in
>>>>>> xhci_stop_device() function. DId I miss something else? Thanks.
>>>>> Consider below three threads running on different CPUs at the same time.
>>>>>
>>>>> Thread A: xhci_handle_cmd_stop_ep() --- in interrupt handler
>>>>> Thread B: xhci_stop_endpoint_command_timeout() --- timer expired
>>>>> Thread C: xhci_urb_dequeue --- called by usb core
>>>>>
>>>>> They are serialized by xhci->lock. Let's consider below sequence:
>>>>>
>>>>> Thread A:
>>>>> - delete xhci->cmd_timer), but will fail due to Thread B.
>>>>> - clear EP_HALT_PENDING bit
>>>>>
>>>>> Thread B:
>>>>> - halt the host controller
>>>>>
>>>>> Thread C:
>>>>> - set EP_HALT_PENDING bit
>>>>> - enqueue another stop endpoint command
>>>>> - add the timer back
>>>> Ah, thanks for you comments.
>>>> If thread B halted the host, then the thread C xhci_urb_dequeue() will
>>>> check host state failed, then just return and can not set another stop
>>>> endpoint command.
>>> Not so simple. Thread B will release xhci->lock before xhci_halt().
>> Yes.
>>
>>>> But from your example, I think your meaning is we
>>>> should not halt the host by thread B, since we have handled the stop
>>>> endpoint command event.
>>>>
>>>> So I think we need to check the xhci command of this stop endpoint
>>>> command in xhci_stop_endpoint_command_timeout(), if the xhci command
>>>> of this stop endpoint command is not in the command list (which means
>>>> the stop endpoint command event has been handled), then just return
>>>> and do not halt the controller. Right?
>>>>
>>> I'd like suggest you to put this change into a separated patch.
>>> It's actually a different topic from the main purpose of this patch.
>> OK, I will. Thanks for your comments.
>>
>
> If you are going to work out a v2 patch, please consider whether
> we can totally leverage the common command mechanism to
> handle this stop endpoint command.
>
> Currently, when a stop endpoint command is issued for urb unlink,
> there are two timers for this command. This is duplicated and we
> should remove the stop-endpoint-only timer. The timeout functions
> are also different. The stop-endpoint-only timer just halt the host
> controller (this should be the last sort of way), while the common
> command timer will try to recover the situation by aborting and
> restart the command ring.

Yes, thanks for your reminding and I will think about that.

--
Baolin.wang
Best Regards