Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command
From: Baolin Wang
Date: Thu Dec 01 2016 - 01:04:11 EST
Hi Baolu,
On 1 December 2016 at 13:45, Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx> wrote:
> Hi,
>
> On 11/30/2016 05:02 PM, Baolin Wang wrote:
>> If the hardware never responds to the stop endpoint command, the
>> URBs will never be completed, and we might hang the USB subsystem.
>> The original watchdog timer is used to watch if one stop endpoint
>> command is timeout, if timeout, then the watchdog timer will set
>> XHCI_STATE_DYING, try to halt the xHCI host, and give back all
>> pending URBs.
>>
>> But now we already have one command timer to control command timeout,
>> thus we can also use the command timer to watch the stop endpoint
>> command, instead of one duplicate watchdog timer which need to be
>> removed.
>>
>> Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
>> this is the last stop endpoint command of one endpoint. Since we
>> can make sure we only set one stop endpoint command for one endpoint
>> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
>> this flag.
>
> I am afraid you can't do this. "stop_cmds_pending" was added
> to fix the problem described in the comments that you want to
> remove. But I didn't find any fix of this problem in your patch.
Now we can not pending another stop endpoint command for the same one
endpoint, since will check 'EP_HALT_PENDING' flag in
xhci_urb_dequeue() function to avoid this. But after some
investigation, I think I missed the stop endpoint command in
xhci_stop_device() which did not check the 'EP_HALT_PENDING' flag,
maybe need to add 'EP_HALT_PENDING' flag checking in
xhci_stop_device() function. DId I miss something else? Thanks.
>
> - * The timer may also fire if the host takes a very long time to respond to the
> - * command, and the stop endpoint command completion handler cannot delete the
> - * timer before the timer function is called. Another endpoint cancellation may
> - * sneak in before the timer function can grab the lock, and that may queue
> - * another stop endpoint command and add the timer back. So we cannot use a
> - * simple flag to say whether there is a pending stop endpoint command for a
> - * particular endpoint.
> - *
> - * Instead we use a combination of that flag and a counter for the number of
> - * pending stop endpoint commands. If the timer is the tail end of the last
> - * stop endpoint command, and the endpoint's command is still pending, we assume
> - * the host is dying.
>
> Best regards,
> Lu Baolu
>
>>
>> We also need to clean up the command queue before trying to halt the
>> xHCI host in xhci_stop_endpoint_command_timeout() function.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxx>
>
--
Baolin.wang
Best Regards