RE: [PATCH v7 01/10] usb: gadget: udc: Add timer support for usb requests

From: Alan Stern
Date: Tue Dec 04 2018 - 11:46:48 EST


On Tue, 4 Dec 2018, Anurag Kumar Vulisha wrote:

> >> Yes the host can cancel the transfer. This issue originated from the endpoints using
> >bulk
> >> streaming protocol and may not occur with normal endpoints. AFAIK bulk streaming
> >is
> >> gadget driven, where the gadget is allowed to select which stream id transfer the
> >host
> >> should work on . Since the host doesn't have control on when the transfer would be
> >> selected by gadget, it may wait for longer timeouts before cancelling the transfer.
> >
> >You're missing the point. Although the device selects which stream ID
> >gets transferred, the _host_ decides whether a stream transfer should
> >occur in the first place. No matter how many ERDY packets the device
> >controller tries to send, no transfer will occur until the host wants
> >to do it.
> >
> >In this sense, stream transfers (like all other USB interactions except
> >wakeup requests) are host-driven.
> >
>
> I agree with you that without host willing to transfer, the transfer wouldn't have
> started and also agree the host always has the capability of detecting the hang. But
> we are not sure on what host platform and operating system the gadget would be
> tested and also not sure whether the host software is capable of handling the deadlock.
> Even if the host has a timer , it would be very long and waiting for the timer to get
> timedout would reduce the performance. In this patch we are giving the user the
> flexibility to opt the timeout value, so that the deadlock can be avoided without
> effecting the performance. So I think adding the timer in gadget is more easier solution
> than handling in host. I have seen this workaround working for both linux & windows.

Is there any way for the device controller driver to detect the problem
without relying on a timeout?

In fact, is this problem a known bug in the existing device controller
hardware? Have the controller manufacturers published a suggested
scheme for working around it?

> >> >> Since the gadget
> >> >> controller driver is aware that the controller is stuck , makes it responsible
> >> >> to recover the controller from hang condition by restarting the transfer (which
> >> >> triggers the controller FSM to issue ERDY to host).
> >> >
> >> >Isn't there a cleaner way to recover than by cancelling the request and
> >> >resubmitting it?
> >> >
> >>
> >> dequeuing the request issues the stop transfer command to the controller, which
> >> cleans all the hardware resource allocated for that endpoint. This also resets the
> >> hardware FSMs for that endpoint . So, when re-queuing of the transfer happens
> >> the controller starts allocating hardware resources again, thus avoiding the
> >probability
> >> of entering into the issue. I am not sure of other controllers, but for dwc3, issuing
> >> the stop transfer is the only way to handle this issue.
> >
> >Again you're missing the point. Can't the controller driver issue the
> >Stop Transfer command but still consider the request to be in progress
> >(i.e., don't dequeue the request) so that the gadget driver's
> >completion callback isn't invoked and the request does not need to be
> >explicitly resubmitted?
> >
>
> Yes , what you are saying is correct. My initial patches were following the
> same approach that you have suggested. We switched to the dequeue
> approach because there can be different gadgets which may enter into
> this issue and it would be better to follow a generic approach rather than
> having every controller driver implementing their own workaround.
> Please find the conservation in this link https://patchwork.kernel.org/patch/10640145/

At this point, it seems that the generic approach will be messier than
having every controller driver implement its own fix. At least, that's
how it appears to me.

> >> @Felipe: Can you please provide your suggestion on this.
> >
> >> >How can the gadget driver know what timeout to use? The host is
> >> >allowed to be as slow as it wants; the gadget driver doesn't have any
> >> >way to tell when the host wants to start the transfer.
> >>
> >> Yes , I agree with you that the timeout may vary from usage to usage. This timeout
> >> should be decided by the class driver which queues the request. As discussed above
> >> this issue was observed in streaming protocol , which is very much faster than
> >normal
> >> BOT modes and it works on super speed .
> >
> >Although USB mass storage is currently the only user of the stream
> >protocol, that may not be true in the future. You should think in more
> >general terms. A timeout which is appropriate for mass storage may not
> >be appropriate for some other application.
> >
>
> You are correct , this timeout may not be ideal. Since I was not able to reproduce this issue
> on non-stream capable transfers , at present the only user of usb_ep_queue_timeout() API
> is the streaming gadget. As we are not aware of the optimal timeout value, instead of
> hardcoding the timeout value we can add module_param() for it. So that the user can configure
> timeout based on their use case. Please let me know your suggestion on this.

Ideally it would not be necessary to rely on a timeout at all.

Also, maintainers dislike module parameters. It would be better not to
add one.

Alan Stern