RE: [PATCH v7 01/10] usb: gadget: udc: Add timer support for usb requests
From: Anurag Kumar Vulisha
Date: Mon Dec 03 2018 - 05:23:31 EST
Hi Alan,
>-----Original Message-----
>From: Alan Stern [mailto:stern@xxxxxxxxxxxxxxxxxxx]
>Sent: Sunday, December 02, 2018 10:06 PM
>To: Anurag Kumar Vulisha <anuragku@xxxxxxxxxx>
>Cc: Felipe Balbi <balbi@xxxxxxxxxx>; Greg Kroah-Hartman
><gregkh@xxxxxxxxxxxxxxxxxxx>; Shuah Khan <shuah@xxxxxxxxxx>; Johan Hovold
><johan@xxxxxxxxxx>; Jaejoong Kim <climbbb.kim@xxxxxxxxx>; Benjamin
>Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>; Roger Quadros <rogerq@xxxxxx>; Manu
>Gautam <mgautam@xxxxxxxxxxxxxx>; martin.petersen@xxxxxxxxxx; Bart Van
>Assche <bvanassche@xxxxxxx>; Mike Christie <mchristi@xxxxxxxxxx>; Matthew
>Wilcox <willy@xxxxxxxxxxxxx>; Colin Ian King <colin.king@xxxxxxxxxxxxx>;
>linux-usb@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; v.anuragkumar@xxxxxxxxx;
>Thinh Nguyen <thinhn@xxxxxxxxxxxx>; Tejas Joglekar
><tejas.joglekar@xxxxxxxxxxxx>; Ajay Yugalkishore Pandey <APANDEY@xxxxxxxxxx>
>Subject: Re: [PATCH v7 01/10] usb: gadget: udc: Add timer support for usb requests
>
>On Sat, 1 Dec 2018, Anurag Kumar Vulisha wrote:
>
>> In some corner cases the gadget controller may get out of sync
>> with the host and hang, thus creating a deadlock.
>> For example, when bulk streams are enabled for an endpoint, there
>> can be a condition where the gadget controller waits for the host
>> to issue a PRIME transaction while the host controller waits for the
>> gadget to issue ERDY. This condition could create a deadlock.
>>
>> To avoid such potential deadlocks, a timer is started after queuing
>> any request for the endpoint in usb_ep_queue(). The gadget driver
>> is expected to stop the timer if a valid event is found (e.g. a stream
>> event for stream-capable endpoints). If no valid event is found, the
>> timer expires after the programmed timeout and the registered timeout
>> callback is called. This callback dequeues the request and re-queues
>> it, which makes the controller restart the transfer and thus avoids
>> the deadlock.
>>
>> This kind of behaviour is observed in the dwc3 controller and is expected
>> to be a generic issue with other controllers supporting bulk streams.
>
>I find this whole approach rather dubious.
>
>First of all, if some sort of deadlock causes a transfer to fail to
>complete, the host is expected to cancel and restart it. Not the
>gadget.
>
Thanks for taking the time to review this patch. The deadlock is a very rare
scenario; it happens because the gadget controller and the host controller get
out of sync and both get stuck waiting for the relevant event. For example, this
issue is observed with the stream protocol, where the gadget controller is waiting
for the host controller to issue a PRIME transaction and the host controller is
waiting for the gadget to issue an ERDY transaction. Since the stream protocol is
gadget-driven, the host may not proceed further until it receives a valid Start
Stream (ERDY) transaction from the gadget. Because the gadget controller driver
knows that the controller is stuck, it is the one responsible for recovering the
controller from the hang by restarting the transfer (which triggers the controller
FSM to issue ERDY to the host).
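To make the sequence above concrete, here is a minimal user-space model of the stuck handshake and the timer-driven recovery. It is only a sketch under simplifying assumptions: it is not dwc3 or UDC-core code, time is counted in abstract ticks, and all names (struct stream_ep, model_ep_queue(), model_tick(), model_run()) are hypothetical. "Restarting the transfer" is reduced to a single state change, whereas the patch does it by dequeuing and re-queuing the request.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the stuck stream handshake; NOT dwc3 or UDC-core code. */
enum dev_state { DEV_WAIT_PRIME, DEV_SENT_ERDY, DEV_STREAMING };
enum host_state { HOST_WAIT_ERDY, HOST_STREAMING };

struct stream_ep {
	enum dev_state dev;
	enum host_state host;
	bool timer_armed;
	unsigned int timeout;    /* programmed timeout, in abstract ticks */
	unsigned int ticks_left; /* ticks until the timer fires */
	unsigned int restarts;   /* transfer restarts triggered by the timer */
};

/* Queueing a request starts the per-request timer (as in usb_ep_queue()). */
static void model_ep_queue(struct stream_ep *ep, unsigned int timeout)
{
	assert(timeout > 0);
	ep->timer_armed = true;
	ep->timeout = timeout;
	ep->ticks_left = timeout;
}

/* Advance the model by one tick; returns true once the stream has started. */
static bool model_tick(struct stream_ep *ep)
{
	if (ep->dev == DEV_STREAMING)
		return true;

	/* The host answers the ERDY and the stream starts. This is the "valid
	 * event" on which the gadget driver stops the timer. */
	if (ep->dev == DEV_SENT_ERDY) {
		ep->dev = DEV_STREAMING;
		ep->host = HOST_STREAMING;
		ep->timer_armed = false;
		return true;
	}

	/* Deadlock: the device waits for PRIME, the host waits for ERDY.
	 * Neither side moves; only the timer can break the cycle. */
	assert(ep->host == HOST_WAIT_ERDY);
	if (ep->timer_armed && --ep->ticks_left == 0) {
		/* Timeout: restart the transfer, which makes the controller
		 * FSM issue ERDY (Start Stream) to the host. */
		ep->restarts++;
		ep->dev = DEV_SENT_ERDY;
		ep->ticks_left = ep->timeout;
	}
	return false;
}

/* Returns the tick at which the stream started, or 0 if it stayed stuck. */
static unsigned int model_run(struct stream_ep *ep, unsigned int max_ticks)
{
	for (unsigned int t = 1; t <= max_ticks; t++)
		if (model_tick(ep))
			return t;
	return 0;
}
```

With the timer armed via model_ep_queue(&ep, 5), the stuck endpoint starts streaming on tick 6 after exactly one restart; without the timer, model_run() returns 0 no matter how many ticks it is given, which is the deadlock described above.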
>Second, if a request timer expires and the request is cancelled, the
>gadget driver's completion handler will be called. This is not what
>you want if the UDC core is going to resubmit the request
>automatically.
>
>Third, if a request timer expires and the timer handler calls
>usb_ep_dequeue() followed immediately by usb_ep_queue_timeout(), the
>resubmit will probably fail because the dequeue won't have completed
>yet.
>
>Fourth, the patch contains a race between the timer expiring and the
>request completing.
Thanks for pointing these out. I agree with you on all three of the above points:
the resubmission of the request should only be done by the class driver, and
the UDC core should simply dequeue the request on timeout. I am not sure why I
did not hit any of these issues while testing this patch series. I will modify
the code to handle the resubmission of requests properly.
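One possible shape for that rework, as a minimal user-space sketch only (every name here is hypothetical and none of it is UDC-core API): on timeout the core just gives the request back, the class driver's completion handler decides whether to resubmit, and a single atomic QUEUED to IDLE transition ensures that the timer and the normal completion cannot both give the request back. The -ETIMEDOUT status is an assumption chosen so the class driver can tell a timeout apart; nothing in this thread fixes a particular code.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request model; names are illustrative, not UDC-core API. */
enum req_state { REQ_IDLE, REQ_QUEUED };

struct model_req {
	atomic_int state;          /* enum req_state */
	int status;                /* status seen by the completion handler */
	unsigned int submissions;  /* times the request was queued */
	unsigned int completions;  /* times complete() ran */
	void (*complete)(struct model_req *req);
};

/* Queue only a request the gadget driver owns (i.e. already given back). */
static bool model_queue(struct model_req *req)
{
	int expected = REQ_IDLE;

	if (!atomic_compare_exchange_strong(&req->state, &expected, REQ_QUEUED))
		return false;
	req->submissions++;
	return true;
}

/* Give the request back exactly once: whichever path wins QUEUED -> IDLE
 * calls complete(); the loser returns false and must not touch the req. */
static bool model_giveback(struct model_req *req, int status)
{
	int expected = REQ_QUEUED;

	if (!atomic_compare_exchange_strong(&req->state, &expected, REQ_IDLE))
		return false;
	req->status = status;
	req->completions++;
	req->complete(req);
	return true;
}

/* Normal completion reported by the controller. */
static bool model_xfer_done(struct model_req *req)
{
	return model_giveback(req, 0);
}

/* Timer expiry: the UDC core only dequeues. -ETIMEDOUT is an assumed
 * status chosen so the class driver can tell a timeout apart. */
static bool model_timer_expired(struct model_req *req)
{
	return model_giveback(req, -ETIMEDOUT);
}

/* Hypothetical class-driver completion handler: the class driver, not
 * the UDC core, resubmits after a timeout. */
static void class_complete(struct model_req *req)
{
	if (req->status == -ETIMEDOUT) {
		bool ok = model_queue(req);  /* safe: request already given back */

		assert(ok);
		(void)ok;
	}
}
```

In this model a timer that fires after a normal completion is a no-op, and a timeout produces exactly one completion followed by a resubmission from the class driver, which only happens once the dequeue has finished. A real driver would also have to cancel the pending timer whenever the request is given back, so that a stale expiry cannot hit a later resubmission of the same request; the sketch does not model that.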
Best Regards,
Anurag Kumar Vulisha