Re: [PATCH] b: re-queue tx dma request on herror
From: Maxim Uvarov
Date: Tue Nov 29 2016 - 03:50:06 EST
Hello Bin,
thanks for decoding errors. I was able to reproduce it with the latest
master while logs are for 4.1 kernel. The main reason was 25 mhz
generator instead of 24 on remote hid device. With replacing
everything looks better, i.e. hidraw can operate 3-4 days and then the
same continuous musb 3-strikes. It might be some buggy device or some
other issue with continues writes to /dev/hidraw.
please do not apply this patch.
Thanks,
Maxim.
2016-11-29 6:39 GMT+03:00 Bin Liu <b-liu@xxxxxx>:
> Hi,
>
> On Wed, Nov 16, 2016 at 10:42:03AM +0300, Max Uvarov wrote:
>> Some times dma transfer to usb endpoint fails:
>>
>> [ 78.378283] musb-hdrc musb-hdrc.1.auto: Start TX10 dma
>> [ 78.410763] musb-hdrc musb-hdrc.1.auto: OUT/TX10 end, csr 3400, dma
>> [ 78.410896] musb-hdrc musb-hdrc.1.auto: complete dc01eb00 usb_api_blocking_completion+0x0/0x24 [usbcore] (0), dev4 ep1out, 10/10
>> [ 78.411181] musb-hdrc musb-hdrc.1.auto: qh dc01ed00 periodic slot 10
>> [ 78.411205] musb-hdrc musb-hdrc.1.auto: qh dc01ed00 urb dc01eb00 dev4 ep1out-intr, hw_ep 10, dd624d00/10
>> [ 78.411223] musb-hdrc musb-hdrc.1.auto: --> hw10 urb dc01eb00 spd2 dev4 ep1out h_addr83 h_port02 bytes 10
>> [ 78.411244] musb-hdrc musb-hdrc.1.auto: check whether there's still time for periodic Tx
>> [ 78.411256] musb-hdrc musb-hdrc.1.auto: Start TX10 dma
>> [ 78.443762] musb-hdrc musb-hdrc.1.auto: OUT/TX10 end, csr 3500, dma
>
> What kernel version is this? Log format looks old.
>
>> success transmition:
>>
>> [ 78.443889] musb-hdrc musb-hdrc.1.auto: complete dc01eb00 usb_api_blocking_completion+0x0/0x24 [usbcore] (0), dev4 ep1out, 10/10
>> [ 78.444170] musb-hdrc musb-hdrc.1.auto: qh dc01ed00 periodic slot 10
>> [ 78.444195] musb-hdrc musb-hdrc.1.auto: qh dc01ed00 urb dc01eb00 dev4 ep1out-intr, hw_ep 10, dd624d00/10
>> [ 78.444213] musb-hdrc musb-hdrc.1.auto: --> hw10 urb dc01eb00 spd2 dev4 ep1out h_addr83 h_port02 bytes 10
>> [ 78.444232] musb-hdrc musb-hdrc.1.auto: check whether there's still time for periodic Tx
>> [ 78.444245] musb-hdrc musb-hdrc.1.auto: Start TX10 dma
>> [ 78.540761] musb-hdrc musb-hdrc.1.auto: OUT/TX10 end, csr 3504, dma
>>
>> failed transmission:
>>
>> [ 78.540780] musb-hdrc musb-hdrc.1.auto: TX 3strikes on ep=10 set ETIMEDOUT
>> [ 78.540897] musb-hdrc musb-hdrc.1.auto: complete dc01eb00 usb_api_blocking_completion+0x0/0x24 [usbcore] (-110), dev4 ep1out, 10/10
>> [ 78.540945] musb-hdrc musb-hdrc.1.auto: extra TX10 ready, csr 2500
>> [ 78.540989] usb 2-1.1.2: urb wait ok but retval -110
>>
>> Use to reproduce is writes to /dev/hidraw0 device which end up with early
>> unexpected timeout and -110 errno set.
>>
>> Code sets timeout to 5 seconds
>> vfs_write()->hidraw_write()->hidraw_send_report()->usbhid_output_report()
>> ret = usb_interrupt_msg(dev, usbhid->urbout->pipe,
>> buf, count, &actual_length,
>> USB_CTRL_SET_TIMEOUT);
>> where is set to 5 second:
>
> The two timeout here are completely different - ETIMEOUT on 3strikes vs.
> USB_CTRL_SET_TIMEOUT.
>
> USB_CTRL_SET_TIMEOUT only takes effects when the usb device NAK the
> interrupt transaction for 5 seconds.
>
> But when MUSB reports 3strikes ETIMEOUT, it means the usb device doesn't
> respond, no handshake packet at all. The host tried TX 3 times, all
> received no response handshake, so ETIMEOUT.
>
>> when it wents to usb_start_wait_urb() where waits for completition:
>> retval = usb_submit_urb(urb, GFP_NOIO);
>>
>> expire = timeout ? msecs_to_jiffies(timeout) : MAX_SCHEDULE_TIMEOUT;
>> if (!wait_for_completion_timeout(&ctx.done, expire)) {
>>
>> So user space application expects that write will be done in 5 seconds or
>> error will happen. But musb driver exists this logic on first dma error
>
> This ETIMEOUT is not a dma error, but something wrong on the wire which
> causes the usb device is unable to respond, typically not received the
> TX packet at all.
>
>> without trying to retransmit current urb. This patch adds current request
>
> The host does trying to retransmit the TX packet 3 times, before reports
> ETIMEOUT.
>
>> to the end of list, destroys current dma transfer and renew transmission.
>> In that case this urb transmitted in next cycle and not failing with error
>> before timeout.
>
> NAK.
>
> It is up to the application to retry the transaction,
>
> Regards,
> -Bin.
>
>>
>> Signed-off-by: Max Uvarov <muvarov@xxxxxxxxx>
>> ---
>> drivers/usb/musb/musb_host.c | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
>> index 53bc4ce..e44ae95 100644
>> --- a/drivers/usb/musb/musb_host.c
>> +++ b/drivers/usb/musb/musb_host.c
>> @@ -1293,11 +1293,11 @@ void musb_host_tx(struct musb *musb, u8 epnum)
>> status = -EPIPE;
>>
>> } else if (tx_csr & MUSB_TXCSR_H_ERROR) {
>> - /* (NON-ISO) dma was disabled, fifo flushed */
>> musb_dbg(musb, "TX 3strikes on ep=%d", epnum);
>> -
>> - status = -ETIMEDOUT;
>> -
>> + if (dma)
>> + musb_bulk_nak_timeout(musb, hw_ep, 0);
>> + else
>> + status = -ETIMEDOUT;
>> } else if (tx_csr & MUSB_TXCSR_H_NAKTIMEOUT) {
>> if (USB_ENDPOINT_XFER_BULK == qh->type && qh->mux == 1
>> && !list_is_singular(&musb->out_bulk)) {
>> --
>> 1.9.1
>>
--
Best regards,
Maxim Uvarov