Re: More dwc3 gadget issues with adb

From: John Stultz
Date: Thu Apr 16 2020 - 14:00:45 EST


On Thu, Apr 16, 2020 at 1:19 AM Felipe Balbi <balbi@xxxxxxxxxx> wrote:
> John Stultz <john.stultz@xxxxxxxxxx> writes:
> > Hey Felipe,
> > Last week or so, a change[1] in AOSP to adbd seemingly uncovered
> > another issue with dwc3 gadget scatter-gather support on HiKey960.
> >
> > Interestingly it doesn't seem to affect the Dragonboard 845c, which
> > uses the same dwc3 driver and has had its own issues in the past.
> >
> > The behavior is the same as we saw last time around with both devices.
> > After booting the device, running "adb logcat -d" (or really any adb
> > command that transfers more than a trivial amount of data) on the host
> > will result in the adb output seeming to stall. Any further adb
> > invocations to the device will hang indefinitely.
> >
> > I've captured trace events for before the change (works), after the
> > change (broken), and after the change with the sg_enabled flag turned
> > off (which works around the problem).
> >
> > Let me know if there is anything else useful for me to share.
>
> First the obvious questions: Which kernel version is this?

Apologies. v5.7-rc1, though the same behavior was seen with v5.6 and
v5.4 kernels.

> What does
> "before" and "after" refer to in our traces?

Before and after the userland changes to adb were made.


> What are first working and
> first failing versions? Can you run git bisect?

It's not directly a kernel regression. But it seems like an uncovered
issue by changes in userland.


> One thing I noticed is that we're missing a giveback on ep1out. Here's a
> working case:
>
> UsbFfs-worker-580 [002] d..1 66.704886: dwc3_alloc_request: ep1out: req 0000000011c55648 length 0/0 zsI ==> 0
> UsbFfs-worker-580 [002] d..2 66.704889: dwc3_ep_queue: ep1out: req 0000000011c55648 length 0/16384 zsI ==> -115
> UsbFfs-worker-580 [002] d..2 66.704892: dwc3_prepare_trb: ep1out: trb 000000003559c11c (E27:D7) buf 000000008843b000 size 16384 ctrl 00000819 (HlcS:sC:normal)
> UsbFfs-worker-580 [002] d..2 66.704897: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
> irq/65-dwc3-522 [000] d..1 66.705053: dwc3_event: event (00006084): ep1out: Transfer In Progress [0] (SIm)
> irq/65-dwc3-522 [000] d..1 66.705054: dwc3_complete_trb: ep1out: trb 000000008c350fb3 (E27:D8) buf 0000000089d6b000 size 16360 ctrl 00000818 (hlcS:sC:normal)
> irq/65-dwc3-522 [000] d..1 66.705058: dwc3_gadget_giveback: ep1out: req 0000000001b9ed3f length 24/16384 zsI ==> 0
> kworker/u16:2-260 [001] .... 66.705097: dwc3_free_request: ep1out: req 0000000001b9ed3f length 24/16384 zsI ==> 0
>
> and the failure point:
>
> UsbFfs-worker-580 [002] d..1 66.705129: dwc3_alloc_request: ep1out: req 0000000067a34de4 length 0/0 zsI ==> 0
> UsbFfs-worker-580 [002] d..2 66.705131: dwc3_ep_queue: ep1out: req 0000000067a34de4 length 0/16384 zsI ==> -115
> UsbFfs-worker-580 [002] d..2 66.705134: dwc3_prepare_trb: ep1out: trb 00000000f3db4076 (E28:D8) buf 000000008843f000 size 16384 ctrl 00000819 (HlcS:sC:normal)
> UsbFfs-worker-580 [002] d..2 66.705141: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
> irq/65-dwc3-522 [000] d..1 66.705309: dwc3_event: event (00006084): ep1out: Transfer In Progress [0] (SIm)
> irq/65-dwc3-522 [000] d..1 66.705310: dwc3_complete_trb: ep1out: trb 0000000092deef41 (E28:D9) buf 00000000ba8f1000 size 4072 ctrl 0000001c (hlCS:sc:normal)
> irq/65-dwc3-522 [000] d..1 66.705318: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
> irq/65-dwc3-522 [000] d..1 66.705323: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
> irq/65-dwc3-522 [000] d..1 66.705329: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
> irq/65-dwc3-522 [000] d..1 66.705334: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
> irq/65-dwc3-522 [000] d..1 66.705339: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
> irq/65-dwc3-522 [000] d..1 66.705344: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
> irq/65-dwc3-522 [000] d..1 66.705349: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
> irq/65-dwc3-522 [000] d..1 66.705354: dwc3_gadget_ep_cmd: ep1out: cmd 'Update Transfer' [20007] params 00000000 00000000 00000000 --> status: Successful
>
> One interesting thing is that TRB addresses are "odd". I can't find a
> proper lifetime for these TRBs. Do you have IOMMU enabled? Can you run
> without it? For example, nowhere in the log can I find the place where
> trb 0000000092deef41 was first enqueue. I'm assuming the log to be
> ordered, which means that trb is the same as 00000000f3db4076. But why
> are the addresses different?
>
> Another weird thing is that even though we CHN bit being set in
> 0000000092deef41, we don't see where the second trb (the one its chained
> to) was prepared. It seems like it was *never* prepared, what gives?

I'll try to take a deeper look at these points and get back to you. I
really appreciate your eyes on this!

Thanks for the feedback!
-john