Re: More dwc3 gadget issues with adb

From: Felipe Balbi
Date: Thu Apr 23 2020 - 05:22:49 EST



Hi,

John Stultz <john.stultz@xxxxxxxxxx> writes:
> On Tue, Apr 21, 2020 at 11:00 PM Felipe Balbi <balbi@xxxxxxxxxx> wrote:
>> John Stultz <john.stultz@xxxxxxxxxx> writes:
>> > On Thu, Apr 16, 2020 at 1:19 AM Felipe Balbi <balbi@xxxxxxxxxx> wrote:
>> >> One thing I noticed is that we're missing a giveback on ep1out. Here's a
>> >> working case:
>> >>
>> >
>> > Hey Felipe,
>> > So I found some time to dig around on this today and I started
>> > trying to understand this issue that you've pointed out about missing
>> > the giveback.
>> >
>> > It seems part of the issue is we get to a point where we have some req
>> > where pending_sgs is more than one.
>> >
>> > We call dwc3_prepare_one_trb_sg() on it:
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/dwc3/gadget.c?h=v5.7-rc2#n1068
>> >
>> > And we process the sg list incrementing req->num_queued_sgs for each one.
>> >
>> > then later, dwc3_gadget_ep_cleanup_completed_request() is called on the request:
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/dwc3/gadget.c?h=v5.7-rc2#n2522
>> >
>> > We call dwc3_gadget_ep_reclaim_trb_sg()
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/dwc3/gadget.c?h=v5.7-rc2#n2470
>> >
>> > Where we iterate over the req->sg, ideally decrementing
>> > num_pending_sgs each time and return.
>> >
>> > But back in dwc3_gadget_ep_cleanup_completed_request() and there
>> > we're hitting the:
>> > if (!dwc3_gadget_ep_request_completed(req) ||
>> > req->num_pending_sgs) {
>> > case which causes us to skip the call to dwc3_gadget_giveback().
>>
>> This logic was modified recently. Can you check if today's linus/master
>> works for you?
>
> I was testing this against 5.7-rc2, but I updated to linus/master and
> I'm not seeing any change.

good

>> > Looking as to why the num_pending_sgs is non zero, that's because in
>> > dwc3_gadget_ep_reclaim_trb_sg we're hitting the case where the trb has
>> > the DWC3_TRB_CTRL_HWO ctrl flag set, which breaks us out of the loop
>> > early before we decrement num_pending_sgs.
>> >
>> > For that trb, we're setting the HWO flag in __dwc3_prepare_one_trb()
>> > (called from dwc3_prepare_one_trb_sg() back at the beginning):
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/dwc3/gadget.c?h=v5.7-rc2#n921
>> >
>> > I added logic showing every time we set or clear that flag, and it
>> > seems like we're always setting it but never clearing it. And often
>>
>> HW clears HWO flag. We only have to manually clear in one or two cases.
>
> I guess the bit that worries me with the current code is in
> dwc3_gadget_ep_reclaim_completed_trb():
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/dwc3/gadget.c?h=v5.7-rc2#n2406
>
> There is the logic:
> if (chain && (trb->ctrl & DWC3_TRB_CTRL_HWO))
> trb->ctrl &= ~DWC3_TRB_CTRL_HWO;
>
> But that will never trip, as the only time we call
> dwc3_gadget_ep_reclaim_completed_trb() with chain==true is from
> dwc3_gadget_ep_reclaim_trb_sg():
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/dwc3/gadget.c?h=v5.7-rc2#n2470
>
> Where if the trb->ctrl has the HWO flag set, we break out before
> calling dwc3_gadget_ep_reclaim_completed_trb().
>
> So the logic quoted above seems to be effectively dead code (despite
> the big comment as to why we need it)

There is another situation when we use CHN bit. If we're starting an OUT
transfer and the length is not aligned to wMaxPacketSize.

> That's why I suspected the HWO check in
> dwc3_gadget_ep_reclaim_trb_sg() is problematic.

okay

>> > that's not an issue as we only have one sg entry. But if its set on a
>> > trb in a request with multiple sgs, that's where it seems to be
>> > causing the issue.
>>
>> The issue is completing with HWO set, which should never happen. Can you
>> collect tracepoints with linus/master of this particular situation?
>
> Will do, though again, its a little tough as often we hit the stall
> state pretty quickly at bootup, before I can even try to enable
> tracing, so it may take a few tries.

IIRC there's a way to configure tracepoints from cmdline, I never tested
it though.

>> >> One interesting thing is that TRB addresses are "odd". I can't find a
>> >> proper lifetime for these TRBs. Do you have IOMMU enabled? Can you run
>> >> without it? For example, nowhere in the log can I find the place where
>> >> trb 0000000092deef41 was first enqueue. I'm assuming the log to be
>> >> ordered, which means that trb is the same as 00000000f3db4076. But why
>> >> are the addresses different?
>> >>
>> >> Another weird thing is that even though we CHN bit being set in
>> >> 0000000092deef41, we don't see where the second trb (the one its chained
>> >> to) was prepared. It seems like it was *never* prepared, what gives?
>> >
>> > I suspect these bits were due to the tracing happening after some
>> > minor amount of initial adb traffic began at bootup? So the trace
>> > isn't capturing all the events.
>>
>> No, no. That's more likely to be IOMMU mucking up the addresses. ADB is
>> very sequential in its behavior and USB transfers requests in
>> order. Please run linus/master without IOMMU.
>
> So I didn't have any IOMMU support enabled in the config I was testing
> with. I went through and added IOMMU options in my config with
> linus/master as well and that didn't seem to change the behavior
> either.

pretty odd that the TRB addresses change like that.

> I'll get back to you with further trace logs.

thanks

--
balbi

Attachment: signature.asc
Description: PGP signature