Re: [PATCH v6 20/22] usb: dwc2: host: Properly set even/odd frame

From: Doug Anderson
Date: Tue Feb 02 2016 - 17:47:23 EST


Kever,

On Mon, Feb 1, 2016 at 11:46 PM, Kever Yang <kever.yang@xxxxxxxxxxxxxx> wrote:
> Doug,
>
>
> On 01/29/2016 10:20 AM, Douglas Anderson wrote:
>>
>> When setting up ISO and INT transfers dwc2 needs to specify whether the
>> transfer is for an even or an odd frame (or microframe if the controller
>> is running in high speed mode).
>>
>> The controller appears to use this as a simple way to figure out if a
>> transfer should happen right away (in the current microframe) or should
>> happen at the start of the next microframe. Said another way:
>>
>> - If you set "odd" and the current frame number is odd it appears that
>> the controller will try to transfer right away. Same thing if you set
>> "even" and the current frame number is even.
>> - If the oddness you set and the oddness of the frame number are
>> _different_, the transfer will be delayed until the frame number
>> changes.
>>
>> As I understand it, the above technique allows you to plan ahead of time
>> where possible by always working on the next frame. ...but it still
>> allows you to properly respond immediately to things that happened in
>> the previous frame.
>>
>> The old dwc2_hc_set_even_odd_frame() didn't really handle this concept.
>> It always looked at the frame number and setup the transfer to happen in
>> the next frame. In some cases that meant that certain transactions
>> would be transferred in the wrong frame.
>>
>> We'll try our best to set the even / odd to do the transfer in the
>> scheduled frame. If that fails then we'll do an ugly "schedule ASAP".
>> We'll also modify the scheduler code to handle this and not try to
>> schedule a second transfer for the same frame.
>>
>> Note that this change relies on the work to redo the microframe
>> scheduler. It can work atop ("usb: dwc2: host: Manage frame nums better
>> in scheduler") but it works even better after ("usb: dwc2: host: Totally
>> redo the microframe scheduler").
>>
>> With this change my stressful USB test (USB webcam + USB audio +
>> keyboards) has less audio crackling than before.
>
> Seems this really help for your case?

Yes, I believe it does. Of course my test case is pretty "black box"
for the most part in that I play music on youtube while having a
webcam open and several USB input devices connected. I then try to
decide whether I hear more static or less static. ...clearly a less
subjective test would be better...

* I tried with http://crosreview.com/325451 (see below) and I hear
more static with "use_old = true" than with "use_old = "false".

* I tried with this entire patch reverted and I hear about the same
static as with "use_old = true".

Note that counting reported MISS lines from my logging also shows that
the new code is better...


> Do you check if the transfer can happen right in the current frame? I know
> it's
> quite difficult to check it, but this changes what I know for the dwc core
> schedule the transaction.

Yes. I just tried again, too. I coded up
<https://chromium-review.googlesource.com/325451> and included it. I
then opened up a USB webcam.

With things set to the old way:

115.355370 QH=dc6ba8c0 next(0) fn=10cb, sch=10ca=>10cb (+1) miss=0
115.355373 QH=dc6ba8c0 IMM ready fn=10cb, nxt=10cb
115.355518 QH=dc6ba8c0 next(0) fn=10cc, sch=10cb=>10cc (+1) miss=0
115.355522 QH=dc6ba8c0 IMM ready fn=10cc, nxt=10cc
115.355637 QH=dc6ba8c0 next(0) fn=10cd, sch=10cc=>10cd (+1) miss=0
115.355641 QH=dc6ba8c0 IMM ready fn=10cd, nxt=10cd
115.355857 QH=dc6ba8c0 next(0) fn=10ce, sch=10cd=>10ce (+1) miss=0
115.355859 QH=dc6ba8c0 IMM ready fn=10ce, nxt=10ce
115.355867 QH=dc6ba8c0, wire=10cf, old_wire=10d0, EO diff (use OLD)
115.355870 QH=dc6ba8c0 EO MISS w/ old (10ce != 10cf)
115.356037 QH=dc6ba8c0 next(0) fn=10d0, sch=10cf=>10d0 (+1) miss=1 MISS
115.356039 QH=dc6ba8c0 IMM ready fn=10d0, nxt=10d0
115.356169 QH=dc6ba8c0 next(0) fn=10d1, sch=10d0=>10d1 (+1) miss=0
115.356170 QH=dc6ba8c0 IMM ready fn=10d1, nxt=10d1
115.356269 QH=dc6ba8c0 next(0) fn=10d2, sch=10d1=>10d2 (+1) miss=0
115.356273 QH=dc6ba8c0 IMM ready fn=10d2, nxt=10d2
115.356404 QH=dc6ba8c0 next(0) fn=10d3, sch=10d2=>10d3 (+1) miss=0
115.356407 QH=dc6ba8c0 IMM ready fn=10d3, nxt=10d3

With the new way:

87.814741 QH=e2fd7880 next(0) fn=32e4, sch=32e3=>32e4 (+1) miss=0
87.814744 QH=e2fd7880 IMM ready fn=32e4, nxt=32e4
87.814858 QH=e2fd7880 next(0) fn=32e5, sch=32e4=>32e5 (+1) miss=0
87.814862 QH=e2fd7880 IMM ready fn=32e5, nxt=32e5
87.815010 QH=e2fd7880 next(0) fn=32e6, sch=32e5=>32e6 (+1) miss=0
87.815012 QH=e2fd7880 IMM ready fn=32e6, nxt=32e6
87.815220 QH=e2fd7880 next(0) fn=32e8, sch=32e6=>32e7 (+1) miss=0
87.815222 QH=e2fd7880 IMM ready fn=32e8, nxt=32e7
87.815230 QH=e2fd7880, wire=32e8, old_wire=32e9, EO diff (use NEW)
87.815278 QH=e2fd7880 next(0) fn=32e8, sch=32e7=>32e8 (+1) miss=0
87.815280 QH=e2fd7880 IMM ready fn=32e8, nxt=32e8
87.815390 QH=e2fd7880 next(0) fn=32e9, sch=32e8=>32e9 (+1) miss=0
87.815391 QH=e2fd7880 IMM ready fn=32e9, nxt=32e9
87.815491 QH=e2fd7880 next(0) fn=32ea, sch=32e9=>32ea (+1) miss=0
87.815493 QH=e2fd7880 IMM ready fn=32ea, nxt=32ea
87.815635 QH=e2fd7880 next(0) fn=32eb, sch=32ea=>32eb (+1) miss=0
87.815638 QH=e2fd7880 IMM ready fn=32eb, nxt=32eb


Note that with my TEST-ONLY patch the old way is still _slightly_
different in that I still communicate back to the scheduler with:

chan->qh->next_active_frame = now_frame;

The old code didn't used to do that. If I don't do that then you
you'll just stay in an inconsistent state for a while where things are
going on the wire 1 frame later than we think they are.


Also note that above you can see that the new way is indeed able to
schedule things in the current microframe. Looking one line at a
time:


87.815012 QH=e2fd7880 IMM ready fn=32e6, nxt=32e6

QH e2fd7880 is going straight to the ready queue. Actual frame number
in hardware is 32e6. next_active_frame = 32e6 which means we ideally
want to give it to hardware in 32e6 and wire frame is 32e7.


87.815220 QH=e2fd7880 next(0) fn=32e8, sch=32e6=>32e7 (+1) miss=0
87.815222 QH=e2fd7880 IMM ready fn=32e8, nxt=32e7

Frame number in hardware is now 32e8. We'd like to give the next
transfer to hardware in 32e7 to transfer on the wire at 32e8, but
that's obviously impossible. We will try to give it right away.


87.815230 QH=e2fd7880, wire=32e8, old_wire=32e9, EO diff (use NEW)

Showing a difference in the old way. We'll choose "even" to have the
packet go on the wire (expecting 32e8).


87.815278 QH=e2fd7880 next(0) fn=32e8, sch=32e7=>32e8 (+1) miss=0
87.815280 QH=e2fd7880 IMM ready fn=32e8, nxt=32e8

We got a response back and are ready to schedule the next transfer and
it's still 32e8! That means that transfer must have happened (as
expected) in 32e8. Whew! Give the next transfer to hardware hoping
for 32e9 wire.


87.815390 QH=e2fd7880 next(0) fn=32e9, sch=32e8=>32e9 (+1) miss=0

Now at hardware 32e9 and ready to schedule the next...



> In dwc_otgbook, Interrupt OUT Transactions(also similar for Int IN, Iso
> IN/OUT)
> in DMA Mode, the normal Interrupt OUT operation says:
> The DWC_otg host attempts to send out the OUT token in the beginning of next
> odd frame/microframe.
>
> So I'm confuse about if the dwc core can do the transaction at the same
> frame
> of host channel initialized or not.

The docbook is obviously way too terse here, but the above experiment
shows that the hardware is designed in the only sane way that it could
be designed.

Why do I say that this is the only sane way for the hardware to work?
I think all the following is true (please correct any errors):

A) HW only lets you specify even/odd which means you choose between
two frame to send the packet. Two possible ways HW could be
implemented: "sane" way means you can send a packet in frame "x" and
"x + 1". "insane" way means you can send a packet in frame "x + 1"
and "x + 2" but not frame "x"

B) In some cases (especially with regards to SPLIT transfers), we need
to use the result of a transfer in uFrame "x" to decide what to do
about uFrame "x + 1". Specifically for IN transfers I think we can't
know for sure whether we'll get back all of our data in uFrame "x" or
whether we'll only get part of the data and need uFrame "x + 1".

C) It's possible to schedule 100us worth of periodic transfers in one
125us uFrame.

D) We can't know the result of a transfer until that transfer is done.


So above basically means that we might have a periodic transfer where
we get the result of the transfer 100us into a uFrame. We've now got
to quickly queue up the transfer for the next uFrame. If hardware was
designed in the "insane" way then we'd need an interrupt latency of <
25 us since once the frame ticked we'd no longer be able to schedule.
If hardware was designed in the "sane" way then we'd "only" need an
interrupt latency of 125 us since we could continue to schedule even
partway through the current frame.

Also note that if there's any chance that a periodic transfer ends
later than 100 us into a frame (like if a non-periodic transfer snuck
in there because we were out of periodic channels) then the above
problem becomes even more extreme.



-Doug