On Thu, Jun 08, 2023 at 02:44:46PM -0400, Jamal Hadi Salim wrote:
Other than the refcount issue i think the approach looks reasonable to
me. The stats before/after you are showing below though are
interesting; are you showing a transient phase where packets are
temporarily in the backlog. Typically the backlog is a transient phase
which lasts a very short period. Maybe it works differently for
taprio? I took a quick look at the code and do see to decrement the
backlog in the dequeue, so if it is not transient then some code path
is not being hit.
It's a fair concern. The thing is that I put very aggressive time slots
in the schedule that I'm testing with, and my kernel has a lot of
debugging stuff which bogs it down (kasan, kmemleak, lockdep, DMA API
debug etc). Not to mention that the CPU isn't the fastest to begin with.
The way taprio works is that there's a hrtimer which fires at the
expiration time of the current schedule entry and sets up the gates for
the next one. Each schedule entry has a gate for each traffic class
which determines what traffic classes are eligible for dequeue() and
which ones aren't.
The dequeue() procedure, though also invoked by the advance_schedule()
hrtimer -> __netif_schedule(), is also time-sensitive. By the time
taprio_dequeue() runs, taprio_entry_allows_tx() function might return
false when the system is so bogged down that it wasn't able to make
enough progress to dequeue() an skb in time. When that happens, there is
no mechanism, currently, to age out packets that stood too much in the
TX queues (what does "too much" mean?).
Whereas enqueue() is technically not time-sensitive, i.e. you can
enqueue whenever you want and the Qdisc will dequeue whenever it can.
Though in practice, to make this scheduling technique useful, the user
space enqueue should also be time-aware (though you can't capture this
with ping).
If I increase all my sched-entry intervals by a factor of 100, the
backlog issue goes away and the system can make forward progress.
So yeah, sorry, I didn't pay too much attention to the data I was
presenting for illustrative purposes.
Aside: I realize you are busy - but if you get time and provide some
sample tc command lines for testing we could help create the tests for
you, at least the first time. The advantage of putting these tests in
tools/testing/selftests/tc-testing/ is that there are test tools out
there that run these tests and so regressions are easier to catch
sooner.
Yeah, ok. The script posted in a reply on the cover letter is still what
I'm working with. The things it intends to capture are:
- attaching a custom Qdisc to one of taprio's classes doesn't fail
- attaching taprio to one of taprio's classes fails
- sending packets through one queue increases the counters (any counters)
of just that queue
All the above, replicated once for the software scheduling case and once
for the offload case. Currently netdevsim doesn't attempt to emulate
taprio offload.
Is there a way to skip tests? I may look into tdc, but I honestly don't
have time for unrelated stuff such as figuring out why my kernel isn't
configured for the other tests to pass - and it seems that once one test
fails, the others are completely skipped, see below.
Also, by which rule are the test IDs created?
root@debian:~# cd selftests/tc-testing/
root@debian:~/selftests/tc-testing# ./tdc.sh
considering category qdisc
-- ns/SubPlugin.__init__
Test 0582: Create QFQ with default setting
Test c9a3: Create QFQ with class weight setting
Test d364: Test QFQ with max class weight setting
Test 8452: Create QFQ with class maxpkt setting
Test 22df: Test QFQ class maxpkt setting lower bound
Test 92ee: Test QFQ class maxpkt setting upper bound
Test d920: Create QFQ with multiple class setting
Test 0548: Delete QFQ with handle
Test 5901: Show QFQ class
Test 0385: Create DRR with default setting
Test 2375: Delete DRR with handle
Test 3092: Show DRR class
Test 3460: Create CBQ with default setting
exit: 2
exit: 0
Error: Specified qdisc kind is unknown.
(...)