Re: [PATCH net-next v5] sched: Add dualpi2 qdisc

From: Tilmans, Olivier (Nokia - BE/Antwerp)
Date: Fri Aug 23 2019 - 08:59:55 EST


> 1) Since we're still duking it out over the meaning of the bits - not
> just the SCE thing, but as best as I can
> tell (but could be wrong) the NQB idea wants to put something into the
> l4s fast queue? Or is NQB supposed to
> be a third queue?

We can add support for NQB in the future by expanding the
dualpi2_skb_classify() function. This is however out of scope at the
moment, as NQB has not yet been adopted by the TSVWG. I would also guess
we may want more than just the NQB DSCP in the L queue, which would then
warrant another way to classify traffic, e.g., using tc filter hints.

> In those cases, the ecn_mask should just be mask.

That is actually what it is at the moment: a mask on the two ECN bits.
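
For illustration, here is a minimal userspace sketch of that classification
step, assuming a simple (ecn & ecn_mask) test as described above. The
constants mirror the kernel's INET_ECN_* codepoints, but this is not the
actual dualpi2_skb_classify() code:

#include <stdint.h>
#include <stdio.h>

/* Same values as the kernel's INET_ECN_* codepoints. */
#define ECN_NOT_ECT	0x0
#define ECN_ECT_1	0x1
#define ECN_ECT_0	0x2
#define ECN_CE		0x3
#define ECN_MASK	0x3

enum queue_id { CLASSIC_QUEUE = 0, L4S_QUEUE = 1 };

/* ecn_mask selects which ECN codepoints go to the L queue; with the
 * default of ECT(1) (0x1), both ECT(1) and CE match, mirroring the
 * L4S identifier. */
static enum queue_id classify(uint8_t tos, uint8_t ecn_mask)
{
	uint8_t ecn = tos & ECN_MASK;

	return (ecn & ecn_mask) ? L4S_QUEUE : CLASSIC_QUEUE;
}

int main(void)
{
	printf("Not-ECT -> %d\n", classify(ECN_NOT_ECT, ECN_ECT_1)); /* 0 */
	printf("ECT(0)  -> %d\n", classify(ECN_ECT_0, ECN_ECT_1));   /* 0 */
	printf("ECT(1)  -> %d\n", classify(ECN_ECT_1, ECN_ECT_1));   /* 1 */
	printf("CE      -> %d\n", classify(ECN_CE, ECN_ECT_1));      /* 1 */
	return 0;
}

With the default mask of ECT(1), both ECT(1) and CE land in the L queue,
while Not-ECT and ECT(0) go to the classic queue.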

> 2) Is the intent to make the drop probability 0 by default? (10 in the
> pie rfc, not mentioned in the l4s rfc as yet)

I assume you are referring to §5.1 of the PIE RFC, i.e., the switch to
pure drop once the computed marking probability exceeds 10%?

The default for dualpi2 is also to enter a pure-drop mode on overload.
More precisely, we define overload as reaching a marking probability of
100% in the L queue, which corresponds to an internal PI probability of
50% (as it gets multiplied by the coupling factor, which defaults to 2).
This is equivalent to a PIE probability of 25% (as the classic queue uses
the squared probability).
In this drop mode, packets in both queues are subject to random drops with
a PI^2 probability. Additionally, all packets not dropped in the L queue
are CE marked.
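
To make the arithmetic above concrete, a small standalone example (floating
point for readability; the qdisc itself works in fixed point):

#include <stdio.h>

int main(void)
{
	double coupling = 2.0;	/* default coupling factor */
	double p = 0.5;		/* internal PI probability at the overload point */
	double p_l = p * coupling;	/* L-queue marking probability */
	double p_c = p * p;		/* classic-queue drop probability (PI^2) */

	if (p_l > 1.0)
		p_l = 1.0;

	/* Prints: p=0.50 -> L mark=1.00 (overload), classic drop=0.25 */
	printf("p=%.2f -> L mark=%.2f (overload), classic drop=%.2f\n",
	       p, p_l, p_c);
	return 0;
}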

We used to have a parameter to configure this overload threshold (IIRC
it was still in the pre-v4 patches), but found no real use for lowering
it, hence its removal.

Note that the drop on overload can be disabled, resulting in increased
latencies in both queues, 100% CE marking in the L queue, and eventually
a taildrop behaviour once the packet limit is reached.

> 3) has this been tested on a hw mq system as yet? (10gigE is typically
> 64 queues)

Yes, in a setup where 1/32/64/128 VMs were behind an Intel X540-*, which
indeed has 64 internal queues. The VMs ran a mix of long/short cubic/DCTCP
connections towards another server. I could not think of another use case
where a 10G software switch would prove to be a bottleneck, i.e., where a
queue would build up.
The qdisc is however not optimized for mq systems; it might cause
performance degradation if the server were severely resource constrained.

Also, saturating 10G required GRO on the hypervisor, which in turn meant
that the step threshold of dualpi2 had to be increased to compensate for
those large bursts. Maybe that is where being mq-aware would help, i.e.,
by instantiating one dualpi2 instance per HW queue?
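
For context, the step threshold mentioned here is the instantaneous-delay
marking applied in the L queue. A rough sketch of the idea, where the 1ms
threshold and helper name are placeholders rather than the patch's actual
defaults:

#include <stdbool.h>
#include <stdio.h>

/* Mark CE once a packet's sojourn time in the L queue exceeds the step
 * threshold, independently of the PI probability. */
static bool l_queue_step_mark(double sojourn_us, double step_thresh_us)
{
	return sojourn_us > step_thresh_us;
}

int main(void)
{
	/* A 64KB GRO super-packet takes ~52us to serialize at 10Gb/s, so a
	 * burst of a few of them quickly inflates the measured sojourn time. */
	double thresh_us = 1000.0;
	double sojourns[] = { 200.0, 800.0, 1500.0, 3000.0 };
	int i;

	for (i = 0; i < 4; i++)
		printf("sojourn=%6.1fus -> CE mark: %s\n", sojourns[i],
		       l_queue_step_mark(sojourns[i], thresh_us) ? "yes" : "no");
	return 0;
}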
The AQM scheme itself is CPU friendly (lighter than PIE): computing the
probability takes fewer than 10 arithmetic ops and 5 comparisons once every
16ms, while enqueue/dequeue involves ~10 comparisons and at most 2 RNG
calls, so it should not cause too many issues if it were duplicated per
queue.
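
To give an idea of the per-interval work involved, here is a rough PI-style
update in plain C; the gains, target and scaling are placeholders, not the
patch's fixed-point values:

#include <stdio.h>

#define ALPHA		0.16	/* integral gain (placeholder) */
#define BETA		3.2	/* proportional gain (placeholder) */
#define TARGET_MS	15.0	/* target queue delay (placeholder) */

static double prob;		/* internal PI probability, 0..1 */
static double prev_qdelay_ms;

/* Runs once per update interval (16ms by default): a handful of
 * multiplications/additions plus the clamping comparisons. */
static void update_prob(double qdelay_ms)
{
	double delta = ALPHA * (qdelay_ms - TARGET_MS) +
		       BETA * (qdelay_ms - prev_qdelay_ms);

	prob += delta / 100.0;	/* arbitrary scaling for the sketch */
	if (prob < 0.0)
		prob = 0.0;
	if (prob > 1.0)
		prob = 1.0;
	prev_qdelay_ms = qdelay_ms;
}

int main(void)
{
	double samples[] = { 5.0, 20.0, 40.0, 30.0, 10.0 };
	int i;

	for (i = 0; i < 5; i++) {
		update_prob(samples[i]);
		printf("qdelay=%4.1fms -> p=%.4f, L mark=%.4f, classic drop=%.4f\n",
		       samples[i], prob,
		       prob * 2.0 > 1.0 ? 1.0 : prob * 2.0,	/* coupled */
		       prob * prob);				/* squared */
	}
	return 0;
}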


Best,
Olivier