Re: [PATCH net-next v1 2/2] net: dsa: microchip: add ETS Qdisc support for KSZ9477 series

From: Vladimir Oltean
Date: Tue Mar 07 2023 - 14:14:15 EST


On Tue, Mar 07, 2023 at 07:27:32PM +0100, Oleksij Rempel wrote:
> > What do you mean tc-mqprio doesn't support strict priority? Strict
> > priority between traffic classes is what it *does* (the "prio" in the name),
> > although without hardware offload, the prioritization isn't enforced anywhere.
> > Perhaps I'm misunderstanding what you mean?
>
> Huh.. you have right, I overlooked this part of documentation:
> "As one specific example numerous Ethernet cards support the
> 802.1Q link strict priority transmission selection algorithm
> (TSA). MQPRIO enabled hardware in conjunction with the
> classification methods below can provide hardware offloaded
> support for this TSA."
>
> But other parts of manual confuse me.

Not only you...
Does this discussion help any bit?
https://patchwork.kernel.org/project/netdevbpf/patch/20230220150548.2021-1-peti.antal99@xxxxxxxxx/

> May be you can help here:
> - "map - The priority to traffic class map. Maps priorities 0..15 to a
> specified traffic class"
> "Priorities" is probably SO_PRIORITY?

yeah
see the netdev_pick_tx() and skb_tx_hash() implementations for TXQ
selection based on skb->priority, it will answer a lot of questions

> If yes, this option can't be offloaded by the KSZ switch.

because? it can offload the 1:1 prio:tc mapping only; reject anything else

> - "queues - Provide count and offset of queue range for each traffic class..."
> If I see it correctly, I can map a traffic class to some queue.

s/queue/group of TX queues/

> But traffic class is not priority? I can create traffic class with high number
> and map it to a low number queue but actual queue priority is HW specific and
> there is no way to notify user about it.

no, where did you read that you should do that?
traffic class number *is* the number based on which the NIC should do
egress prioritization. Higher traffic class number => higher priority.
Within the same traffic class, there should be round robin between TXQs.
The TXQ number should have nothing to do with priority. Intel igb/igc
NICs do that, and there was a recent discussion about this fact causing
problems with taprio (which reuses the mqprio concepts).

Since commit d7045f520a74 ("net/sched: mqprio: allow reverse TC:TXQ
mappings") it's possible to describe this kind of inherent TXQ priority
like this:

num_tc 8 map 0 1 2 3 4 5 6 7 queues 1@7 1@6 1@5 1@4 1@3 1@2 1@1 1@0

>
> KSZ HW is capable of mapping 8 traffic classes separately to any available
> queue. Ok, if I replace words used in manual from "priority" to "traffic class"
> and "traffic class" to "queues". But even in this case the code will be even
> more confusing - i'll have to use qopt->prio_tc_map array which is SO_PRIO to
> TC map, as TC to queue map.

yeah, but with the 1:1 mapping between PRIO and TC, you only have to
concern about TC:TXQ.

> I still have difficulties to understand how priorities of actual queues
> are organized. I see how to map traffic class to a queue, but I can't find

s/a queue/one or more queues/

> any thing in manual about queue priority. For example, if I assign traffic
> class 3 to the Queue0 this traffic will have lowest priority in my HW. Is
> it some how documented or known for users?

This is a missing piece for mqprio's UAPI indeed. If TXQs have inherent
egress scheduling priority attached to them, there is no mechanism
currently which would allow the user to discover that. He would just
have to "know" that it is like that. Perhaps guided by errors emitted by
the driver, plus extack messages saying that lower TXQ numbers cannot
be mapped to a higher traffic class than higher TXQ numbers.

> One more question is, what is actual expected behavior of mqprio if max_rate
> option is used? In my case, if max_rate is set to a queue (even to max value),
> then strict priority TSA will not work:
> queue0---max rate 100Mbit/s---\
> |---100Mbit/s---
> queue1---max rate 100Mbit/s---/
>
> in this example both streams will get 49Mbit/s. My expectation of strict prio
> is that queue1 should get 100Mbit/s and queue 0Mbit/s

I don't understand this. Have you already implemented mqprio offloading
and this is what you observe?

max_rate is an option per traffic class. Are queue0 and queue1 mapped to
the same traffic class in your example, or are they not? Could you show
the full command you ran?

> On other hand tc-ets made perfect sense to me from documentation and code pow.
> TC is mapped to bands. Bands have documented priorities and it fit's to what
> KSZ is supporting. Except of WRR configuration.

I haven't used tc-ets, I was just curious about the differences you saw
between it and mqprio which led to you choosing it.

> > For strict prioritization using multi-queue on the DSA master you should
> > be able to set up a separate Qdisc.
>
> I'll need to do more testing with FEC later, it didn't worked at first try, but
> as you can see I still have a lot of misunderstandings.

fec doesn't seem to implement ndo_setup_tc() at all, so I'm not sure
what you're going to try exactly. OTOH it has this weird ndo_select_queue()
implementation which (I think) implements multi-queue based on VLAN PCP.

sorry for the quick response, need to go right now