Re: [RFC v3] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc

From: Yunsheng Lin
Date: Tue Mar 23 2021 - 07:35:39 EST


On 2021/3/23 14:37, Ahmad Fatoum wrote:
> Hi,
>
> On 22.03.21 10:09, Yunsheng Lin wrote:
>> Currently pfifo_fast has both the TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK
>> flags set, but qdisc bypass does not work for a lockless qdisc
>> because the skb is always enqueued to the qdisc even when the qdisc
>> is empty; see __dev_xmit_skb().
>>
>> This patch calls sch_direct_xmit() to transmit the skb directly
>> to the driver for an empty lockless qdisc too, which avoids the
>> enqueue and dequeue operations. qdisc->empty is set to false
>> whenever an skb is enqueued (see pfifo_fast_enqueue()) and is set
>> to true when dequeuing an skb returns NULL (see pfifo_fast_dequeue()).
>>
>> There is a data race between enqueue/dequeue and the setting of
>> qdisc->empty, so qdisc->empty is only used as a hint. We therefore
>> call sch_may_need_requeuing() to check whether the queue is really
>> empty and whether there is a requeued skb, which has higher
>> priority than the current skb.
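A minimal, single-threaded C sketch of the "hint, then verify" ordering described above, under stated assumptions: everything named toy_* is hypothetical and only models the roles of pfifo_fast_enqueue(), pfifo_fast_dequeue(), sch_direct_xmit() and sch_may_need_requeuing(). It is not kernel code and does not reproduce the race itself; it only shows that the empty hint is trusted for bypass after also confirming that nothing is queued or requeued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QLEN 16

/* Hypothetical toy model of a lockless qdisc; NOT the kernel's struct Qdisc. */
struct toy_qdisc {
	int ring[TOY_QLEN];
	size_t head, tail;  /* FIFO indices: head == tail means nothing queued */
	bool empty;         /* hint only, playing the role of qdisc->empty */
	int requeued;       /* packet handed back by the "driver", or -1 if none */
};

/* Models pfifo_fast_enqueue(): every enqueue clears the hint. */
static void toy_enqueue(struct toy_qdisc *q, int pkt)
{
	assert(q->tail - q->head < TOY_QLEN);  /* toy ring must not overflow */
	q->ring[q->tail++ % TOY_QLEN] = pkt;
	q->empty = false;
}

/* Models dequeue: a requeued packet goes first; the hint is set only
 * when dequeue finds nothing. Returns -1 when there is no packet. */
static int toy_dequeue(struct toy_qdisc *q)
{
	int pkt;

	if (q->requeued >= 0) {
		pkt = q->requeued;
		q->requeued = -1;
		return pkt;
	}
	if (q->head == q->tail) {
		q->empty = true;
		return -1;
	}
	return q->ring[q->head++ % TOY_QLEN];
}

/* Hypothetical stand-in for the sch_may_need_requeuing() check:
 * the hint alone is not enough to allow bypass. */
static bool toy_can_bypass(const struct toy_qdisc *q)
{
	return q->empty && q->head == q->tail && q->requeued < 0;
}

/* Sends pkt (must be >= 0); wire[] records the order packets reach
 * the "driver". */
static void toy_xmit(struct toy_qdisc *q, int pkt, int *wire, size_t *n)
{
	int p;

	if (toy_can_bypass(q)) {
		wire[(*n)++] = pkt;             /* direct xmit, no enqueue/dequeue */
		return;
	}
	toy_enqueue(q, pkt);                /* otherwise keep FIFO order */
	while ((p = toy_dequeue(q)) >= 0)   /* drain; sets the hint again */
		wire[(*n)++] = p;
}
```

For example, starting from an empty toy queue, packet 1 bypasses. If packet 2 is then marked as requeued and packet 3 arrives while the hint still reads empty, toy_can_bypass() rejects the bypass and the wire order is 1, 2, 3, so the requeued packet keeps its priority.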
>>
>> The performance of the ip_forward test increases by about 10%
>> with this patch.
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
>> ---
>> Hi Vladimir and Ahmad,
>> Please test this patch to see whether it causes any out-of-order
>> packets, now that it has removed the priv->lock added in RFC v2.
>
> An overnight test (10 h, 64 million frames) didn't see any out-of-order
> frames between 2 FlexCANs on a dual-core machine:
>
> Tested-by: Ahmad Fatoum <a.fatoum@xxxxxxxxxxxxxx>
>
> No performance measurements taken.

Thanks for testing.
I have done the performance measurements.

L3 forwarding throughput improves from 1.09 Mpps to 1.21 Mpps, still
an improvement of about 10%.

pktgen + dummy netdev (throughput in Mpps):

threads  without this patch  with this patch  delta
      1                2.56             3.11   +21%
      2                3.76             4.31   +14%
      4                5.51             5.53   +0.3%
      8                2.81             2.72   -3%
     16                2.24             2.22   -0.8%
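The delta column appears to be the relative change (with - without) / without. A small C sketch that recomputes it from the Mpps figures in the pktgen table, assuming that formula; the struct and function names are illustrative only. By this formula the 2-thread row comes out at about +14.6%, and the L3 forwarding result (1.09 to 1.21 Mpps) at about +11%.

```c
/* Throughput in Mpps, copied from the pktgen + dummy netdev table. */
struct pktgen_row {
	int threads;
	double without_mpps;
	double with_mpps;
};

static const struct pktgen_row rows[] = {
	{ 1, 2.56, 3.11 }, { 2, 3.76, 4.31 }, { 4, 5.51, 5.53 },
	{ 8, 2.81, 2.72 }, { 16, 2.24, 2.22 },
};

#define N_ROWS (sizeof(rows) / sizeof(rows[0]))

/* Relative change in percent: (with - without) / without * 100. */
static double pct_delta(double without, double with)
{
	return (with - without) / without * 100.0;
}
```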

>
>>