Re: [Linuxarm] [PATCH net] net: sched: fix packet stuck problem for lockless qdisc

From: Yunsheng Lin
Date: Fri Mar 19 2021 - 05:26:38 EST


On 2021/3/18 14:53, Yunsheng Lin wrote:
> Lockless qdisc has below concurrent problem:
> cpu0 cpu1
> . .
> q->enqueue .
> . .
> qdisc_run_begin() .
> . .
> dequeue_skb() .
> . .
> sch_direct_xmit() .
> . .
> . q->enqueue
> . qdisc_run_begin()
> . return and do nothing
> . .
> qdisc_run_end() .
>
> cpu1 enqueue a skb without calling __qdisc_run() because cpu0
> has not released the lock yet and spin_trylock() return false
> for cpu1 in qdisc_run_begin(), and cpu0 do not see the skb
> enqueued by cpu1 when calling dequeue_skb() because cpu1 may
> enqueue the skb after cpu0 calling dequeue_skb() and before
> cpu0 calling qdisc_run_end().
>
> Lockless qdisc has another concurrent problem when tx_action
> is involved:
>
> cpu0(serving tx_action) cpu1 cpu2
> . . .
> . q->enqueue .
> . qdisc_run_begin() .
> . dequeue_skb() .
> . . q->enqueue
> . . .
> . sch_direct_xmit() .
> . . qdisc_run_begin()
> . . return and do nothing
> . . .
> clear __QDISC_STATE_SCHED . .
> qdisc_run_begin() . .
> return and do nothing . .
> . . .
> . qdisc_run_begin() .
>
> This patch fixes the above data race by:
> 1. Set a flag after spin_trylock() return false.
> 2. Retry a spin_trylock() in case other CPU may not see the
> new flag after it releases the lock.
> 3. reschedule if the flag is set after the lock is released
> at the end of qdisc_run_end().
>
> For tx_action case, the flags is also set when cpu1 is at the
> end if qdisc_run_begin(), so tx_action will be rescheduled
> again to dequeue the skb enqueued by cpu2.
>
> Also clear the flag before dequeuing in order to reduce the
> overhead of the above process, and aviod doing the heavy
> test_and_clear_bit() at the end of qdisc_run_end().
>
> Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
> ---
> For those who has not been following the qdsic scheduling
> discussion, there is packet stuck problem for lockless qdisc,
> see [1], and I has done some cleanup and added some enhanced
> features too, see [2] [3].
> While I was doing the optimization for lockless qdisc, it
> accurred to me that these optimization is useless if there is
> still basic bug in lockless qdisc, even the bug is not easily
> reproducible. So look through [1] again, I found that the data
> race for tx action mentioned by Michael, and thought deep about
> it and came up with this patch trying to fix it.
>
> So I am really appreciated some who still has the reproducer
> can try this patch and report back.

I had done some performance test to see if there is value to
fix the packet stuck problem and support lockless qdisc bypass,
here is some result using pktgen in 'queue_xmit' mode on a dummy
device as Paolo Abeni had done in [1], and using pfifo_fast qdisc:

threads vanilla locked-qdisc vanilla+this_patch
1 2.6Mpps 2.9Mpps 2.5Mpps
2 3.9Mpps 4.8Mpps 3.6Mpps
4 5.6Mpps 3.0Mpps 4.7Mpps
8 2.7Mpps 1.6Mpps 2.8Mpps
16 2.2Mpps 1.3Mpps 2.3Mpps

locked-qdisc: test by removing the "TCQ_F_NOLOCK | TCQ_F_CPUSTATS".

And add the lockless qdisc bypatch and other optimization upon
this patch:

threads patch_set_1 patch_set_2 patch_set_3
1 2.5Mpps 3.0Mpps 3.0Mpps
2 3.6Mpps 4.1Mpps 5.3Mpps
4 4.7Mpps 4.6Mpps 5.1Mpps
8 2.8Mpps 2.6Mpps 2.7Mpps
16 2.3Mpps 2.2Mpps 2.2Mpps

patch_set_1: vanilla + this_patch
patch_set_2: vanilla + this_patch + lockless_qdisc_bypass_patch
patch_set_3: vanilla + this_patch + lockless_qdisc_bypass_patch +
remove_seq_operation_for_lockless_qdisc_optimization +
check_rc_before_calling_qdisc_run()_optimization +
spin_trylock()_retry_optimization.

So all the fix and optimization added together, the lockless qdisc
has better performance than vanilla except for the 4 threads case,
which has about 9% performance degradation than vanilla one, but still
better than the locked-qdisc.


>
> 1. https://lore.kernel.org/netdev/d102074f-7489-e35a-98cf-e2cad7efd8a2@xxxxxxxxxxxxx/t/#ma7013a79b8c4d8e7c49015c724e481e6d5325b32
> 2. https://patchwork.kernel.org/project/netdevbpf/patch/1615777818-13969-1-git-send-email-linyunsheng@xxxxxxxxxx/
> 3. https://patchwork.kernel.org/project/netdevbpf/patch/1615800610-34700-1-git-send-email-linyunsheng@xxxxxxxxxx/
>