Re: [PATCH net v3] net: sched: fix packet stuck problem for lockless qdisc

From: Juergen Gross
Date: Mon Apr 12 2021 - 04:21:36 EST


On 12.04.21 03:04, Yunsheng Lin wrote:
On 2021/4/9 13:31, Juergen Gross wrote:
On 25.03.21 04:13, Yunsheng Lin wrote:
Lockless qdisc has below concurrent problem:
cpu0 cpu1
. .
q->enqueue .
. .
qdisc_run_begin() .
. .
dequeue_skb() .
. .
sch_direct_xmit() .
. .
. q->enqueue
. qdisc_run_begin()
. return and do nothing
. .
qdisc_run_end() .

cpu1 enqueue a skb without calling __qdisc_run() because cpu0
has not released the lock yet and spin_trylock() return false
for cpu1 in qdisc_run_begin(), and cpu0 do not see the skb
enqueued by cpu1 when calling dequeue_skb() because cpu1 may
enqueue the skb after cpu0 calling dequeue_skb() and before
cpu0 calling qdisc_run_end().

Lockless qdisc has below another concurrent problem when
tx_action is involved:

cpu0(serving tx_action) cpu1 cpu2
. . .
. q->enqueue .
. qdisc_run_begin() .
. dequeue_skb() .
. . q->enqueue
. . .
. sch_direct_xmit() .
. . qdisc_run_begin()
. . return and do nothing
. . .
clear __QDISC_STATE_SCHED . .
qdisc_run_begin() . .
return and do nothing . .
. . .
. qdisc_run_end() .

This patch fixes the above data race by:
1. Get the flag before doing spin_trylock().
2. If the first spin_trylock() return false and the flag is not
set before the first spin_trylock(), Set the flag and retry
another spin_trylock() in case other CPU may not see the new
flag after it releases the lock.
3. reschedule if the flags is set after the lock is released
at the end of qdisc_run_end().

For tx_action case, the flags is also set when cpu1 is at the
end if qdisc_run_end(), so tx_action will be rescheduled
again to dequeue the skb enqueued by cpu2.

Only clear the flag before retrying a dequeuing when dequeuing
returns NULL in order to reduce the overhead of the above double
spin_trylock() and __netif_schedule() calling.

The performance impact of this patch, tested using pktgen and
dummy netdev with pfifo_fast qdisc attached:

threads without+this_patch with+this_patch delta
1 2.61Mpps 2.60Mpps -0.3%
2 3.97Mpps 3.82Mpps -3.7%
4 5.62Mpps 5.59Mpps -0.5%
8 2.78Mpps 2.77Mpps -0.3%
16 2.22Mpps 2.22Mpps -0.0%

Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>

I have a setup which is able to reproduce the issue quite reliably:

In a Xen guest I'm mounting 8 NFS shares and run sysbench fileio on
each of them. The average latency reported by sysbench is well below
1 msec, but at least once per hour I get latencies in the minute
range.

With this patch I don't see these high latencies any longer (test
is running for more than 20 hours now).

So you can add my:

Tested-by: Juergen Gross <jgross@xxxxxxxx>

Hi, Juergen

Thanks for the testing.

With the simulated test case suggested by Michal, I still has some
potential issue to debug, hopefully will send out new version in
this week.

Also, is it possible to run your testcase any longer? I think "72 hours"
would be enough to testify that it fixes the problem completely:)

This should be possible, yes.

I'm using the setup to catch another bug which is showing up every few
days. I don't see a reason why I shouldn't be able to add your patch
and verify it in parallel.


Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: application/pgp-keys

Attachment: OpenPGP_signature
Description: OpenPGP digital signature