Re: [PATCH net-next 2/3] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc

From: Jakub Kicinski
Date: Sun May 30 2021 - 16:21:45 EST


On Sun, 30 May 2021 09:37:09 +0800 Yunsheng Lin wrote:
> On 2021/5/30 2:49, Jakub Kicinski wrote:
> > The fact that MISSED is only cleared under q->seqlock does not matter,
> > because setting it and ->enqueue() are not under any lock. If the thread
> > gets interrupted between:
> >
> > if (q->flags & TCQ_F_CAN_BYPASS && nolock_qdisc_is_empty(q) &&
> >     qdisc_run_begin(q)) {
> >
> > and ->enqueue() we can't guarantee that something else won't come in,
> > take q->seqlock and clear MISSED.
> >
> > thread1                thread2                thread3
> > # holds seqlock
> >                        qdisc_run_begin(q)
> >                         set(MISSED)
> > pfifo_fast_dequeue
> >  clear(MISSED)
> >  # recheck the queue
> > qdisc_run_end()
> >                        ->enqueue()
> >                                               q->flags & TCQ_F_CAN_BYPASS..
> >                                               qdisc_run_begin() # true
> >                                               sch_direct_xmit()
> >                        qdisc_run_begin()
> >                         set(MISSED)
> >
> > Or am I missing something?
> >
> > Re-checking nolock_qdisc_is_empty() may or may not help.
> > But it doesn't really matter because there is no ordering
> > requirement between thread2 and thread3 here.
>
> I was more focused on explaining that using MISSED is as reliable
> as the sch_may_need_requeuing() check in RFCv3 [1] for indicating
> an empty qdisc, and forgot to mention the data race described in
> RFCv3, which is similar to the one described above:
>
> "There is a data race as below:
>
> CPU1                                   CPU2
> qdisc_run_begin(q)                       .
>         .                           q->enqueue()
> sch_may_need_requeuing()                 .
>       return true                        .
>         .                                .
>         .                                .
>    q->enqueue()                          .
>
> When the above happens, the skb enqueued by CPU1 is dequeued after
> the skb enqueued by CPU2, because sch_may_need_requeuing() returned
> true. Without qdisc bypass, CPU1 would have a better chance of
> queueing its skb before CPU2's.
>
> This patch does not take care of the above data race, because I
> view it as similar to the following:
>
> Even when CPU1 and CPU2 write skbs at the same time to two sockets
> that are both heading to the same qdisc, there is no guarantee as
> to which skb will hit the qdisc first, because many factors like
> interrupts/softirqs/cache misses/scheduling affect that."
>
> Does the above make sense? Or do you have any idea how to avoid it?
>
> 1. https://patchwork.kernel.org/project/netdevbpf/patch/1616404156-11772-1-git-send-email-linyunsheng@xxxxxxxxxx/

We agree on this one.

Could you draw a sequence diagram of different CPUs (like the one
above) for the case where removing the re-check of
nolock_qdisc_is_empty() under q->seqlock leads to incorrect behavior?
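
To be clear, "this test" means the re-check inside the bypass branch.
As I read the patch, that branch looks roughly like the sketch below
(abbreviated; stats updates are dropped, and rc, to_free, dev, txq
and the no_lock_out label come from the surrounding __dev_xmit_skb()):

	if (q->flags & TCQ_F_CAN_BYPASS && nolock_qdisc_is_empty(q) &&
	    qdisc_run_begin(q)) {
		/* Re-check emptiness under q->seqlock; without this, a
		 * racing ->enqueue() between the lockless check above
		 * and qdisc_run_begin() could leave its skb queued
		 * while we transmit ours directly ahead of it.
		 */
		if (unlikely(!nolock_qdisc_is_empty(q))) {
			rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
			__qdisc_run(q);
			qdisc_run_end(q);
			goto no_lock_out;
		}

		/* qdisc still empty: transmit directly, bypassing it */
		if (sch_direct_xmit(skb, q, dev, txq, NULL, true) &&
		    !nolock_qdisc_is_empty(q))
			__qdisc_run(q);

		qdisc_run_end(q);
		return NET_XMIT_SUCCESS;
	}

The unlikely() suggests losing this race should be rare; the question
is whether the inner branch is needed at all.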

If there is no such case, would you be willing to repeat the
benchmark with and without this test?

Sorry for dragging the review out..