Re: [PATCH net-next 2/3] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc

From: Jakub Kicinski
Date: Sat May 29 2021 - 00:33:22 EST

Next message: kernel test robot: "drivers/usb/usbip/stub_tx.c:417:5: warning: stack frame size of 2064 bytes in function 'stub_tx_loop'"
Previous message: kernel test robot: "Re: [PATCH 10/19] drivers/hv: get and set vcpu registers ioctls"
In reply to: Yunsheng Lin: "Re: [PATCH net-next 2/3] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc"
Next in thread: Yunsheng Lin: "Re: [PATCH net-next 2/3] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sat, 29 May 2021 09:44:57 +0800 Yunsheng Lin wrote:
> >> @@ -3852,10 +3852,32 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
> >> qdisc_calculate_pkt_len(skb, q);
> >>
> >> if (q->flags & TCQ_F_NOLOCK) {
> >> + if (q->flags & TCQ_F_CAN_BYPASS && nolock_qdisc_is_empty(q) &&
> >> + qdisc_run_begin(q)) {
> >> + /* Retest nolock_qdisc_is_empty() within the protection
> >> + * of q->seqlock to ensure qdisc is indeed empty.
> >> + */
> >> + if (unlikely(!nolock_qdisc_is_empty(q))) {
> >
> > This is just for the DRAINING case right?
> >
> > MISSED can be set at any moment, testing MISSED seems confusing.
>
> MISSED is only set when there is lock contention, which means it
> is better not to do the qdisc bypass to avoid out of order packet
> problem,

Avoid as in make less likely? Nothing guarantees other thread is not
interrupted after ->enqueue and before qdisc_run_begin().

TBH I'm not sure what out-of-order situation you're referring to,
there is no ordering guarantee between separate threads trying to
transmit AFAIU.

IOW this check is not required for correctness, right?

> another good thing is that we could also do the batch
> dequeuing and transmiting of packets when there is lock contention.

No doubt, but did you see the flag get set significantly often here
to warrant the double-checking?

> > Is it really worth the extra code?
>
> Currently DRAINING is only set for the netdev queue stopped.
> We could only use DRAINING to indicate the non-empty of a qdisc,
> then we need to set the DRAINING evrywhere MISSED is set, that is
> why I use both DRAINING and MISSED to indicate a non-empty qdisc.
>
> >
> >> + rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
> >> + __qdisc_run(q);
> >> + qdisc_run_end(q);
> >> +
> >> + goto no_lock_out;
> >> + }
> >> +
> >> + qdisc_bstats_cpu_update(q, skb);
> >> + if (sch_direct_xmit(skb, q, dev, txq, NULL, true) &&
> >> + !nolock_qdisc_is_empty(q))
> >> + __qdisc_run(q);
> >> +
> >> + qdisc_run_end(q);
> >> + return NET_XMIT_SUCCESS;
> >> + }
> >> +
> >> rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
> >> - if (likely(!netif_xmit_frozen_or_stopped(txq)))
> >> - qdisc_run(q);
> >> + qdisc_run(q);
> >>
> >> +no_lock_out:
> >> if (unlikely(to_free))
> >> kfree_skb_list(to_free);
> >> return rc;

> >> @@ -164,9 +166,13 @@ static inline void dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
> >>
> >> skb = next;
> >> }
> >> - if (lock)
> >> +
> >> + if (lock) {
> >> spin_unlock(lock);
> >> - __netif_schedule(q);
> >> + set_bit(__QDISC_STATE_MISSED, &q->state);
> >> + } else {
> >> + __netif_schedule(q);
> >
> > Could we reorder qdisc_run_begin() with clear_bit(SCHED)
> > in net_tx_action() and add SCHED to the NON_EMPTY mask?
>
> Did you mean clearing the SCHED after the q->seqlock is
> taken?
>
> The problem is that the SCHED is also used to indicate
> a qdisc is in sd->output_queue or not, and the
> qdisc_run_begin() called by net_tx_action() can not
> guarantee it will take the q->seqlock(we are using trylock
> for lockless qdisc)

Ah, right. We'd need to do some more flag juggling in net_tx_action()
to get it right.

> >> + }
> >> }
> >>
> >> static void try_bulk_dequeue_skb(struct Qdisc *q,
> >> @@ -409,7 +415,11 @@ void __qdisc_run(struct Qdisc *q)
> >> while (qdisc_restart(q, &packets)) {
> >> quota -= packets;
> >> if (quota <= 0) {
> >> - __netif_schedule(q);
> >> + if (q->flags & TCQ_F_NOLOCK)
> >> + set_bit(__QDISC_STATE_MISSED, &q->state);
> >> + else
> >> + __netif_schedule(q);
> >> +
> >> break;
> >> }
> >> }
> >> @@ -680,13 +690,14 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
> >> if (likely(skb)) {
> >> qdisc_update_stats_at_dequeue(qdisc, skb);
> >> } else if (need_retry &&
> >> - test_bit(__QDISC_STATE_MISSED, &qdisc->state)) {
> >> + READ_ONCE(qdisc->state) & QDISC_STATE_NON_EMPTY) {
> >
> > Do we really need to retry based on DRAINING being set?
> > Or is it just a convenient way of coding things up?
>
> Yes, it is just a convenient way of coding things up.
> Only MISSED need retrying.

Next message: kernel test robot: "drivers/usb/usbip/stub_tx.c:417:5: warning: stack frame size of 2064 bytes in function 'stub_tx_loop'"
Previous message: kernel test robot: "Re: [PATCH 10/19] drivers/hv: get and set vcpu registers ioctls"
In reply to: Yunsheng Lin: "Re: [PATCH net-next 2/3] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc"
Next in thread: Yunsheng Lin: "Re: [PATCH net-next 2/3] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]