Re: [PATCH] net: sch: eliminate unnecessary RCU waits in mini_qdisc_pair_swap()

From: Jakub Kicinski
Date: Mon Oct 25 2021 - 15:59:28 EST


On Fri, 22 Oct 2021 11:17:46 -0500 Seth Forshee wrote:
> From: Seth Forshee <sforshee@xxxxxxxxxxxxxxxx>
>
> Currently rcu_barrier() is used to ensure that no readers of the
> inactive mini_Qdisc buffer remain before it is reused. This waits for
> any pending RCU callbacks to complete, when all that is actually
> required is to wait for one RCU grace period to elapse after the buffer
> was made inactive. This means that using rcu_barrier() may result in
> unnecessary waits.
>
> To improve this, store the current RCU state when a buffer is made
> inactive and use poll_state_synchronize_rcu() to check whether a full
> grace period has elapsed before reusing it. If a full grace period has
> not elapsed, wait for a grace period to elapse, and in the non-RT case
> use synchronize_rcu_expedited() to hasten it.
>
> Since this approach eliminates the RCU callback it is no longer
> necessary to synchronize_rcu() in the tp_head==NULL case. However, the
> RCU state should still be saved for the previously active buffer.
>
> Before this change I would typically see mini_qdisc_pair_swap() take
> tens of milliseconds to complete. After this change it typically
> finishes in less than 1 ms, and often it takes just a few microseconds.
>
> Thanks to Paul for walking me through the options for improving this.
>
> Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> Signed-off-by: Seth Forshee <sforshee@xxxxxxxxxxxxxxxx>

LGTM, but please rebase and retest on top of latest net-next.

> void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
> struct tcf_proto *tp_head)
> {
> @@ -1423,28 +1419,30 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
>
> if (!tp_head) {
> RCU_INIT_POINTER(*miniqp->p_miniq, NULL);
> - /* Wait for flying RCU callback before it is freed. */
> - rcu_barrier();
> - return;
> - }
> + } else {
> + miniq = !miniq_old || miniq_old == &miniqp->miniq2 ?
> + &miniqp->miniq1 : &miniqp->miniq2;
>
> - miniq = !miniq_old || miniq_old == &miniqp->miniq2 ?
> - &miniqp->miniq1 : &miniqp->miniq2;

nit: any reason this doesn't read:

miniq = miniq_old != &miniqp->miniq1 ?
&miniqp->miniq1 : &miniqp->miniq2;

Surely it's not equal to miniq1 or miniq2 if it's NULL.

> + /* We need to make sure that readers won't see the miniq
> + * we are about to modify. So ensure that at least one RCU
> + * grace period has elapsed since the miniq was made
> + * inactive.
> + */
> + if (IS_ENABLED(CONFIG_PREEMPT_RT))
> + cond_synchronize_rcu(miniq->rcu_state);
> + else if (!poll_state_synchronize_rcu(miniq->rcu_state))
> + synchronize_rcu_expedited();
>
> - /* We need to make sure that readers won't see the miniq
> - * we are about to modify. So wait until previous call_rcu callback
> - * is done.
> - */
> - rcu_barrier();
> - miniq->filter_list = tp_head;
> - rcu_assign_pointer(*miniqp->p_miniq, miniq);
> + miniq->filter_list = tp_head;
> + rcu_assign_pointer(*miniqp->p_miniq, miniq);
> + }
>
> if (miniq_old)
> - /* This is counterpart of the rcu barriers above. We need to
> + /* This is counterpart of the rcu sync above. We need to
> * block potential new user of miniq_old until all readers
> * are not seeing it.
> */
> - call_rcu(&miniq_old->rcu, mini_qdisc_rcu_func);
> + miniq_old->rcu_state = start_poll_synchronize_rcu();
> }
> EXPORT_SYMBOL(mini_qdisc_pair_swap);
>
> @@ -1463,6 +1461,8 @@ void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
> miniqp->miniq1.cpu_qstats = qdisc->cpu_qstats;
> miniqp->miniq2.cpu_bstats = qdisc->cpu_bstats;
> miniqp->miniq2.cpu_qstats = qdisc->cpu_qstats;
> + miniqp->miniq1.rcu_state = get_state_synchronize_rcu();
> + miniqp->miniq2.rcu_state = miniqp->miniq1.rcu_state;
> miniqp->p_miniq = p_miniq;
> }
> EXPORT_SYMBOL(mini_qdisc_pair_init);