Re: [PATCH v4 -rcu 1/4] rcu/segcblist: Do not depend on rcl->len to store the segcb len during merge
From: Paul E. McKenney
Date:  Fri Aug 28 2020 - 10:19:00 EST
On Thu, Aug 27, 2020 at 06:55:18PM -0400, Joel Fernandes wrote:
> On Wed, Aug 26, 2020 at 07:20:28AM -0700, Paul E. McKenney wrote:
> [...]
> > > > Or better yet, please see below, which should allow getting rid of both
> > > > of them.
> > > > 
> > > > >  	rcu_segcblist_extract_done_cbs(src_rsclp, &donecbs);
> > > > >  	rcu_segcblist_extract_pend_cbs(src_rsclp, &pendcbs);
> > > > > -	rcu_segcblist_insert_count(dst_rsclp, &donecbs);
> > > > > +
> > > > > +	rcu_segcblist_add_len(dst_rsclp, src_len);
> > > > >  	rcu_segcblist_insert_done_cbs(dst_rsclp, &donecbs);
> > > > >  	rcu_segcblist_insert_pend_cbs(dst_rsclp, &pendcbs);
> > > > 
> > > > Rather than adding the blank lines, why not have the rcu_cblist structures
> > > > carry the lengths?  You are already adjusting one of the two call sites
> > > > that care (rcu_do_batch()), and the other is srcu_invoke_callbacks().
> > > > That should shorten this function a bit more.  And make callback handling
> > > > much more approachable, I suspect.
> > > 
> > > Sorry, I did not understand. The rcu_cblist structure already has a length
> > > field. I do modify rcu_segcblist_extract_done_cbs() and
> > > rcu_segcblist_extract_pend_cbs() to carry the length already, in a later
> > > patch.
> > > 
> > > To emphasize, this patch is only a small refactor to avoid an issue in
> > > later patches. It aims to keep current functionality unchanged.
> > 
> > True enough.  I am just suggesting that an equally small refactor in
> > a slightly different direction should get to a better place.  The key
> > point enabling this slightly different direction is that this code is
> > an exception to the "preserve ->cblist.len" rule because it is invoked
> > only from the CPU hotplug code.
> > 
> > So you could use the rcu_cblist .len field to update the ->cblist.len
> > field, thus combining the _cbs and _count updates.  One thing that helps
> > is that setting the rcu_cblist .len field doesn't hurt the other use
> > cases that require careful handling of ->cblist.len.
> 
> Thank you for the ideas. Based on them, I am trying something like the
> following on top of this series. One thing I am a bit concerned about is
> whether getting rid of the rcu_segcblist_xchg_len() function (which
> contains memory barriers) causes issues in the hotplug path. I am now
> updating the length directly, without additional memory barriers. I will
> test it more and try to reason about it further as well.
In this particular case, the CPU-hotplug locks prevent rcu_barrier()
from running concurrently, so it should be OK.  Is there an easy way
to make lockdep help us check this?  Does lockdep_assert_cpus_held()
suffice, or is it too easily satisfied?
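To make the "too easily satisfied" question concrete, here is a userspace sketch that models the hotplug lock as a per-thread hold mode. All names in it are hypothetical, none are kernel APIs, and it does not implement mutual exclusion itself; it only records how the current thread holds the lock. It contrasts a loose check that any hold satisfies with a strict check that only an exclusive hold satisfies; only the strict one would by itself justify a plain, barrier-free length transfer like the one in merge_len(). Lockdep offers both flavors in general (lockdep_assert_held() versus lockdep_assert_held_write()); which one lockdep_assert_cpus_held() behaves like is left open here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only.  "hotplug" is a hypothetical stand-in for the
 * CPU-hotplug lock; nothing below is a kernel API.  Each thread records
 * how it currently holds the lock; no actual locking is performed.
 */
enum hold_mode { HELD_NONE, HELD_READ, HELD_WRITE };

static _Thread_local enum hold_mode hotplug_hold = HELD_NONE;

static void hotplug_read_lock(void)  { hotplug_hold = HELD_READ; }
static void hotplug_write_lock(void) { hotplug_hold = HELD_WRITE; }
static void hotplug_unlock(void)     { hotplug_hold = HELD_NONE; }

/* Loose check: a shared (read-side) hold is enough to pass. */
static bool hotplug_held(void)
{
	return hotplug_hold != HELD_NONE;
}

/*
 * Strict check: only an exclusive hold passes, and only an exclusive
 * hold would keep out other read-side holders.
 */
static bool hotplug_held_exclusive(void)
{
	return hotplug_hold == HELD_WRITE;
}

/*
 * Plain, unordered length transfer: no xchg() and no full barriers.
 * Asserts the strict check first, since the model only treats this as
 * safe when no concurrent reader of either length can exist.
 */
static void merge_len(long *dst_len, long *src_len)
{
	assert(hotplug_held_exclusive());
	*dst_len += *src_len;
	*src_len = 0;
}
```

With only hotplug_read_lock() in effect, hotplug_held() passes while hotplug_held_exclusive() fails, so merge_len() would trip its assertion; that is the gap a too-permissive check would leave open.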
> ---8<-----------------------
> 
> From: Joel Fernandes <joelaf@xxxxxxxxxx>
> Date: Thu, 27 Aug 2020 18:30:25 -0400
> Subject: [PATCH] fixup! rcu/segcblist: Do not depend on donecbs ->len to store
>  the segcb len during merge
> 
> Signed-off-by: Joel Fernandes <joelaf@xxxxxxxxxx>
> ---
>  kernel/rcu/rcu_segcblist.c | 38 ++++----------------------------------
>  1 file changed, 4 insertions(+), 34 deletions(-)
> 
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 79c2cbe388c5..c33abbc97a07 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -175,26 +175,6 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp)
>  	rcu_segcblist_add_len(rsclp, 1);
>  }
>  
> -/*
> - * Exchange the numeric length of the specified rcu_segcblist structure
> - * with the specified value.  This can cause the ->len field to disagree
> - * with the actual number of callbacks on the structure.  This exchange is
> - * fully ordered with respect to the callers accesses both before and after.
> - */
> -static long rcu_segcblist_xchg_len(struct rcu_segcblist *rsclp, long v)
> -{
> -#ifdef CONFIG_RCU_NOCB_CPU
> -	return atomic_long_xchg(&rsclp->len, v);
> -#else
> -	long ret = rsclp->len;
> -
> -	smp_mb(); /* Up to the caller! */
> -	WRITE_ONCE(rsclp->len, v);
> -	smp_mb(); /* Up to the caller! */
> -	return ret;
> -#endif
> -}
> -
This looks nice!
>  /*
>   * Initialize an rcu_segcblist structure.
>   */
> @@ -361,6 +341,7 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
>  		if (rsclp->tails[i] == rsclp->tails[RCU_DONE_TAIL])
>  			WRITE_ONCE(rsclp->tails[i], &rsclp->head);
>  	rcu_segcblist_set_seglen(rsclp, RCU_DONE_TAIL, 0);
> +	rcu_segcblist_add_len(rsclp, -(rclp->len));
>  }
>  
>  /*
> @@ -414,17 +395,7 @@ void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp,
>  		WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_DONE_TAIL]);
>  		rcu_segcblist_set_seglen(rsclp, i, 0);
>  	}
> -}
> -
> -/*
> - * Insert counts from the specified rcu_cblist structure in the
> - * specified rcu_segcblist structure.
> - */
> -void rcu_segcblist_insert_count(struct rcu_segcblist *rsclp,
> -				struct rcu_cblist *rclp)
> -{
> -	rcu_segcblist_add_len(rsclp, rclp->len);
> -	rclp->len = 0;
> +	rcu_segcblist_add_len(rsclp, -(rclp->len));
As does this.  ;-)
>  }
>  
>  /*
> @@ -448,6 +419,7 @@ void rcu_segcblist_insert_done_cbs(struct rcu_segcblist *rsclp,
>  			break;
>  	rclp->head = NULL;
>  	rclp->tail = &rclp->head;
> +	rcu_segcblist_add_len(rsclp, rclp->len);
Does there need to be a compensating action in rcu_do_batch(), or is
this the point of the "rcu_segcblist_add_len(rsclp, -(rclp->len));"
added to rcu_segcblist_extract_done_cbs() above?
If so, a comment would be good.
>  }
>  
>  /*
> @@ -463,6 +435,7 @@ void rcu_segcblist_insert_pend_cbs(struct rcu_segcblist *rsclp,
>  	rcu_segcblist_add_seglen(rsclp, RCU_NEXT_TAIL, rclp->len);
>  	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rclp->head);
>  	WRITE_ONCE(rsclp->tails[RCU_NEXT_TAIL], rclp->tail);
> +	rcu_segcblist_add_len(rsclp, rclp->len);
>  }
>  
>  /*
> @@ -601,16 +574,13 @@ void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
>  {
>  	struct rcu_cblist donecbs;
>  	struct rcu_cblist pendcbs;
> -	long src_len;
>  
>  	rcu_cblist_init(&donecbs);
>  	rcu_cblist_init(&pendcbs);
>  
> -	src_len = rcu_segcblist_xchg_len(src_rsclp, 0);
>  	rcu_segcblist_extract_done_cbs(src_rsclp, &donecbs);
>  	rcu_segcblist_extract_pend_cbs(src_rsclp, &pendcbs);
>  
> -	rcu_segcblist_add_len(dst_rsclp, src_len);
>  	rcu_segcblist_insert_done_cbs(dst_rsclp, &donecbs);
>  	rcu_segcblist_insert_pend_cbs(dst_rsclp, &pendcbs);
Can we now pair the corresponding _extract_ and _insert_ calls, thus
requiring only one rcu_cblist structure?  This would drop two more lines
of code.  Or would that break something?
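For the pairing question, here is a count-only userspace model of the length bookkeeping in the fixup above. All model_* names are hypothetical, and callbacks themselves are not represented, only ->len and the per-segment lengths. Each extract step subtracts from the source ->len whatever it moved into the rcu_cblist stand-in, each insert step adds that count to the destination, and the merge reuses a single list for the done and pending passes. The model assumes extraction sets the list's ->len to the moved count. Because the fixup's rcu_segcblist_insert_done_cbs() resets ->head and ->tail but not ->len, the sketch re-initializes the list between passes as the conservative choice.

```c
#include <assert.h>
#include <stdbool.h>

/* Count-only model; segment names mirror RCU_DONE_TAIL and friends. */
enum { SEG_DONE, SEG_WAIT, SEG_NEXT_READY, SEG_NEXT, NSEGS };

struct model_segcblist {
	long len;            /* stands in for ->len */
	long seglen[NSEGS];  /* stands in for the per-segment lengths */
};

struct model_cblist {
	long len;            /* stands in for rcu_cblist ->len */
};

static void model_cblist_init(struct model_cblist *rclp)
{
	rclp->len = 0;
}

/* Move the done count into the list, then subtract it from ->len. */
static void model_extract_done(struct model_segcblist *rsclp,
			       struct model_cblist *rclp)
{
	rclp->len = rsclp->seglen[SEG_DONE];
	rsclp->seglen[SEG_DONE] = 0;
	rsclp->len -= rclp->len;
}

/* Move every non-done count into the list, then subtract from ->len. */
static void model_extract_pend(struct model_segcblist *rsclp,
			       struct model_cblist *rclp)
{
	rclp->len = 0;
	for (int i = SEG_WAIT; i < NSEGS; i++) {
		rclp->len += rsclp->seglen[i];
		rsclp->seglen[i] = 0;
	}
	rsclp->len -= rclp->len;
}

/* Add the list's count to the done segment and to ->len. */
static void model_insert_done(struct model_segcblist *rsclp,
			      struct model_cblist *rclp)
{
	rsclp->seglen[SEG_DONE] += rclp->len;
	rsclp->len += rclp->len;
}

/* Add the list's count to the NEXT segment and to ->len. */
static void model_insert_pend(struct model_segcblist *rsclp,
			      struct model_cblist *rclp)
{
	rsclp->seglen[SEG_NEXT] += rclp->len;
	rsclp->len += rclp->len;
}

/* Merge with a single list, pairing each extract with its insert. */
static void model_merge(struct model_segcblist *dst,
			struct model_segcblist *src)
{
	struct model_cblist cbs;

	model_cblist_init(&cbs);
	model_extract_done(src, &cbs);
	model_insert_done(dst, &cbs);

	model_cblist_init(&cbs);  /* conservative: start the pending pass clean */
	model_extract_pend(src, &cbs);
	model_insert_pend(dst, &cbs);
}

/* Check that ->len equals the sum of the segment lengths. */
static bool model_len_consistent(const struct model_segcblist *rsclp)
{
	long sum = 0;

	for (int i = 0; i < NSEGS; i++)
		sum += rsclp->seglen[i];
	return sum == rsclp->len;
}
```

With src holding 2 done and 3 pending callbacks and dst holding 4, the merge leaves src's ->len at 0 and dst's at 9, and ->len still matches the sum of the segment lengths on both sides.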
							Thanx, Paul