Re: [PATCH v7 6/6] rcu/segcblist: Add additional comments to explain smp_mb()
From: joel
Date: Sat Oct 17 2020 - 02:03:23 EST
Adding Alan as well, as it's a memory barrier discussion ;-)
On Thu, Oct 15, 2020 at 03:35:11PM +0200, Frederic Weisbecker wrote:
> On Wed, Oct 14, 2020 at 08:23:01PM -0400, Joel Fernandes (Google) wrote:
> > Memory barriers are needed when updating the full length of the
> > segcblist, however it is not fully clearly why one is needed before and
> > after. This patch therefore adds additional comments to the function
> > header to explain it.
> >
> > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> > ---
> > kernel/rcu/rcu_segcblist.c | 38 ++++++++++++++++++++++++++++++++++----
> > 1 file changed, 34 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > index 271d5d9d7f60..25ffd07f9951 100644
> > --- a/kernel/rcu/rcu_segcblist.c
> > +++ b/kernel/rcu/rcu_segcblist.c
> > @@ -147,17 +147,47 @@ static void rcu_segcblist_inc_seglen(struct rcu_segcblist *rsclp, int seg)
> > * field to disagree with the actual number of callbacks on the structure.
> > * This increase is fully ordered with respect to the callers accesses
> > * both before and after.
> > + *
> > + * About memory barriers:
> > + * There is a situation where rcu_barrier() locklessly samples the full
> > + * length of the segmented cblist before deciding what to do. That can
> > + * race with another path that calls this function. rcu_barrier() should
> > + * not wrongly assume there are no callbacks, so any transitions from 1->0
> > + * and 0->1 have to be carefully ordered with respect to list modifications.
> > + *
> > + * Memory barrier is needed before adding to length, for the case where
> > + * v is negative which does not happen in current code, but used
> > + * to happen. Keep the memory barrier for robustness reasons.
>
> Heh, I seem to recongnize someone's decision's style ;-)
Actually, the last paragraph I added is bogus. This memory barrier is not
there just for robustness reasons. It is needed because rcu_do_batch() adjusts
the length of the list (possibly to 0) _after_ executing the callbacks, and it
does so by passing a negative value:
rcu_segcblist_add_len(&rdp->cblist, -count);
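To make "fully ordered" concrete, here is a minimal userspace sketch using C11
atomics in place of the kernel's smp_mb() and WRITE_ONCE(). The names
cblist_sketch, cblist_add_len() and cblist_demo() are hypothetical, and this is
only an illustration of the barrier placement, not the kernel implementation:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the length field of struct rcu_segcblist. */
struct cblist_sketch {
	atomic_long len;
};

/*
 * Add v (which may be negative) to the length, with a full fence on
 * each side so the update is ordered against the caller's accesses
 * both before and after it.
 */
static void cblist_add_len(struct cblist_sketch *l, long v)
{
	atomic_thread_fence(memory_order_seq_cst);	/* analogue of smp_mb() before */
	atomic_fetch_add_explicit(&l->len, v, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* analogue of smp_mb() after */
}

/* Queue three callbacks, then retire them the way rcu_do_batch() would. */
static long cblist_demo(void)
{
	struct cblist_sketch l;
	long count = 3;

	atomic_init(&l.len, 0);
	cblist_add_len(&l, count);	/* enqueue side: 0 -> 3 */
	assert(atomic_load(&l.len) == count);
	cblist_add_len(&l, -count);	/* after the callbacks ran: 3 -> 0 */
	return atomic_load(&l.len);
}
```

The fence before the add is the one that matters for the negative case: it
keeps the callback accesses from being reordered after the store that makes
the length look like 0.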
> > When/If the
> > + * length transitions from 1 -> 0, the write to 0 has to be ordered *after*
> > + * the memory accesses of the CBs that were dequeued and the segcblist
> > + * modifications:
> > + * P0 (what P1 sees) P1
> > + * set len = 0
> > + * rcu_barrier sees len as 0
> > + * dequeue from list
> > + * rcu_barrier does nothing.
>
> It's a bit difficult to read that way. So that would be:
>
>
> rcu_do_batch() rcu_barrier()
> -- --
> dequeue l = READ(len)
> smp_mb() if (!l)
> WRITE(len, 0) check next CPU...
>
> But I'm a bit confused against what it pairs in rcu_barrier().
I believe it pairs with an implied memory barrier via control dependency.
The following litmus test would confirm it:
C rcubarrier+ctrldep
(*
* Result: Never
*
* This litmus test shows that rcu_barrier (P1) prematurely
* returning by reading len 0 can cause issues if P0 does
* NOT have a smp_mb() before WRITE_ONCE().
*
* mod_data == 2 means garbage which the callback should never see.
*)
{ int len = 1; }
P0(int *len, int *mod_data)
{
	int r0;

	// accessed by say RCU callback in rcu_do_batch();
	r0 = READ_ONCE(*mod_data);
	smp_mb(); // Remove this and the "exists" will become true.
	WRITE_ONCE(*len, 0);
}

P1(int *len, int *mod_data)
{
	int r0;

	r0 = READ_ONCE(*len);
	// rcu_barrier will return early if len is 0
	if (r0 == 0)
		WRITE_ONCE(*mod_data, 2);
}
// Is it possible?
exists (0:r0=2 /\ 1:r0=0)
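For readers without herd7 at hand, the same reasoning can be sketched as a toy
interleaving enumerator in plain C. This is not a substitute for running the
litmus test against the LKMM: it assumes that a missing smp_mb() can be modeled
simply as P0's two accesses being swapped, and that P1's control dependency
keeps its read and write in order. All function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Run one interleaving. Bit i of sched says whether step i belongs to P0.
 * If p0_swapped is true, P0 writes len before reading mod_data, which is
 * how this toy models the absence of smp_mb().
 * Returns true if the "exists" outcome (0:r0=2 /\ 1:r0=0) is observed.
 */
static bool run_one(unsigned sched, bool p0_swapped)
{
	int len = 1, mod_data = 0;
	int p0_r0 = 0, p1_r0 = -1;
	int p0_step = 0, p1_step = 0;

	for (int i = 0; i < 4; i++) {
		if (sched & (1u << i)) {
			bool do_read = (p0_step == 0) != p0_swapped;

			if (do_read)
				p0_r0 = mod_data;	/* r0 = READ_ONCE(*mod_data) */
			else
				len = 0;		/* WRITE_ONCE(*len, 0) */
			p0_step++;
		} else {
			if (p1_step == 0)
				p1_r0 = len;		/* r0 = READ_ONCE(*len) */
			else if (p1_r0 == 0)
				mod_data = 2;		/* WRITE_ONCE(*mod_data, 2) */
			p1_step++;
		}
	}
	return p0_r0 == 2 && p1_r0 == 0;
}

static int count_bits(unsigned x)
{
	int n = 0;

	for (; x; x >>= 1)
		n += x & 1;
	return n;
}

/* Try every schedule in which each process takes exactly two steps. */
static bool outcome_reachable(bool p0_swapped)
{
	for (unsigned sched = 0; sched < 16; sched++) {
		if (count_bits(sched) != 2)
			continue;
		if (run_one(sched, p0_swapped))
			return true;
	}
	return false;
}

/* With P0 in program order the outcome is unreachable; swapped, it is not. */
static void check_model(void)
{
	assert(!outcome_reachable(false));
	assert(outcome_reachable(true));
}
```

With P0 kept in program order, P1 can only store 2 after seeing len == 0,
which in turn happens after P0 has already read mod_data, so P0 never observes
the garbage value.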
> > + *
> > + * Memory barrier is needed after adding to length for the case
> > + * where length transitions from 0 -> 1. This is because rcu_barrier()
> > + * should never miss an update to the length. So the update to length
> > + * has to be seen *before* any modifications to the segmented list. Otherwise a
> > + * race can happen.
> > + * P0 (what P1 sees) P1
> > + * queue to list
> > + * rcu_barrier sees len as 0
> > + * set len = 1.
> > + * rcu_barrier does nothing.
>
> So that would be:
>
> call_rcu() rcu_barrier()
> -- --
> WRITE(len, len + 1) l = READ(len)
> smp_mb() if (!l)
> queue check next CPU...
>
>
> But I still don't see against what it pairs in rcu_barrier.
Actually, for the second case maybe similar reasoning (control dependency)
can be applied, but I'm unable to come up with a litmus test.
In fact, now I'm wondering how it is possible for call_rcu() to race with
rcu_barrier() at all. The module should ensure that no more call_rcu()
invocations can happen before rcu_barrier() is called.
confused,
- Joel