Re: [PATCH v7 6/6] rcu/segcblist: Add additional comments to explain smp_mb()

From: joel
Date: Sat Oct 17 2020 - 20:48:53 EST


On Sat, Oct 17, 2020 at 03:29:54PM +0200, Frederic Weisbecker wrote:
> On Fri, Oct 16, 2020 at 11:19:41PM -0400, joel@xxxxxxxxxxxxxxxxx wrote:
> > On Fri, Oct 16, 2020 at 09:27:53PM -0400, joel@xxxxxxxxxxxxxxxxx wrote:
> > [..]
> > > > > + *
> > > > > + * Memory barrier is needed after adding to length for the case
> > > > > + * where length transitions from 0 -> 1. This is because rcu_barrier()
> > > > > + * should never miss an update to the length. So the update to length
> > > > > + * has to be seen *before* any modifications to the segmented list. Otherwise a
> > > > > + * race can happen.
> > > > > + * P0 (what P1 sees) P1
> > > > > + * queue to list
> > > > > + * rcu_barrier sees len as 0
> > > > > + * set len = 1.
> > > > > + * rcu_barrier does nothing.
> > > >
> > > > So that would be:
> > > >
> > > > call_rcu() rcu_barrier()
> > > > -- --
> > > > WRITE(len, len + 1) l = READ(len)
> > > > smp_mb() if (!l)
> > > > queue check next CPU...
> > > >
> > > >
> > > > But I still don't see against what it pairs in rcu_barrier.
> > >
> > > Actually, for the second case maybe a similar reasoning can be applied
> > > (control dependency) but I'm unable to come up with a litmus test.
> > > In fact, now I'm wondering how is it possible that call_rcu() races with
> > > rcu_barrier(). The module should ensure that no more call_rcu() should happen
> > > before rcu_barrier() is called.
> > >
> > > confused
> >
> > So I made a litmus test to show that smp_mb() is needed also after the update
> > to length. Basically, otherwise it is possible the callback will see garbage
> > that the module cleanup/unload did.
> >
> > C rcubarrier+ctrldep
> >
> > (*
> > * Result: Never
> > *
> > * This litmus test shows that rcu_barrier (P1) prematurely
> > * returning by reading len 0 can cause issues if P0 does
> > * NOT have a smb_mb() after WRITE_ONCE(len, 1).
> > * mod_data == 2 means module was unloaded (so data is garbage).
> > *)
> >
> > { int len = 0; int enq = 0; }
> >
> > P0(int *len, int *mod_data, int *enq)
> > {
> > int r0;
> >
> > WRITE_ONCE(*len, 1);
> > smp_mb(); /* Needed! */
> > WRITE_ONCE(*enq, 1);
> >
> > r0 = READ_ONCE(*mod_data);
> > }
> >
> > P1(int *len, int *mod_data, int *enq)
> > {
> > int r0;
> > int r1;
> >
> > r1 = READ_ONCE(*enq);
> >
> > // barrier Just for test purpose ("exists" clause) to force the..
> > // ..rcu_barrier() to see enq before len
> > smp_mb();
> > r0 = READ_ONCE(*len);
> >
> > // implicit memory barrier due to conditional */
> > if (r0 == 0)
> > WRITE_ONCE(*mod_data, 2);
> > }
>
> I'm not sure what scenario P1 refers to in practice, and to what module?

Kernel module usecase for rcu_barrier. See the docs. P1() in the litmus test
is just a thread of execution which I was using to show the memory accesses
of rcu_barrier.

> > // Did P0 read garbage?
> > exists (0:r0=2 /\ 1:r0=0 /\ 1:r1=1)
> >
>
>
> What also scares me is that in rcu_barrier():
>
> for_each_possible_cpu(cpu) {
> rdp = per_cpu_ptr(&rcu_data, cpu);
> if (cpu_is_offline(cpu) &&
> !rcu_segcblist_is_offloaded(&rdp->cblist))
> continue;
> if (rcu_segcblist_n_cbs(&rdp->cblist) && cpu_online(cpu)) {
> rcu_barrier_trace(TPS("OnlineQ"), cpu,
> rcu_state.barrier_sequence);
> smp_call_function_single(cpu, rcu_barrier_func, (void *)cpu, 1);
> } else if (rcu_segcblist_n_cbs(&rdp->cblist) &&
> cpu_is_offline(cpu)) {
> rcu_barrier_trace(TPS("OfflineNoCBQ"), cpu,
> rcu_state.barrier_sequence);
> local_irq_disable();
> rcu_barrier_func((void *)cpu);
> local_irq_enable();
> } else if (cpu_is_offline(cpu)) {
> rcu_barrier_trace(TPS("OfflineNoCBNoQ"), cpu,
> rcu_state.barrier_sequence);
> } else {
> rcu_barrier_trace(TPS("OnlineNQ"), cpu,
> rcu_state.barrier_sequence);
> }
> }
>
> I can't find something that makes sure this isn't racy while reading
> rcu_segcblist_n_cbs(&rdp->cblist).
>
> I mean what I see sums up to this:
>
> CPU 0 CPU 1
> rcu_barrier() call_rcu()/rcu_segcblist_enqueue()
> ------------ --------
>
> smp_mb();
> inc_len();
> smp_mb();
> queue callback;
> for_each_possible_cpu(cpu)
> if (!rcu_segcblist_n_cbs(&rdp->cblist))
> continue;
>
> It looks possible for rcu_barrier() to believe there is no callback enqueued
> and see rcu_segcblist_n_cbs(&rdp->cblist) == 0 here.
>
> I'm very likely missing something obvious somewhere.
>
> CPU 0 CPU 1
> rcu_barrier() call_rcu()/rcu_segcblist_enqueue()
> ------------ --------
>
> smp_mb();
> inc_len();
> smp_mb();
> queue callback;
> for_each_possible_cpu(cpu)
> if (!rcu_segcblist_n_cbs(&rdp->cblist))
> continue;
>

> invoke_callback

If CPU 0 saw the enqueue of the callback (that is the CPU 1's writes to the
segcb_list propagated to CPU 0), then it would have also seen the
effects of the inc_len. I forced this case in my last litmus test by this
code in P1():

r1 = READ_ONCE(*enq);
smp_mb(); /* barrier Just for test purpose to show that the.. */
/* ..rcu_barrier() saw list modification */

On the other hand, if CPU 0 did not see the enqueue, then there is really no
issue. Since that is the same case where call_rcu() happened _after_ the
rcu_barrier() and there's no race. rcu_barrier() does not need to wait if
there was no callback enqueued.

This is not exactly the easiest thing to explain, hence the litmus.

- Joel