Re: [PATCH RFC] srcu: Yet more detail for srcu_readers_active_idx_check() comments

From: Paul E. McKenney
Date: Thu Dec 15 2022 - 20:09:26 EST


On Thu, Dec 15, 2022 at 05:13:47PM -0500, Joel Fernandes wrote:
> > On Dec 15, 2022, at 3:13 PM, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > On Thu, Dec 15, 2022 at 05:58:14PM +0000, Joel Fernandes wrote:
> >>> On Thu, Dec 15, 2022 at 5:48 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>>> On Thu, Dec 15, 2022 at 5:08 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >>>
> >>>>> Scenario for the reader to increment the old idx once:
> >>>>>
> >>>>> _ Assume ssp->srcu_idx is initially 0.
> >>>>> _ The READER reads idx that is 0
> >>>>> _ The updater runs and flips the idx that is now 1
> >>>>> _ The reader resumes with 0 as an index but on the next srcu_read_lock()
> >>>>> it will see the new idx which is 1
> >>>>>
> >>>>> What could be the scenario for it to increment the old idx twice?
> >>>>
> >>>> Unless I am missing something, the reader must reference the
> >>>> srcu_unlock_count[old_idx] and then do smp_mb() before it will be
> >>>> absolutely guaranteed of seeing the new value of ->srcu_idx.
> >>>
> >>> I think both of you are right depending on how the flip raced with the
> >>> first reader's unlock in that specific task.
> >>>
> >>> If the first read section's srcu_read_unlock() and its corresponding
> >>> smp_mb() happened before the flip, then the increment of old idx
> >>> would happen only once. The next srcu_read_lock() will read the new
> >>> index. If the srcu_read_unlock() and it's corresponding smp_mb()
> >>> happened after the flip, the old_idx will be sampled again and can be
> >>> incremented twice. So it depends on how the flip races with
> >>> srcu_read_unlock().
> >>
> >> I am sorry this is inverted, but my statement's gist stands I believe:
> >>
> >> 1. Flip+smp_mb() happened before unlock's smp_mb() -- reader will not
> >> increment old_idx the second time.
> >
> > By "increment old_idx" you mean "increment ->srcu_lock_count[old_idx]",
> > correct?
>
> Yes sorry for confusing, i indeed meant lock count increment corresponding to the old index.

I guessed correctly!!! Don't worry, it won't happen again. ;-)

> > Again, the important ordering isn't the smp_mb(), but the accesses,
> > in this case, the accesses to ->srcu_unlock_count[idx].
>
> I was talking about ordering of the flip of index (write) with respect
> to both the reading of the old index in the rcu_read_lock() and its
> subsequent lock count increment corresponding to that index. I believe
> we are talking her about how this race can effect the wrap around issues
> when scanning for readers in the pre flip index, and we concluded that
> there can be at most 2 of these on the SAME task.

Agreed.

> The third time, reader
> will always see the new flipped index because of the memory barriers on
> both sides. IOW, the same task cannot overflow the lock counter on the
> preflipped index and cause issues. However there can be Nt different
> tasks so perhaps you can have 2*Nt number of preempted readers that had
> sampled the old index and now will do a lock and unlock on that old index,
> potentially causing a lock==unlock match when there should not be a match.

So each task can do one old-index ->srcu_lock_count[] increment, and Nc of
them can do a second one, where Nc is the number of CPUs. This is because
a given task's smp_mb() applies to all later code executed by that task
and also to code executed by other tasks running later on that same CPU.

> >> 2. unlock()'s smp_mb() happened before Flip+smp_mb() , now the reader
> >> has no new smp_mb() that happens AFTER the flip happened. So it can
> >> totally sample the old idx again -- that particular reader will
> >> increment twice, but the next time, it will see the flipped one.
> >
> > I will let you transliterate both. ;-)
>
> I think I see what you mean now :)
>
> I believe the access I am referring to is the read of idx on one side and the write to idx on the other. However that is incomplete and I need to pair that with some of other access on both sides.
>
> So perhaps this:
>
> Writer does flip + smp_mb + read unlock counts [1]
>
> Reader does:
> read idx + smp_mb() + increment lock counts [2]
>
> And subsequently reader does
> Smp_mb() + increment unlock count. [3]
>
> So [1] races with either [2] or [2]+[3].
>
> Is that fair?

That does look much better, thank you!

> >> Did I get that right? Thanks.
> >
> > So why am I unhappy with orderings of smp_mb()?
> >
> > To see this, let's take the usual store-buffering litmus test:
> >
> > CPU 0 CPU 1
> > WRITE_ONCE(x, 1); WRITE_ONCE(y, 1);
> > smp_mb(); smp_mb();
> > r0 = READ_ONCE(y); r1 = READ_ONCE(x);
> >
> > Suppose CPU 0's smp_mb() happens before that of CPU 1:
> >
> > CPU 0 CPU 1
> > WRITE_ONCE(x, 1); WRITE_ONCE(y, 1);
> > smp_mb();
> > smp_mb();
> > r0 = READ_ONCE(y); r1 = READ_ONCE(x);
> >
> > We get r0 == r1 == 1.
> >
> > Compare this to CPU 1's smp_mb() happening before that of CPU 0:
> >
> > CPU 0 CPU 1
> > WRITE_ONCE(x, 1); WRITE_ONCE(y, 1);
> > smp_mb();
> > smp_mb();
> > r0 = READ_ONCE(y); r1 = READ_ONCE(x);
> >
> > We still get r0 == r1 == 1. Reversing the order of the two smp_mb()
> > calls changed nothing.
> >
> > But, if we order CPU 1's write to follow CPU 0's read, then we have
> > this:
> >
> > CPU 0 CPU 1
> > WRITE_ONCE(x, 1);
> > smp_mb();
> > r0 = READ_ONCE(y);
> > WRITE_ONCE(y, 1);
> > smp_mb();
> > r1 = READ_ONCE(x);
> >
> > Here, given that r0 had the final value of zero, we know that
> > r1 must have a final value of 1.
> >
> > And suppose we reverse this:
> >
> > CPU 0 CPU 1
> > WRITE_ONCE(y, 1);
> > smp_mb();
> > r1 = READ_ONCE(x);
> > WRITE_ONCE(x, 1);
> > smp_mb();
> > r0 = READ_ONCE(y);
> >
> > Now there is a software-visible difference in behavior. The value of
> > r0 is now 1 instead of zero and the value of r1 is now 0 instead of 1.
> >
> > Does this make sense?
>
> Yes I see what you mean. In first case, smp_mb() ordering didn’t matter. But in the second case it does.

Yes, there have to be accesses for the software to even see the effects
of a given smp_mb().

Thanx, Paul