Re: [PATCH RFC] srcu: Yet more detail for srcu_readers_active_idx_check() comments

From: Joel Fernandes
Date: Wed Dec 14 2022 - 20:35:02 EST


On Wed, Dec 14, 2022 at 7:04 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Wed, Dec 14, 2022 at 11:14:48PM +0000, Joel Fernandes wrote:
> > On Wed, Dec 14, 2022 at 11:10 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Wed, Dec 14, 2022 at 11:07 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Dec 14, 2022 at 9:24 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > > > > > I also did not get why you care about readers that come and go (you
> > > > > > mentioned the first reader seeing incorrect idx and the second reader
> > > > > > seeing the right flipped one, etc). Those readers are irrelevant
> > > > > > AFAICS since they came and went, and need not be waited on, right?
> > > > >
> > > > > The comment is attempting to show (among other things) that we don't
> > > > > need to care about readers that come and go more than twice during that
> > > > > critical interval of time during the counter scans.
> > > >
> > > > Why do we need to care about readers that come and go even once? Once
> > > > they are gone, they have already done an unlock() and their RSCS is
> > > > over, so they need to be considered AFAICS.
> > > >
> > >
> > > Aargh, I meant: "so they need to be considered AFAICS".
> >
> > Trying again: "so they need not be considered AFAICS".
>
> Give or take counter wrap, which can make it appear that still-present
> readers have finished.

Ah, you mean that the flood of readers affects the counter wrapping,
not that those readers have to be waited on or anything. They just
happen to have a side effect on *existing readers*, which do need to
be waited on.

Thanks a lot for this explanation; I agree with this part. Readers that
sampled the idx before the flip happened, and then did their
lock+unlock counter increments both after the flip and after the
second unlock counter scan (second scan), can mess up the lock
counters such that the second scan finds lock==unlock, even though the
two should differ because of pre-existing readers. But as you pointed
out, there has to be a very large number of these to cause the
equality check to match. This might be another reason why it is
important to scan the unlocks first: the locks are then what have to
wrap the lock counter all the way around. If you instead scanned the
locks first, the unlocks would only have to catch up to the locks,
which takes far fewer increments than a full wrap-around.
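
To make the ordering argument concrete for myself, here is a rough
single-threaded toy model. It is only a sketch under loud assumptions:
the struct and function names are made up, one 8-bit counter pair stands
in for the summed per-CPU counters so that a wrap takes 256 steps, and
the memory barriers are not modeled at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy state: 8-bit stand-ins for the summed counters. */
struct order_model {
	uint8_t locks;
	uint8_t unlocks;
};

/* `late` readers each do lock+unlock between the two counter samples. */
static void order_run_late_readers(struct order_model *m, unsigned int late)
{
	for (unsigned int i = 0; i < late; i++) {
		m->locks++;	/* wraps modulo 256 */
		m->unlocks++;
	}
}

/* Unlocks sampled first, then locks. Returns true for "no readers". */
static bool idle_unlocks_first(struct order_model m, unsigned int late)
{
	uint8_t unlocks = m.unlocks;

	order_run_late_readers(&m, late);
	return m.locks == unlocks;
}

/* Hypothetical reversed order: locks sampled first, then unlocks. */
static bool idle_locks_first(struct order_model m, unsigned int late)
{
	uint8_t locks = m.locks;

	order_run_late_readers(&m, late);
	return locks == m.unlocks;
}
```

Starting from 5 readers still inside their critical sections (locks = 5,
unlocks = 0), the reversed order is fooled by only 5 late readers,
whereas the unlocks-first order needs 251 of them in this toy, that is,
enough to wrap the lock counter back around to the sampled unlock value.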

I still don't see why this affects only the first reader. There could
be more than one reader that needs to be waited on (the readers that
started before the grace period started). Say there are 5 of them.
When the grace period starts, the interfering readers (2^32 of them or
so) could have sampled the old idx before the flip, and then do
lock+unlock (on that old, pre-flip idx) in quick succession after the
smp_mb() in the second srcu_readers_active_idx_check(). That causes
those 5 poor readers to not be waited on. Granted, any new readers
after this thundering herd should see the new idx and will not be
affected, thanks to the memory barriers.
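
The 5-reader scenario above can be sketched the same way. Again this is
a toy and not the kernel code: the toy_* names are hypothetical, the
counters are 8 bits wide so that the herd only needs to be a few hundred
readers instead of 2^32 or so, and everything runs in one thread with no
barriers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy: one 8-bit lock/unlock counter pair for the old idx. */
struct toy_srcu {
	uint8_t lock_count;
	uint8_t unlock_count;
};

static void toy_read_lock(struct toy_srcu *s)   { s->lock_count++; }
static void toy_read_unlock(struct toy_srcu *s) { s->unlock_count++; }

/* Five readers enter and stay inside their critical sections. */
static struct toy_srcu toy_five_readers(void)
{
	struct toy_srcu s = { 0, 0 };

	for (int i = 0; i < 5; i++)
		toy_read_lock(&s);
	return s;
}

/*
 * Sample the unlocks, let `herd` late readers each do lock+unlock on
 * the old idx, then sample the locks. Returns true if the scan would
 * conclude that no readers remain.
 */
static bool toy_scan_with_herd(struct toy_srcu *s, unsigned int herd)
{
	uint8_t unlocks = s->unlock_count;	/* first: unlock scan */

	for (unsigned int i = 0; i < herd; i++) {
		toy_read_lock(s);
		toy_read_unlock(s);
	}

	return s->lock_count == unlocks;	/* then: lock scan */
}
```

With no herd, or a herd of 100, the scan still sees the 5 readers. With
a herd of exactly 251 the lock counter wraps from 5 back to 0, matches
the unlock value sampled earlier, and all 5 readers are missed at once,
which is why I don't see what is special about the first one.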

Still confused, but hey, I'll take it a little at a time ;-) Also, thanks
for the suggestions for litmus tests.

Cheers,

- Joel

> > Anyway, my 1 year old son is sick so signing off for now. Thanks.
>
> Ouch! I hope he recovers quickly and completely!!!
>
> Thanx, Paul