Re: [PATCH RFC] srcu: Yet more detail for srcu_readers_active_idx_check() comments

From: Paul E. McKenney
Date: Wed Dec 14 2022 - 20:39:48 EST


On Wed, Dec 14, 2022 at 08:34:03PM -0500, Joel Fernandes wrote:
> On Wed, Dec 14, 2022 at 7:04 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > On Wed, Dec 14, 2022 at 11:14:48PM +0000, Joel Fernandes wrote:
> > > On Wed, Dec 14, 2022 at 11:10 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Dec 14, 2022 at 11:07 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, Dec 14, 2022 at 9:24 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > > > > > > I also did not get why you care about readers that come and ago (you
> > > > > > > mentioned the first reader seeing incorrect idx and the second reader
> > > > > > > seeing the right flipped one, etc). Those readers are irrelevant
> > > > > > > AFAICS since they came and went, and need not be waited on , right?.
> > > > > >
> > > > > > The comment is attempting to show (among other things) that we don't
> > > > > > need to care about readers that come and go more than twice during that
> > > > > > critical interval of time during the counter scans.
> > > > >
> > > > > Why do we need to care about readers that come and go even once? Once
> > > > > they are gone, they have already done an unlock() and their RSCS is
> > > > > over, so they need to be considered AFAICS.
> > > > >
> > > >
> > > > Aargh, I meant: "so they need to be considered AFAICS".
> > >
> > > Trying again: "so they need not be considered AFAICS".
> >
> > Give or take counter wrap, which can make it appear that still-present
> > readers have finished.
>
> Ah you mean those flood of readers affect the counter wrapping and not
> that those readers have to be waited on or anything, they just happen
> to have a side-effect on *existing readers* which need to be waited
> on.

Exactly, that flood of readers could potentially result in a
counter-wrap-induced too-short SRCU grace period, and it is SRCU's job
to avoid that, and specifically the job of the code that this comment
lives in.

> Thanks a lot for this explanation, this part I agree. Readers that
> sampled the idx before the flip happened, and then did their
> lock+unlock counter increments both after the flip, and after the
> second unlock counter scan (second scan), can mess up the lock
> counters such that the second scan found lock==unlock, even though it
> is not to be due to pre-existing readers. But as you pointed out,
> there have to be a substantially large number of these to cause the
> equality check to match. This might be another reason why it is
> important to scan the unlocks first, because the locks are what have
> to cause the wrap around of the lock counter. Instead if you counted
> locks first, then the unlocks would have to do the catching up to the
> locks which are much fewer than a full wrap around.

True enough!

> I still don't see why this affects only the first reader. There could
> be more than 1 reader that needs to be waited on (the readers that
> started before the grace period started). Say there are 5 of them.
> When the grace period starts, the interfering readers (2^32 of them or
> so) could have sampled the old idx before the flip, and then do
> lock+unlock (on that old pre-flip() idx) in quick succession after the
> smp_mb() in the second srcu_readers_active_idx_check(). That causes
> those 5 poor readers to not be waited on. Granted, any new readers
> after this thundering herd should see the new idx and will not be
> affected, thanks to the memory barriers.

Yes, there could be quite a few such readers, but it only takes one
messed-up reader to constitute a bug in SRCU. ;-)

> Still confused, but hey I'll take it little at a time ;-) Also thanks
> for the suggestions for litmus tests.

Agreed, setting this sort of thing aside for a bit and then coming back
to it can be helpful.

Thanx, Paul

> Cheers,
>
> - Joel
>
> > > Anyway, my 1 year old son is sick so signing off for now. Thanks.
> >
> > Ouch! I hope he recovers quickly and completely!!!
> >
> > Thanx, Paul