Re: [PATCH RFC tip/core/rcu] SRCU rewrite

From: Paul E. McKenney
Date: Thu Nov 17 2016 - 16:23:02 EST


On Thu, Nov 17, 2016 at 03:38:08PM +0000, Mathieu Desnoyers wrote:
> ----- On Nov 17, 2016, at 10:31 AM, Mathieu Desnoyers mathieu.desnoyers@xxxxxxxxxxxx wrote:
>
> > ----- On Nov 17, 2016, at 10:07 AM, Lai Jiangshan jiangshanlai@xxxxxxxxx wrote:
> >
> >> On Thu, Nov 17, 2016 at 10:31 PM, Boqun Feng <boqun.feng@xxxxxxxxx> wrote:
> >>> On Thu, Nov 17, 2016 at 08:18:51PM +0800, Lai Jiangshan wrote:
> >>>> On Tue, Nov 15, 2016 at 10:37 PM, Paul E. McKenney
> >>>> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >>>> > On Tue, Nov 15, 2016 at 09:44:45AM +0800, Boqun Feng wrote:
> >>>>
> >>>> >>
> >>>> >> __srcu_read_lock() used to be called with preemption disabled. I guess
> >>>> >> the reason was because we have two percpu variables to increase. So with
> >>>> >> only one percpu right, could we remove the preempt_{dis,en}able() in
> >>>> >> srcu_read_lock() and use this_cpu_inc() here?
> >>>> >
> >>>> > Quite possibly...
> >>>> >
> >>>>
> >>>
> >>> Hello, Lai ;-)
> >>>
> >>>> it will be nicer if it is removed.
> >>>>
> >>>> The reason for the preemption-disabled was also because we
> >>>> have to disallow any preemption between the fetching of the idx
> >>>> and the increasement. so that we have at most NR_CPUS worth
> >>>> of readers using the old index that haven't incremented the counters.
> >>>>
> >>>
> >>> After reading the comment for a while, I actually got a question, maybe
> >>> I miss something ;-)
> >>>
> >>> Why "at most NR_CPUS worth of readers using the old index haven't
> >>> incremented the counters" could save us from overflow the counter?
> >>>
> >>> Please consider the following case in current implementation:
> >>>
> >>>
> >>> {sp->completed = 0} so idx = 1 in srcu_advance_batches(...)
> >>>
> >>> one thread A is currently in __srcu_read_lock() and using idx = 1 and
> >>> about to increase the percpu c[idx], and ULONG_MAX __srcu_read_lock()s
> >>> have been called and returned with idx = 1, please note I think this is
> >>> possible because I assume we may have some code like this:
> >>>
> >>> unsigned long i = 0;
> >>> for (; i < ULONG_MAX; i++)
> >>> srcu_read_lock(); // return the same idx 1;
> >>
> >> this is the wrong usage of the api.
> >>
> >>
> >> you might rewrite it as:
> >>
> >> unsigned long index[2] = {0, 0};
> >> unsigned long i = 0;
> >> for (; index[1] < ULONG_MAX; i++)
> >> index[srcu_read_lock()]++;
> >>
> >>
> >> I think we should add document to disallow this kind of usage.
> >> a reader should eat 4bytes on the memory at least.
> >>
> >
> > (the analysis below refers to the rewritten SRCU scheme)
> >
> > Let's presume we use the API correctly, as you describe (saving
> > the returned index of srcu_read_lock() somewhere).
> >
> > So for the sake of argument, we can either call srcu_read_lock
> > in a loop (during which we can be migrated), or call it
> > concurrently from various threads. The effect in terms of
> > overflow is pretty much the same.
> >
> > What is done here is incrementing per-cpu split-counters. In
> > the worse-case scenario, let's assume we're incrementing those
> > counters for a single index (0 or 1).
> >
> > If we think about this system-wide, we don't really care about
> > the overflow of a single CPU counter, because what matters is
> > the difference between the overall nr_lock - nr_unlock counts
> > for a given index, once summed up by synchronize_srcu().
> >
> > So the only situation that could lead to an overflow that matters
> > is if synchronize_srcu() see ULONG_MAX more increments of nr_lock
> > than the observed number of nr_unlock increments.
> >
> > So the bound is not only about the number of concurrent SRCU
> > readers, but also about the number of SRCU readers that may
> > appear between the moment synchronize_srcu() reads the nr_unlock
> > per-cpu counters and the moment it reads the nr_lock counters.
> >
> > This maximum bound of ULONG_MAX - 1 therefore applies to the
> > sum of:
> > - numner of concurrent SRCU read-side critical sections active
> > at the same time,
> > - number of SRCU read-side critical sections beginning after
> > synchronize_srcu() has read the nr_unlock counters, before
> > it reads the nr_lock counters.
>
> Now that I think of it, since we flip the period before summing
> the nr_unlock counter, we cannot have any newcoming readers appearing
> within the target period while we execute synchronize_srcu().
> So it ends up being a limit on the number of concurrent SRCU
> read-side c.s. active at the same time. (you can scratch the
> second bullet above).

We can have NR_CPUS worth of them -- those that have fetched the
index, but not yet incremented their counter.

But if the updater fails to see their counter increment, then
their next srcu_read_lock() is guaranteed to see the new index.

Thanx, Paul

> Thanks,
>
> Mathieu
>
>
>
> > You guys seem to see cases that would require a lower max nr
> > reader bound, but I'm afraid I don't quite understand them.
> >
> > Thanks,
> >
> > Mathieu
> >
> >
> >>>
> >>> And none of the corresponding srcu_read_unlock() has been called;
> >>>
> >>> In this case, at the time thread A increases the percpu c[idx], that
> >>> will result in an overflow, right? So even one reader using old idx will
> >>> result in overflow.
> >>>
> >>>
> >>> I think we won't be hit by overflow is not because we have few readers
> >>> using old idx, it's because there are unlikely ULONG_MAX + 1
> >>> __srcu_read_lock() called for the same idx, right? And the reason of
> >>> this is much complex: because we won't have a fair mount of threads in
> >>> the system, because no thread will nest srcu many levels, because there
> >>> won't be a lot readers using old idx.
> >>>
> >>> And this will still be true if we use new mechanism and shrink the
> >>> preemption disabled section, right?
> >>>
> >>> Regards,
> >>> Boqun
> >>>
> >>>> if we remove the preempt_{dis,en}able(). we must change the
> >>>> "NR_CPUS" in the comment into ULONG_MAX/4. (I assume
> >>>> one on-going reader needs at least need 4bytes at the stack). it is still safe.
> >>>>
> >>>> but we still need to think more if we want to remove the preempt_{dis,en}able().
> >>>>
> >>>> Thanks
> >> >> Lai
> >
> > --
> > Mathieu Desnoyers
> > EfficiOS Inc.
> > http://www.efficios.com
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>