Re: [PATCH 05/32] nohz: Move rcu dynticks idle mode handling to idle enter/exit APIs

From: Frederic Weisbecker
Date: Tue Aug 30 2011 - 14:46:11 EST


On Tue, Aug 30, 2011 at 05:22:33PM +0200, Peter Zijlstra wrote:
> On Tue, 2011-08-30 at 16:26 +0200, Frederic Weisbecker wrote:
> > On Tue, Aug 30, 2011 at 01:19:18PM +0200, Peter Zijlstra wrote:
> > > On Tue, 2011-08-30 at 01:35 +0200, Frederic Weisbecker wrote:
> > > >
> > > > OTOH it is needed to find non-critical sections when asked to cooperate
> > > > in a grace period completion. But if no callbacks have been enqueued on
> > > > the whole system we are fine.
> > >
> > > Its that 'whole system' clause that I have a problem with. It would be
> > > perfectly fine to have a number of cpus very busy generating rcu
> > > callbacks, however this should not mean our adaptive nohz cpu should be
> > > bothered to complete grace periods.
> > >
> > > Requiring it to participate in the grace period state machine is a fail,
> > > plain and simple.
> >
> > We need those nohz CPUs to participate because they may use read-side
> > critical sections themselves. So we need them to delay the grace period
> > until their running rcu read-side critical sections end, like any
> > other CPU. Otherwise their supposed rcu read-side critical sections
> > wouldn't be effective.
> >
> > Either that or we need to only stop the tick when we are in userspace.
> > I'm not sure it would be a good idea.
>
> Well the simple fact is that rcu, when considered system-wide, is pretty
> much always busy, voiding any and all benefit you might want to gain.

With my testcase, a stupid userspace loop on a single CPU among 4, I actually
see very little RCU activity, especially since every other CPU is pretty much
idle. So there are cases where it's not so pointless.

> > We discussed this problem, I believe the problem mostly resides in rcu sched.
> > Because finding quiescent states for rcu bh is easy, but rcu sched needs
> > the tick or context switches. (For rcu preempt I have no idea.)
> > So for now that's the sanest way we found amongst:
> >
> > - Having explicit hooks in preempt_disable() and local_irq_restore()
> > to notice end of rcu sched critical section. So that we don't need the tick
> > anymore to find quiescent states. But that's going to be costly. And we may
> > miss some more implicitly non-preemptable code path.
> >
> > - Rely on context switches only. I believe in practice it should be fine.
> > But in theory this delays the grace period completion for an unbounded
> > amount of time.
>
> Right, so what we can do is keep a per-cpu context switch counter (I'm
> sure we have one someplace and we already have the
> rcu_note_context_switch() callback in case we need another) and have
> another cpu (outside of our extended nohz domain) drive our state
> machine.
>
> But I'm sure Paul can say more sensible things than me here.

Yeah I hope we can find some solution to minimize these IPIs.