Re: [PATCH 05/32] nohz: Move rcu dynticks idle mode handling to idle enter/exit APIs

From: Peter Zijlstra
Date: Wed Aug 31 2011 - 05:18:17 EST


On Wed, 2011-08-31 at 00:24 +0200, Frederic Weisbecker wrote:
> On Tue, Aug 30, 2011 at 10:58:38PM +0200, Peter Zijlstra wrote:
> > On Tue, 2011-08-30 at 17:42 +0200, Peter Zijlstra wrote:
> > > On Tue, 2011-08-30 at 17:33 +0200, Frederic Weisbecker wrote:
> > > > > See all that is still kernelspace ;-) I think I know what you mean to
> > > > > say though, but seeing as you note there is even now a known shortcoming
> > > > > I'm not very confident it's a solid construction. What will help us find
> > > > > such holes?
> > > >
> > > > This: https://lkml.org/lkml/2011/6/23/744
> > > >
> > > > It's in one of Paul's branches and should make it for the next merge window.
> > > > This should detect any of such holes. I made that on purpose for the nohz cpusets
> > > > when I saw how much error prone that can be with rcu :)
> > >
> > > OK, good ;-)
> > >
> > > > > I would much rather we not rely on such fragile things too much.. this
> > > > > RCU stuff wants way more thought, as it stands your patch-set doesn't do
> > > > > anything useful IMO.
> > > >
> > > > Not sure what you mean. Well that Rcu thing for sure is fragile but we have
> > > > the tools ready to find the problems.
> > >
> > > Right that thing you linked above does catch abuse, still your current
> > > proposal means that due to RCU it will basically never disable the tick.
> >
> > So how about something like:
> >
> > Assuming we are in rcu_nohz state; on kernel enter we leave rcu_nohz but
> > don't start the tick, instead we assign another cpu to run our state
> > machine.
>
> The nohz CPU still has to notice its own quiescent states.

Why? rcu-sched can use a context-switch counter, rcu-preempt doesn't
even need that. Remote cpus can notice those just fine.
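
A minimal userspace sketch of the context-switch counter idea, not
kernel code. It assumes each CPU bumps its own counter on every context
switch, and that some other CPU snapshots the counters at grace-period
start and compares them later. All names (ctxt_switches, gp_snapshot,
gp_start, cpu_passed_qs, gp_done) are hypothetical and not taken from
the RCU implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical per-CPU context-switch counters, bumped only by the owner. */
static atomic_ulong ctxt_switches[NR_CPUS];

/* Snapshot taken by the remote CPU when a grace period starts. */
static unsigned long gp_snapshot[NR_CPUS];

/* Called by the CPU itself on each context switch. */
static void note_context_switch(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    atomic_fetch_add_explicit(&ctxt_switches[cpu], 1, memory_order_release);
}

/* Remote side: record every counter at grace-period start. */
static void gp_start(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        gp_snapshot[cpu] = atomic_load_explicit(&ctxt_switches[cpu],
                                                memory_order_acquire);
}

/* Remote side: a changed counter means the CPU context-switched,
 * which counts as a quiescent state for an rcu-sched style flavour. */
static bool cpu_passed_qs(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return atomic_load_explicit(&ctxt_switches[cpu],
                                memory_order_acquire) != gp_snapshot[cpu];
}

/* The grace period is over once every CPU has passed a quiescent state. */
static bool gp_done(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (!cpu_passed_qs(cpu))
            return false;
    return true;
}
```

The observed CPU never takes part beyond incrementing its own counter,
so it would not need a tick for this.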

> Now it could be
> an optimization to ask another CPU to handle all the rest once that quiescent
> state is found. That doesn't solve our main problem though which is to
> reliably report quiescent states when asked for.

No, seriously, RCU should not, ever, need to re-enable the tick. Imagine
an HPC workload where the system cores are also responsible for all IO
and all the adaptive-nohz cores are simply crunching numbers. In that
scenario you'll have a very high rcu usage because the system cores are
all very busy arranging work for the computation cores.

> > On kernel exit we 'donate' all our rcu state to a willing victim (the
> > same that earlier was kind enough to drive our state) and undo our
> > entire GP accounting and re-enter rcu_nohz state.
>
> That's already what rcu_enter_nohz() does.

Almost, but not quite: it doesn't donate the callbacks, for example
(something it does do on hotplug -- and therefore any assumption that
the callback will in fact run on the cpu you submit it on is already
broken).
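
A rough userspace sketch of what donating callbacks could look like,
assuming each CPU keeps its pending callbacks on a singly linked list
with a tail pointer. The types and helpers (struct cb, struct cb_list,
cb_donate, cb_invoke_all, count_cb) are made up for illustration and are
not the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback record, loosely modelled on a list of heads. */
struct cb {
    struct cb *next;
    void (*func)(struct cb *);
};

/* Per-CPU pending list; tail points at the last ->next, or at head. */
struct cb_list {
    struct cb *head;
    struct cb **tail;
    int len;
};

static void cb_list_init(struct cb_list *l)
{
    l->head = NULL;
    l->tail = &l->head;
    l->len = 0;
}

/* Queue a callback on the submitting CPU's list. */
static void cb_enqueue(struct cb_list *l, struct cb *c,
                       void (*func)(struct cb *))
{
    c->next = NULL;
    c->func = func;
    *l->tail = c;
    l->tail = &c->next;
    l->len++;
}

/* Splice everything from 'from' onto the end of 'to' in O(1). */
static void cb_donate(struct cb_list *from, struct cb_list *to)
{
    if (!from->head)
        return;
    *to->tail = from->head;
    to->tail = from->tail;
    to->len += from->len;
    cb_list_init(from);
}

/* Run all callbacks on whichever CPU now owns the list. */
static int cb_invoke_all(struct cb_list *l)
{
    struct cb *c = l->head;
    int n = 0;

    cb_list_init(l);
    while (c) {
        struct cb *next = c->next; /* func may free or reuse c */
        c->func(c);
        c = next;
        n++;
    }
    return n;
}

/* Sample callback used only to count invocations. */
static int invoked;
static void count_cb(struct cb *c)
{
    (void)c;
    invoked++;
}
```

After cb_donate() the callbacks run on the receiving CPU, which is the
point made above: a callback is not guaranteed to run where it was
submitted.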

> > If between that time we did restart the tick, we take back our rcu state
> > and skip the donate and rcu_nohz enter on kernel exit.
>
> That's also what is done in this patchset.

It's not, since you don't hand off the grace period detection you don't
take it back now, do you..

> As soon as we re-enter the kernel
> or the tick had to be restarted before we re-enter the kernel,

Another impossibility, you can only restart the tick from the kernel.

> we call
> rcu_exit_nohz() that pulls back the CPU to the whole RCU machinery.

But you then also start the tick again..
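
For reference, the scheme proposed in the quoted paragraphs (leave
rcu_nohz on kernel enter without starting the tick, let another cpu
drive the state machine, donate and re-enter rcu_nohz on kernel exit
unless the tick was restarted in between) could be modelled roughly as
below. This is only a toy state model under those assumptions; struct
cpu_state, kernel_enter, tick_restart and kernel_exit are hypothetical
names, and the actual handover of RCU state is reduced to a proxy field.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_PROXY (-1)

/* Toy model of one adaptive-nohz CPU. */
struct cpu_state {
    bool in_rcu_nohz;  /* RCU treats this CPU as quiescent */
    bool tick_running; /* periodic tick active on this CPU */
    int  proxy;        /* CPU driving our RCU state, or NO_PROXY */
};

/* Kernel entry: leave rcu_nohz, keep the tick off, pick a helper CPU. */
static void kernel_enter(struct cpu_state *s, int helper)
{
    if (!s->in_rcu_nohz)
        return;
    s->in_rcu_nohz = false;
    s->proxy = helper;
}

/* The tick had to be restarted while in the kernel: take our state back. */
static void tick_restart(struct cpu_state *s)
{
    s->tick_running = true;
    s->proxy = NO_PROXY;
}

/* Kernel exit: donate and re-enter rcu_nohz unless the tick is running. */
static void kernel_exit(struct cpu_state *s)
{
    if (s->tick_running)
        return; /* skip the donate and the rcu_nohz enter */
    /* Remaining RCU state would be handed to s->proxy here. */
    s->proxy = NO_PROXY;
    s->in_rcu_nohz = true;
}
```

In this model the tick is only ever started by tick_restart(), never by
kernel_enter(), which is the difference being argued about here.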