Re: [PATCH RFC tip/core/rcu] Parallelize and economize NOCB kthread wakeups
From: Paul E. McKenney
Date: Tue Jul 08 2014 - 09:44:59 EST
On Thu, Jul 03, 2014 at 03:12:17PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 02, 2014 at 10:55:01AM -0700, Paul E. McKenney wrote:
> > On Wed, Jul 02, 2014 at 07:26:00PM +0200, Peter Zijlstra wrote:
> > > On Wed, Jul 02, 2014 at 10:08:38AM -0700, Paul E. McKenney wrote:
> > > > As were others, not that long ago. Today is the first hint that I got
> > > > that you feel otherwise. But it does look like the softirq approach to
> > > > callback processing needs to stick around for a while longer. Nice to
> > > > hear that softirq is now "sane and normal" again, I guess. ;-)
> > >
> > > Nah, softirqs are still totally annoying :-)
> > Name me one thing that isn't annoying. ;-)
> > > So I've lost detail again, but it seems to me that on all CPUs that are
> > > actually getting ticks, waking tasks to process the RCU state is
> > > entirely overdoing it. Might as well keep processing their RCU state
> > > from the tick as was previously done.
> > And that is in fact the approach taken by my patch. For which I just
> > kicked off testing, so expect an update later today. (And that -is-
> > optimistic! A pessimistic viewpoint would hold that the patch would
> > turn out to be so broken that it would take -weeks- to get a fix!)
> Right, but as you told Mike it's not really dynamic, but of course we can
> work on that.
If it is actually needed by someone, then I would be happy to work on it.
But all I see now is people asserting that it should be provided, without
any real justification.
> That said, I'm somewhat confused on the whole nocb thing. The way I
> see things, there are two things that need doing:
> 1) push the state machine
> 2) run callbacks
> It seems to me the nocb threads do both, and somehow some of this is
> getting conflated. Because afaik RCU only uses softirqs for (2), since
> (1) is fully done from the tick -- well, it used to be, before all this.
Well, you do need a finer-grained view of the RCU state machine:
1a. Registering the need for a future grace period.
1b. Self-reporting of quiescent states (softirq).
1c. Reporting of other CPUs' quiescent states (grace-period kthread).
This includes idle CPUs, userspace nohz_full CPUs, and CPUs that
just now transitioned to offline.
1d. Kicking CPUs that have not yet reported a quiescent state
(also grace-period kthread).
2. Running callbacks (softirq, or, for RCU_NOCB_CPU, rcuo kthread).
And here, (1a) is done via softirq in the non-nocb case and via the rcuo
kthreads in the nocb case.
And yes, RCU's softirq processing is normally done from the tick.
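To make the division of labor concrete, here is a minimal sketch of those
phases as a C enum. The names are invented for this discussion; the real
state machine lives in kernel/rcu/tree.c and is considerably more involved:

/*
 * Illustrative only -- these identifiers are made up for this
 * discussion, not taken from the kernel sources.
 */
enum rcu_gp_phase {
	RCU_GP_REGISTER,	/* 1a: note need for a future grace period */
	RCU_GP_SELF_QS,		/* 1b: CPU reports its own quiescent state */
	RCU_GP_OTHER_QS,	/* 1c: report QS for idle/nohz_full/offline CPUs */
	RCU_GP_KICK,		/* 1d: prod CPUs still holding up the GP */
	RCU_DO_CALLBACKS,	/* 2:  invoke callbacks (softirq or rcuo kthread) */
};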
> Now, IIRC rcu callbacks are not guaranteed to run on whatever cpu
> they're queued on, so we can 'easily' splice the actual callback list
> into some other CPU's callback list. Which leaves only (1) to actually
True, although the 'easily' part needs to take into account the fact
that the RCU callbacks from a given CPU must be invoked in order.
Or rcu_barrier() needs to find a different way to guarantee that all
previously registered callbacks have been invoked, as the case may be.
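As a sketch of what an ordering-preserving splice might look like (the
helper and type names here are hypothetical, and this assumes a simple
singly linked list with a tail pointer rather than RCU's actual per-CPU
callback lists):

/*
 * Hypothetical sketch, not kernel code.  The struct rcu_head below is
 * a minimal stand-in for the kernel's version.
 */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *);
};

struct rcu_cblist {
	struct rcu_head *head;
	struct rcu_head **tail;		/* &last->next, or &head if empty */
};

static void rcu_cblist_splice_tail(struct rcu_cblist *to,
				   struct rcu_cblist *from)
{
	if (!from->head)
		return;			/* donor has nothing queued */
	*to->tail = from->head;		/* append donor list, order intact */
	to->tail = from->tail;
	from->head = NULL;		/* donor list is now empty */
	from->tail = &from->head;
}

Appending the donor's entire list in one step keeps that CPU's callbacks
in order relative to one another, which is what rcu_barrier()'s guarantee
depends on.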
> Yet the whole thing is called after the 'no-callback' thing, even though
> the most important part is pushing the state machine remotely.
Well, you do have to do both. Pushing the state machine doesn't help
unless you also invoke the RCU callbacks.
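For example, in the usual call_rcu() pattern, nothing is reclaimed until
the callback itself runs, no matter how far the grace-period machinery
has advanced:

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
	int data;
	struct rcu_head rcu;
};

/* Phase (2): only when this callback runs is the memory freed. */
static void foo_reclaim(struct rcu_head *rhp)
{
	kfree(container_of(rhp, struct foo, rcu));
}

static void foo_remove(struct foo *fp)
{
	/*
	 * Phases (1a)-(1d) guarantee that a grace period elapses
	 * before foo_reclaim() is invoked, but advancing the state
	 * machine by itself reclaims nothing.
	 */
	call_rcu(&fp->rcu, foo_reclaim);
}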
> Now, I can see we probably don't want to actually push remote CPUs'
> rcu state from IRQ context, but we could, I think, drive the state
> machine remotely. And we want to avoid overloading one CPU with the work
> of all others, which I think is still a fundamental issue with the whole
> nohz_full thing; it reverts to the _one_ timekeeper cpu, but on big
> enough systems that'll be a problem.
Well, RCU already pushes the remote CPU's RCU state remotely via
RCU's dynticks setup. But you are quite right, dumping all of the RCU
processing onto one CPU can be a bottleneck on large systems (which
Fengguang's tests noted, by the way), and this is the reason for patch
11/17 in the fixes series (https://lkml.org/lkml/2014/7/7/990). This
patch allows housekeeping kthreads like the grace-period kthreads to
use a new housekeeping_affine() function to bind themselves onto the
non-nohz_full CPUs. The system can be booted with the desired number
of housekeeping CPUs using the nohz_full= boot parameter.
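Roughly speaking, and based on the patch description rather than its
final form, the binding looks like this:

/*
 * Rough sketch of the housekeeping_affine() idea from patch 11/17;
 * details are approximate, see the patch itself for the real thing.
 */
static void housekeeping_affine(struct task_struct *t)
{
	/* Restrict @t to the CPUs not designated nohz_full. */
	if (tick_nohz_full_enabled())
		set_cpus_allowed_ptr(t, housekeeping_mask);
}

A housekeeping kthread such as the grace-period kthread would then call
housekeeping_affine(current) early on, before entering its main loop.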
However, it is not clear to me that having only one timekeeping CPU
(as opposed to having only one housekeeping CPU) is a real problem,
even for very large systems. If it does turn out to be a real problem,
the sysidle code will probably need to change as well.