Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after extended grace periods

From: Paul E. McKenney
Date: Mon Nov 29 2010 - 18:07:07 EST


On Fri, Nov 26, 2010 at 03:06:43PM +0100, Frederic Weisbecker wrote:
> On Thu, Nov 25, 2010 at 06:56:32AM -0800, Paul E. McKenney wrote:
> > On Wed, Nov 24, 2010 at 11:42:57PM +0100, Frederic Weisbecker wrote:
> > > On Wed, 24 Nov 2010 10:20:51 -0800,
> > > "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > On Wed, Nov 24, 2010 at 06:38:45PM +0100, Frederic Weisbecker wrote:
> > > > > Yeah. I mean, I need to read how the code manages the different
> > > > > queues. But __rcu_process_gp_end() seems to sum it up quite well.
> > > >
> > > > For advancing callbacks, that is the one! For invocation of
> > > > callbacks, see rcu_do_batch().
> > >
> > > Ok.
> > >
> > > > > > It's more like it could never stop the tick. But that doesn't
> > > > > > concern mainline. This is because I have a hook that prevents the
> > > > > > tick from being stopped until rcu_pending() == 0.
> > > >
> > > > That would certainly change behavior!!! Why did you need to do that?
> > > >
> > > > Ah, because force_quiescent_state() has not yet been taught about
> > > > dyntick-HPC, got it...
> > >
> > > Oh, actually I have taught it about that. For an isolated CPU that
> > > doesn't respond, it sends a specific IPI that will restart the tick if
> > > we are not in nohz.
> > >
> > > The point in restarting the tick is to find some quiescent states and
> > > also to keep the tick for a little while for potential further grace
> > > periods to complete while we are in the kernel.
> > >
> > > This is why I use rcu_pending() from the tick: to check if we still
> > > need the tick for rcu.
> >
> > So if you have received the IPI, then you turn on the tick. As soon as
> > rcu_pending() returns false, you can turn off the tick. Once the tick
> > is off, you can go back to using rcu_needs_cpu(), right? (At least
> > until that CPU receives another RCU IPI.)
>
> Yep. Although there is already a test for rcu_needs_cpu() in
> tick_nohz_stop_sched_tick() that cancels the switch to nohz mode.

Sounds plausible. ;-)

> > Either that or you need to check for both rcu_needs_cpu() -and-
> > rcu_pending(). Actually, you do need to check both during the time that
> > you have the tick on after receiving an RCU IPI.
>
> Yeah.
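
To make sure we mean the same thing, here is a minimal user-space sketch
of the decision described above. It is only a model, not kernel code: the
struct, its fields, and tick_must_stay_on() are hypothetical names, and
the two booleans stand in for the results of rcu_pending() and
rcu_needs_cpu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU model; these fields are stand-ins, not kernel state. */
struct cpu_tick_state {
	bool rcu_ipi_received;	/* tick was restarted by an RCU IPI */
	bool rcu_pending;	/* stand-in for rcu_pending() != 0 */
	bool rcu_needs_cpu;	/* stand-in for rcu_needs_cpu() != 0 */
};

/*
 * Return true if the tick has to stay on for RCU's sake.
 *
 * While the tick is on because of an RCU IPI, both conditions are
 * checked.  Once neither holds, the IPI mode is left and only the
 * rcu_needs_cpu() stand-in matters until the next RCU IPI arrives.
 */
static bool tick_must_stay_on(struct cpu_tick_state *s)
{
	assert(s != NULL);

	if (s->rcu_ipi_received) {
		if (s->rcu_pending || s->rcu_needs_cpu)
			return true;
		s->rcu_ipi_received = false;	/* back to the normal check */
	}
	return s->rcu_needs_cpu;
}
```

The point of the two-level test is that rcu_pending() is consulted only
while the CPU is in the "tick restarted by IPI" window, so it adds nothing
to the common path.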

Before I forget (again!) -- my thinking over the past few months has
been in terms of making rcu_pending() simpler by permitting more false
positives. This makes sense if the only penalty is a needless softirq.
I am guessing that this sort of change would be a bad idea from your
viewpoint, but thought I should check.

> > > But it's not enough to track quiescent states. And we have no more timer
> > > interrupts to track them. So we need to restart the tick at least until
> > > we find a quiescent state for the grace period waiting for us.
> > >
> > > But I may be missing something too :)
> >
> > Well, I don't know whether or not you are missing it, but I do need to
> > get my act together and get RCU priority boosting ported from tiny
> > RCU to tree RCU. Otherwise, force_quiescent_state() cannot do much
> > for preemptible RCU.
>
> And if I understood it correctly, RCU priority boosting involves a new
> kthread that handles the callbacks instead of a softirq? So that you
> can give dynamic priority to this thread and so on?

Yes, there will be a new kthread, and yes, because this allows its
priority to be adjusted independently of non-RCU stuff. My current
design (and code, in the case of Tiny RCU) runs this at a fixed real-time
priority. This priority can be adjusted via the chrt command, if desired,
though you can of course shoot yourself in the foot quite impressively
if you do this carelessly.

> How will that help in the preemptible RCU case to force the quiescent
> state? Scheduling the kthread doesn't mean that every task that was
> preempted has been rescheduled and has reached its rcu_read_unlock(), so
> I guess you are planning some other trick :)

There is one kthread per CPU, so that there can be lock-free interaction
similar to that between mainline and the RCU softirq. ;-)

But in the meantime, your approach is good for experimental purposes.
And might well be the way it needs to happen longer term, for all I know.

> > > > > > Ah, I see what you mean. So you would suggest even ignoring those
> > > > > > explicit QS reports when in dyntick-hpc mode for CPUs that don't
> > > > > > have callbacks?
> > > > > >
> > > > > > Why not keep them?
> > > >
> > > > My belief is that people needing dyntick-HPC are OK with RCU grace
> > > > periods taking a few jiffies longer than they might otherwise.
> > > > Besides, when you are running dyntick-HPC, you aren't context
> > > > switching much, so keeping the tick doesn't buy you as much reduction
> > > > in grace-period latency.
> > >
> > > But don't we still need the tick in such cases (if we aren't in
> > > userspace) when a grace period starts, to note our grace periods?
> > > The rcu IPI itself doesn't seem to be sufficient for that.
> > >
> > > I'm not sure I understand what you mean.
> >
> > If there is a grace period, there must be an RCU callback. The CPU that
> > has the RCU callback queued will keep its own tick going, because
> > rcu_needs_cpu() will return true.
> >
> > The reason for ignoring the explicit QS reports in at least some of the
> > cases is that paying attention to them requires that the tick be enabled.
> > Which is fine for most workloads, but not so good for workloads that care
> > about OS jitter.
>
> Is there another solution than restarting the tick on a local CPU when we
> need a quiescent state from it? The simple IPI is not enough to ensure we
> find a decent grace period; in that case we need to restart the tick anyway
> if we are in the kernel, no?

It depends on the flavor of RCU. RCU-sched and RCU-bh are taken
care of if the IPI handler does a set_need_resched() followed by a
raise_softirq(). Of course, perhaps set_need_resched() ends up restarting
the tick, depending on the order of checks in your schedule() path.

But it doesn't need to. RCU's force_quiescent_state() would resend the
IPI periodically, which would force the CPU through its state machine.
The set_need_resched() would force a quiescent state, and the IPIs
following that would push RCU through the steps needed to note that the
corresponding CPU was no longer blocking the grace period.
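
As a rough illustration of that sequence, here is a toy user-space model.
Everything in it is an assumption made for the sketch: struct cpu_model and
the *_model() functions are hypothetical and only mimic the roles of the
IPI handler, schedule(), the RCU softirq, and force_quiescent_state().
They are not the kernel's implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one CPU as seen by RCU-sched/RCU-bh. */
struct cpu_model {
	bool need_resched;	/* models the effect of set_need_resched() */
	bool softirq_raised;	/* models the effect of raise_softirq() */
	bool qs_seen;		/* a quiescent state has been passed through */
	bool blocking_gp;	/* this CPU still blocks the grace period */
};

/* IPI handler: request a reschedule, then kick the RCU softirq. */
static void rcu_ipi_model(struct cpu_model *c)
{
	assert(c != NULL);
	c->need_resched = true;
	c->softirq_raised = true;
}

/* A pass through the scheduler counts as a quiescent state. */
static void schedule_model(struct cpu_model *c)
{
	if (c->need_resched) {
		c->need_resched = false;
		c->qs_seen = true;
	}
}

/* Softirq: report a previously seen quiescent state, if there is one. */
static void rcu_softirq_model(struct cpu_model *c)
{
	if (!c->softirq_raised)
		return;
	c->softirq_raised = false;
	if (c->qs_seen && c->blocking_gp) {
		c->blocking_gp = false;
		c->qs_seen = false;
	}
}

/* Resend the IPI while the CPU blocks the grace period; no tick needed. */
static bool force_quiescent_state_model(struct cpu_model *c)
{
	if (c->blocking_gp)
		rcu_ipi_model(c);
	return c->blocking_gp;
}
```

In this model the first IPI only plants the reschedule request, and a
later IPI is what lets the softirq report the quiescent state. That is
the sense in which repeated IPIs can stand in for the tick.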

Preemptible RCU works similarly, unless there is a low-priority task
that is being indefinitely preempted in an RCU read-side critical
section. In that case, the low-priority task is in need of priority
boosting.

Or am I still missing something?

> Enjoy your turkey(s) ;-)

I did, thank you! Only one turkey, though. You must be confusing me
with my 30-years-ago self. ;-)

Thanx, Paul