Re: Very high CPU load when idle with 3.0-rc1

From: Peter Zijlstra
Date: Wed Jun 01 2011 - 12:55:15 EST


On Wed, 2011-06-01 at 07:37 -0700, Paul E. McKenney wrote:

> > > I considered that, but working out when it is OK to deboost them is
> > > decidedly non-trivial.
> >
> > Where exactly is the problem there? The boost lasts for as long as it
> > takes to finish the grace period, right? There's a distinct set of
> > callbacks associated with each grace-period, right? In which case you
> > can de-boost your thread the moment you're done processing that set.
> >
> > Or am I simply confused about how all this is supposed to work?
>
> The main complications are: (1) the fact that it is hard to tell exactly
> which grace period to wait for, this one or the next one, and (2) the
> fact that callbacks get shuffled when CPUs go offline.

I can't say I would worry too much about (2); hotplug and RT don't really
go hand-in-hand anyway.

On (1), however: is that due to the boost condition?

I must admit that my thinking there is somewhat fuzzy, since I just
realized I don't actually know the exact condition to start boosting.
But suppose we boost because the queue is too large: then waiting for
the current grace period might not reduce the queue length, as most
callbacks might actually be for the next one.

If however the condition is grace period duration, then completion of
the current grace period is sufficient, since the whole boost condition
is defined as such. [ if the next is also exceeding the time limit,
that's a whole next boost ]
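
A minimal sketch of the two candidate boost conditions discussed above,
as a toy model in plain C. Everything here is hypothetical: the struct,
its fields, and the function names are made up for illustration and are
not taken from the kernel's RCU code. It only shows why completing the
current grace period clears a duration-based condition but may not clear
a queue-length-based one.

```c
#include <stdbool.h>

/* Hypothetical toy state; not the kernel's data structures. */
struct toy_rcu_state {
	unsigned long gp_start;    /* time the current grace period began */
	unsigned long now;         /* current time, same units, >= gp_start */
	unsigned long cbs_current; /* callbacks waiting on the current GP */
	unsigned long cbs_next;    /* callbacks waiting on the next GP */
};

/* Condition A (assumed): boost when too many callbacks are queued. */
static bool toy_boost_on_queue_len(const struct toy_rcu_state *s,
				   unsigned long limit)
{
	return s->cbs_current + s->cbs_next > limit;
}

/* Condition B (assumed): boost when the current GP has run too long. */
static bool toy_boost_on_gp_age(const struct toy_rcu_state *s,
				unsigned long limit)
{
	return s->now - s->gp_start > limit;
}

/*
 * Model the end of the current grace period: its callbacks are invoked,
 * the "next" callbacks become "current", and a new GP starts now.
 */
static void toy_gp_complete(struct toy_rcu_state *s)
{
	s->cbs_current = s->cbs_next;
	s->cbs_next = 0;
	s->gp_start = s->now;
}
```

With 10 callbacks on the current grace period and 990 on the next, a
queue-length limit of 500 is still exceeded after toy_gp_complete(),
whereas an age limit is satisfied again as soon as the new grace period
starts.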

> That said, it might be possible if we are willing to live with some
> approximate behavior. For example, always waiting for the next grace
> period (rather than the current one) to finish, and boosting through the
> extra callbacks in case where a given CPU "adopts" callbacks that must
> be boosted when that CPU also has some callbacks whose priority must be
> boosted and some that need not be.

That might make sense, but I must admit to not fully understanding the
whole current/next thing yet.

> The reason I am not all that excited about taking this approach is that
> it doesn't help worst-case latency.

Well, not running at the topmost prio does help those tasks running at
a higher priority, so in that regard it does reduce the jitter for a
number of tasks.

Also, I guess there's the whole question of what prio to boost to, which
I somehow totally forgot about. That is a non-trivial thing in its own
right, since there isn't really anyone blocked on grace period
completion (although in the special case of someone calling sync_rcu it
is clear).

> Plus the current implementation is just a less-precise approximation.
> (Sorry, couldn't resist!)

Appreciated. On a similar note, I still need to actually look at all this
(preempt) tree-rcu stuff to learn how exactly it works.
