Re: [RFC PATCH 2/5] sched: Add NOHZ_STATS_KICK
From: Vincent Guittot
Date: Fri Dec 22 2017 - 09:33:22 EST
On 22 December 2017 at 10:12, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Fri, Dec 22, 2017 at 09:29:15AM +0100, Peter Zijlstra wrote:
>> On Fri, Dec 22, 2017 at 09:05:45AM +0100, Vincent Guittot wrote:
>> > On 22 December 2017 at 08:59, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> > > On Thu, Dec 21, 2017 at 05:56:32PM +0100, Vincent Guittot wrote:
>> > >> In fact, we can't only rely on the tick and newly_idle load balance to
>> > >> ensure a period update of the blocked load because they can never
>> > >> happen.
>> > >
>> > > I'm confused, why would the ilb not happen?
>> >
>> > the ilb will be kick only if tick fires which might not be the case
>> > for task that runs less than a tick
>>
>> Oh, urgh, you're talking about when the entire system is idle. Yes
>> indeed.
>>
>> Lemme have a think, surely we can do something saner there.
>
> The only thing I could come up with is running a timer for this :/ That
> would keep the ILB thing running until all load is decayed (have a patch
> for that somewhere).
IMHO running a timer doesn't sound really great
When we have enough activity on the system, the tick and the periodic
load balance will ensure the update of load of all cpus (including the
idle cpus) at the load balance period pace But if we don't have enough
activity to trig the periodic update through ilb or because the system
is not overloaded or even almost idle, we don't have these periodic
update anymore. The goal is to do a lazy update of the blocked load to
not hurt too much power consumption of idle CPUs. When a task wakes up
and the blocked idle load have not been updated for a while, we trig
the update of these blocked loads in parallel to the wake up so the
data will be more accurate for the next events. It's already too late
for the current wake up but that's not a big deal because the wake up
path of a light loaded system is mainly choosing between previous and
current cpu and the load_avg_contrib and the utilization will have
been updated for next events.
>