Re: [REGRESSION 2.6.30][PATCH 1/1] sched: defer idle accounting till after load update period

From: Chase Douglas
Date: Mon Mar 29 2010 - 13:20:56 EST


On Mon, Mar 29, 2010 at 10:41 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Mon, 2010-03-29 at 09:41 -0400, Chase Douglas wrote:
>> There's a period of 10 ticks where calc_load_tasks is updated by all the
>> cpus for the load avg. Usually all the cpus do this during the first
>> tick. If any cpus go idle, calc_load_tasks is decremented accordingly.
>> However, if they wake up calc_load_tasks is not incremented. Thus, if
>> cpus go idle during the 10 tick period, calc_load_tasks may be
>> decremented to a non-representative value. This issue can lead to
>> systems having a load avg of exactly 0, even though the real load avg
>> could theoretically be up to NR_CPUS.
>>
>> This change defers calc_load_tasks accounting after each cpu updates the
>> count until after the 10 tick period.
>
> From reading the above changelog it seems to me there should be a
> callback from leaving nohz mode, your proposed patch has no such thing.

I believe what you're implying is that there should be a corresponding
call when a cpu wakes up, to undo the decrement made when that cpu went
idle during the 10 tick window. I don't think this is the correct
approach, because the load avg should be a snapshot in time. If
there were 10 runnable tasks at the beginning of the load calculation
window, then we should account for 10 tasks. If they all get serviced
and the cpu goes to sleep, then wakes back up with one runnable task,
we would account for only one task instead of the original 10. What if
9 more tasks then get added to the cpu's rq during the 10 tick update
period? None of them would be accounted for either.
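
To make the scenario above concrete, here is a toy model of a single cpu
during one update window. It is plain Python, not kernel code: the
function name, the policy names, and the "fold" helper are made up for
illustration, and the "wake_callback" policy is only my reading of the
suggested callback. "immediate" stands for the behaviour described in the
changelog, and "deferred" for the idea behind the patch.

```python
def sampled_load(policy):
    """Return the task count sampled at the end of one update window.

    policy is one of (all names are hypothetical):
      "immediate"     - going idle decrements the global count right away
      "wake_callback" - as above, plus re-accounting when the cpu wakes up
      "deferred"      - idle decrements are held back until after sampling
    """
    calc_load_tasks = 0  # global count that the load avg samples
    folded = 0           # what this cpu has contributed so far
    deferred = 0         # idle deltas held back under "deferred"

    def fold(nr_running):
        # Return the change since this cpu's last contribution.
        nonlocal folded
        delta = nr_running - folded
        folded = nr_running
        return delta

    # Start of the window: the cpu reports 10 runnable tasks.
    calc_load_tasks += fold(10)

    # All 10 tasks get serviced and the cpu goes idle.
    delta = fold(0)
    if policy == "deferred":
        deferred += delta
    else:
        calc_load_tasks += delta

    # The cpu wakes back up with one runnable task.
    if policy == "wake_callback":
        calc_load_tasks += fold(1)

    # 9 more tasks are queued on the rq here; no policy re-accounts them.

    # End of the window: sample first, then apply anything deferred.
    sample = calc_load_tasks
    calc_load_tasks += deferred
    return sample


if __name__ == "__main__":
    for policy in ("immediate", "wake_callback", "deferred"):
        print(policy, sampled_load(policy))
```

In this model only the "deferred" policy samples the 10 tasks that were
runnable at the start of the window. The wake-up callback recovers one
task and still misses the 9 that arrive afterwards.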

-- Chase