Re: [patch v8 3/9] sched: set initial value of runnable avg for newforked task

From: Lei Wen
Date: Mon Jun 17 2013 - 08:27:02 EST

Hi Peter,

On Mon, Jun 17, 2013 at 5:20 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Fri, Jun 14, 2013 at 06:02:45PM +0800, Lei Wen wrote:
>> Hi Alex,
>> On Fri, Jun 7, 2013 at 3:20 PM, Alex Shi <alex.shi@xxxxxxxxx> wrote:
>> > We need to initialize se.avg.{decay_count, load_avg_contrib} for a
>> > newly forked task.
>> > Otherwise random values in those variables cause a mess when the new
>> > task is enqueued:
>> > enqueue_task_fair
>> > enqueue_entity
>> > enqueue_entity_load_avg
>> >
>> > and leave fork balancing imbalanced because of the incorrect load_avg_contrib.
>> >
>> > Furthermore, Morten Rasmussen noticed some tasks were not launched at
>> > once after created. So Paul and Peter suggested giving the new task's
>> > runnable avg time a start value equal to sched_slice().
>> I am confused by this comment: how would setting the slice as the
>> runnable avg change the behavior of "some tasks were not launched at
>> once after created"?
>> IMHO, a newly forked task can only run once the current task has been
>> marked need_resched and preempt_schedule or preempt_schedule_irq
>> is called.
>> Setting the avg to the slice does not affect this task's vruntime,
>> so it cannot cause the currently running task to be marked need_resched
>> if it previously could not.
> So the 'problem' is that our running avg is a 'floating' average; ie. it
> decays with time. Now we have to guess about the future of our newly
> spawned task -- something that is nigh impossible seeing these CPU
> vendors keep refusing to implement the crystal ball instruction.

I am curious about this "crystal ball instruction". :)
Could it be real? I mean, what kind of hw mechanism could achieve such
magic power? As far as I can see, silicon vendors could provide more
monitoring units, but I don't think hw has the power to precisely
predict the sw's behavior...

> So there's two asymptotic cases we want to deal well with; 1) the case
> where the newly spawned program will be 'nearly' idle for its lifetime;
> and 2) the case where it's cpu-bound.
> Since we have to guess, we'll go for worst case and assume it's
> cpu-bound; now we don't want to make the avg so heavy that adjusting to
> the near-idle case takes forever. We want to be able to quickly adjust and
> lower our running avg.
> Now we also don't want to make our avg too light, such that it gets
> decremented just for the new task not having had a chance to run yet --
> even if when it would run, it would be more cpu-bound than not.
> So what we do is we make the initial avg of the same duration as that we
> guess it takes to run each task on the system at least once -- aka
> sched_slice().
> Of course we can defeat this with wakeup/fork bombs, but in the 'normal'
> case it should be good enough.
> Does that make sense?

Thanks for your detailed explanation. Very useful indeed! :)
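
A minimal user-space sketch of the trade-off described above, not the
kernel code: it assumes a toy average that decays by a factor y per period
(y picked so that y^32 is roughly 0.5) and a new task seeded as if it had
been fully runnable for 'seed' periods. All toy_* names are hypothetical.

```c
#include <assert.h>

/* Toy model only: these names are hypothetical, not kernel identifiers. */
#define TOY_DECAY 0.978572062 /* y, chosen so that y^32 is about 0.5 */

struct toy_avg {
	double runnable_sum; /* decayed count of periods spent runnable */
	double period_sum;   /* decayed count of all periods observed */
};

/* Seed a new task as if it had been runnable for all of 'seed' periods. */
static struct toy_avg toy_avg_init(double seed)
{
	struct toy_avg a = { seed, seed };

	assert(seed > 0.0);
	return a;
}

/* Advance one period; 'runnable' is nonzero if the task was runnable in it. */
static void toy_avg_tick(struct toy_avg *a, int runnable)
{
	a->runnable_sum = a->runnable_sum * TOY_DECAY + (runnable ? 1.0 : 0.0);
	a->period_sum = a->period_sum * TOY_DECAY + 1.0;
}

/* Fraction of (decayed) time the task was runnable, between 0 and 1. */
static double toy_avg_ratio(const struct toy_avg *a)
{
	return a->runnable_sum / a->period_sum;
}

/* Ratio left after 'idle' non-runnable periods, starting from 'seed'. */
static double toy_ratio_after_idle(double seed, int idle)
{
	struct toy_avg a = toy_avg_init(seed);

	for (int i = 0; i < idle; i++)
		toy_avg_tick(&a, 0);
	return toy_avg_ratio(&a);
}
```

In this model a seed of 1 period drops below 0.25 after only 4 idle
periods (too light), a seed of 1000 periods is still above 0.99 after the
same 4 periods (too heavy), and a seed of a few periods sits in between.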

BTW, I have no questions about the patch itself; I was just confused by
the patch's comment
"some tasks were not launched at once after created".
