Re: weakness of runnable load tracking?

From: Mike Galbraith
Date: Thu Dec 06 2012 - 00:58:05 EST


On Thu, 2012-12-06 at 11:13 +0800, Alex Shi wrote:
> On 12/05/2012 11:19 PM, Alex Shi wrote:
> > Hi Paul&Ingo:
> >
> > Runnable load tracking patch set introduce a good way to tracking each
> > entity/rq's running time.
> > But when I try to enable it in load balance, I found burst forking
> > many new tasks will make just few cpu heavy while other cpu has no
> > much task assigned. That is due to the new forked task's
> > load_avg_contrib is zero after just created. then no matter how many
> > tasks assigned to a CPU can not increase the cfs_rq->runnable_load_avg
> > or rq->avg.load_avg_contrib if this cpu idle.
> > Actually, if just for new task issue, we can set new task's initial
> > load_avg same as load_weight. but if we want to burst wake up many
> > long time sleeping tasks, it has the same issue here since their were
> > decayed to zero. So what solution I can thought is recording the se's
> > load_avg_contrib just before dequeue, and don't decay the value, when
> > it was waken up, add this value to new cfs_rq. but if so, the runnable
> > load tracking is total meaningless.
> > So do you have some idea of burst wakeup balancing with runnable load tracking?
>
> Hi Paul & Ingo:
>
> In a short word of this issue: burst forking/waking tasks have no time
> accumulate the load contribute, their runnable load are taken as zero.
> that make select_task_rq do a wrong decision on which group is idlest.

As you pointed out above, new tasks can (and imho should) be born with
full weight. Tasks _may_ become thin, but they're all born hungry.

> There is still 3 kinds of solution is helpful for this issue.
>
> a, set a unzero minimum value for the long time sleeping task. but it
> seems unfair for other tasks these just sleep a short while.
>
> b, just use runnable load contrib in load balance. Still using
> nr_running to judge idlest group in select_task_rq_fair. but that may
> cause a bit more migrations in future load balance.
>
> c, consider both runnable load and nr_running in the group: like in the
> searching domain, the nr_running number increased a certain number, like
> double of the domain span, in a certain time. we will think it's a burst
> forking/waking happened, then just count the nr_running as the idlest
> group criteria.
>
> IMHO, I like the 3rd one a bit more. as to the certain time to judge if
> a burst happened, since we will calculate the runnable avg at very tick,
> so if increased nr_running is beyond sd->span_weight in 2 ticks, means
> burst happening. What's your opinion of this?
>
> Any comments are appreciated!

IMHO, for fork and bursty wake balancing, the only thing meaningful is
the here and now state of runqueues tasks are being dumped into.

Just because tasks are historically short running, you don't necessarily
want to take a gaggle and wedge them into a too small group just to even
out load averages. If there was a hole available that you passed up by
using average load, you lose utilization. I can see how this load
tracking stuff can average out to a win on a ~heavily loaded box, but
bursty stuff I don't see how it can do anything but harm, so imho, the
user should choose which is best for his box, instantaneous or history.

WRT burst detection: any window you define can be longer than the burst.

$.02

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/