Re: [PATCH v2 10/11] sched: move cfs task on a CPU with higher capacity

From: Peter Zijlstra
Date: Tue Jun 03 2014 - 07:16:10 EST


On Mon, Jun 02, 2014 at 07:06:44PM +0200, Vincent Guittot wrote:
> > Could you detail those conditions? FWIW those make excellent Changelog
> > material.
>
> I have looked back into my tests and traces:
>
> In a 1st test, the capacity of the CPU was still above half the
> default value (power=538), unlike what I remembered. So it's somewhat
> "normal" to keep the task on CPU0, which also handles IRQ, because
> sg_capacity still returns 1.

OK, so I suspect that once we move to utilization-based capacity stuff
we'll do the migration IF the task indeed requires more cpu than can be
provided by the reduced one, right?

> In a 2nd test, the main task runs (most of the time) on CPU0 whereas
> the max power of the latter is only 623 and the cpu_power goes below
> 512 (power=330) during the use case. So the sg_capacity of CPU0 is
> zero but the main task still stays on CPU0.
> The use case (scp transfer) is made of a long-running task (ssh) and
> a periodic short task (scp). ssh runs on CPU0 and scp runs every 6 ms
> on CPU1. The newly idle load balance on CPU1 doesn't pull the
> long-running task even though sg_capacity is zero, because
> sd->nr_balance_failed is never incremented and load_balance doesn't
> trigger an active load_balance. When an idle balance occurs in the
> middle of the newly idle balance, the ssh task migrates to CPU1, but
> as soon as it sleeps and wakes up it goes back to CPU0 because of the
> wake affine logic (issue solved by patch 09).

OK, so there's two problems here, right?
1) we don't migrate away from cpu0
2) if we do, we get pulled back.

And patch 9 solves 2, so maybe enhance its changelog to mention this
slightly more explicitly.

Which leaves us with 1.. interesting problem. I'm just not sure
endlessly kicking a low-capacity CPU is the right fix for that.
