Re: [PATCH 02/10] sched: fix find_idlest_group mess logical

From: Alex Shi
Date: Sat Dec 08 2012 - 07:12:21 EST


On 12/07/2012 04:33 PM, Frederic Weisbecker wrote:
> 2012/12/7 Alex Shi <alex.shi@xxxxxxxxx>:
>> On 12/07/2012 08:56 AM, Frederic Weisbecker wrote:
>>> 2012/12/3 Alex Shi <alex.shi@xxxxxxxxx>:
>>>> There is 4 situations in the function:
>>>> 1, no task allowed group;
>>>> so min_load = ULONG_MAX, this_load = 0, idlest = NULL
>>>> 2, only local group task allowed;
>>>> so min_load = ULONG_MAX, this_load assigned, idlest = NULL
>>>> 3, only non-local task group allowed;
>>>> so min_load assigned, this_load = 0, idlest != NULL
>>>> 4, local group + another group are task allowed.
>>>> so min_load assigned, this_load assigned, idlest != NULL
>>>>
>>>> Current logical will return NULL in first 3 kinds of scenarios.
>>>> And still return NULL, if idlest group is heavier then the
>>>> local group in the 4th situation.
>>>>
>>>> Actually, I thought groups in situation 2,3 are also eligible to host
>>>> the task. And in 4th situation, agree to bias toward local group.
>>>> So, has this patch.
>>>
>>> The way I understand the loop that use this in select_task_rq_fair() is:
>>>
>>> a) start from the highest domain level we are allowed to run to
>>> migrate the task in
>>> b) from that top level domain, find the idlest group. If the idlest
>>> group contains current CPU, zoom in the child domain and repeat b). If
>>> the idlest group doesn't contain the current CPU, pick the idlest CPU
>>> from that group.
>>> c) In the end if we found no idler target than current CPU, then take it.
>>>
>>> So if you also return a group that contains current CPU from
>>> find_idlest_group(), you don't recursively zoom in the child domain
>>> anymore. find_idlest_cpu() will fix that for you but it may come with
>>> some cost because now it iterates through every CPUs, or may be half
>>> of them.
>>
>> Not exactly, the old logical won't select cpu from group of situation 2
>> and 3. That is wrong. and may cause the task keep stay on prev_cpu even
>> there are still other idler and allowed cpu exist.
>
> For situation 2 I don't understand the issue. If current CPU belong to
> idlest group we want to zoom in our lookup until we find something an
> idler group than the current CPU's? If we eventually don't find it,
> then we fallback to current CPU, don't we?

fallback to current CPU is not the best choice here. :)
The meaning to release situation 2 is that this_cpu may not the idlest
cpu even in local group. there maybe other idlers in the local group.
But the old logical will refuse to select idlest cpu from the local
group, just there is no other group eligible. that is the problem.

>
> I just have a doubt to express. How does the final leaf child domain
> look like? Is it made of current CPU only or can it contain other
> siblings? In the first case we are fine. In the second one, if this
> domain is made of only one group of several CPUs, we are skipping the
> find_idlest_cpu() call for that group and choose this_cpu by default.
> Which is probably suboptimized?

In most of time, the domain is not leaf child domain. and even with leaf
child and single group domain, find_idlest_cpu will return quickly, that
should not cause much trouble.
>
> Concerning situation 3, if this_cpu is not a CPU allowed by the task,
> we may indeed have an issue because find_idlest_group() doesn't seem
> to be selecting non-local groups in this case.

thanks!

But your current fix
> still breaks the recursive find_idlest_group() on other cases and may
> not scale with big number of CPUs.
>

I don't understand recursive fig can scale with big number CPU. :)
Actually, this patch set just show 10+% performance gain on hackbench
with big machines, while just 0~2% performance gain on 2 sockets machine.


--
Thanks
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/