Re: [PATCH] sched/fair: Fix misuse of available_idle_cpu()

From: Abel Wu
Date: Tue Sep 20 2022 - 03:44:36 EST


Ping :)

On 9/8/22 9:49 PM, Abel Wu wrote:
On 9/8/22 9:17 PM, Peter Zijlstra wrote:
On Thu, Sep 08, 2022 at 11:36:32AM +0100, Mel Gorman wrote:
On Thu, Sep 08, 2022 at 04:07:02PM +0800, Abel Wu wrote:
The function available_idle_cpu() was introduced to distinguish
between the code paths that cares if the vCPU is preempted and
the ones don't care. In general, available_idle_cpu() is used in
selecting cpus for immediate use, e.g. ttwu. While idle_cpu() is
used in the paths that only cares about the cpu is idle or not,
and __update_idle_core() is one of them.

Use idle_cpu() instead in the idle path to make has_idle_core
a better hint.

Fixes: 943d355d7fee (sched/core: Distinguish between idle_cpu() calls based on desired effect, introduce available_idle_cpu())
Signed-off-by: Abel Wu <wuyun.abel@xxxxxxxxxxxxx>

Seems fair. As vCPU preemption is specific to virtualisation, it is very
unlikely that SMT is exposed to the guest so the impact of the patch is

Right; only pinned guests typically expose such topology information
(anything else would be quite broken).

Yes, and it is common in our ECS servers to use pinned guests.


minimal but I still think it's right so;

I'm not convinced; all of select_idle_sibling() seems to use
available_idle_cpu(), and that's the only consumer of
__update_idle_core(), so in that respect the current state makes sense.

Hi Peter, Mel, thanks for your reviewing!

My thought was that the preempted core can become active again before
select_idle_sibling() is called, so using available_idle_cpu() in
__update_idle_core() can potentially lose the opportunity to kick an
idle core running. While the downside of using idle_cpu() is that a
full scan can be triggered irrespective of non-preempted cores exist,
but even available_idle_cpu() can not make sure of that either.

BTW, I am also confused with select_idle_core() in which all the cpus
of a core need to be non-preempted before the core can be taken as an
idle core. IMHO, it might be enough that at least one cpu of an idle
core is non-preempted and allowed by task's taskset.