Re: [PATCH 1/3] sched/isolation: Add HK_FLAG_SCHED to nohz_full

From: Waiman Long
Date: Tue Sep 03 2024 - 21:24:12 EST


On 9/3/24 17:32, Frederic Weisbecker wrote:
> On Tue, Sep 03, 2024 at 09:24:08AM -0400, Waiman Long wrote:
>> On 9/3/24 09:10, Frederic Weisbecker wrote:
>>> On Sun, Aug 18, 2024 at 07:45:18PM -0400, Waiman Long wrote:
>>>> The HK_FLAG_SCHED/HK_TYPE_SCHED flag is defined and is also used
>>>> in kernel/sched/fair.c since commit de201559df87 ("sched/isolation:
>>>> Introduce housekeeping flags"). However, the corresponding cpumask isn't
>>>> currently updated anywhere. So the mask is always cpu_possible_mask.
>>>>
>>>> Add it in nohz_full setup so that nohz_full CPUs will now be removed
>>>> from HK_TYPE_SCHED cpumask.
>>>>
>>>> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
>>>> ---
>>>>  kernel/sched/isolation.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
>>>> index 5891e715f00d..a514994af319 100644
>>>> --- a/kernel/sched/isolation.c
>>>> +++ b/kernel/sched/isolation.c
>>>> @@ -196,7 +196,7 @@ static int __init housekeeping_nohz_full_setup(char *str)
>>>>  	unsigned long flags;
>>>>
>>>>  	flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
>>>> -		HK_FLAG_MISC | HK_FLAG_KTHREAD;
>>>> +		HK_FLAG_MISC | HK_FLAG_KTHREAD | HK_FLAG_SCHED;
>>>>
>>>>  	return housekeeping_setup(str, flags);
>>>>  }
>>>
>>> find_new_ilb() already has HK_FLAG_MISC to prevent an isolated CPU
>>> from being elected as an ilb. So I think we should simply remove HK_FLAG_SCHED.
>>
>> There is a check for HK_TYPE_SCHED in nohz_balance_enter_idle() and
>> nohz_newidle_balance(), though it is essentially a no-op as the cpumask has
>> all the CPUs. If we remove HK_TYPE_SCHED, the question is whether
>> we should remove the checks in these two functions or change them to
>> HK_TYPE_MISC.
>
> Just remove those two. They are dead code and the nohz_full handling
> of load balancing needs a rethink anyway.

OK, I will modify the patch to remove the dead code.

> After discussing with Peter lately, the rules should be:
>
> 1) If a nohz_full CPU is part of a multi-CPU domain, then it should
>    be part of load balancing. Peter even says that nohz_full should be
>    forbidden in this case, because the tick plays a role in the
>    load balancing.

My understanding is that most users will use nohz_full together with isolcpus, so nohz_full CPUs are also isolated and not in a sched domain. There may still be users setting nohz_full without isolcpus, but that should be relatively rare.

Anyway, all these nohz_full/kernel_noise settings will only apply to CPUs in isolated cpuset partitions, which will not be in a sched domain.


> 2) Otherwise, if a CPU is not part of a domain or it is the only CPU of all its
>    domains, then it can be out of the load balancing machinery.

I am aware that a single-CPU domain is the same as being isolated with no load balancing.

> I'm a bit scared about rule 1) because I know there are existing users of
> nohz_full on multi-CPU domains... So I feel a bit trapped.

As stated before, this is not a common use case.

The isolcpus boot option is deprecated, as stated in kernel-parameters.txt. My plan is to deprecate nohz_full as well once we are able to make dynamic CPU isolation via cpuset work almost as well as isolcpus + nohz_full.

Cheers,
Longman