[tip: sched/core] sched/isolation: Make use of more than one housekeeping cpu

From: tip-bot2 for Phil Auld
Date: Tue Apr 08 2025 - 15:07:51 EST


The following commit has been merged into the sched/core branch of tip:

Commit-ID: 6432e163ba1b7d80b5876792ce53e511f041ab91
Gitweb: https://git.kernel.org/tip/6432e163ba1b7d80b5876792ce53e511f041ab91
Author: Phil Auld <pauld@xxxxxxxxxx>
AuthorDate: Tue, 18 Feb 2025 18:46:18
Committer: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
CommitterDate: Tue, 08 Apr 2025 20:55:55 +02:00

sched/isolation: Make use of more than one housekeeping cpu

The existing code uses housekeeping_any_cpu() to select a cpu for
a given housekeeping task. However, this often ends up calling
cpumask_any_and(), which is defined as cpumask_first_and() and so
has the effect of always using the first cpu among those available.

The same applies when multiple NUMA nodes are involved. In that
case the first cpu in the local node is chosen, which does provide
a bit of spreading, but with multiple HK cpus per node the same
issues arise.

We have numerous cases where a single HK cpu just cannot keep up
and the remote_tick warning fires. It can also starve other work on
the HK cpus (orchestration software, HA keepalives, etc.), which leads
to further issues. In these cases we recommend
increasing the number of HK cpus. But... that only helps the
userspace tasks somewhat. It does not help the actual housekeeping
part.

Spread the HK work out by having housekeeping_any_cpu() and
sched_numa_find_closest() use cpumask_any_and_distribute()
instead of cpumask_any_and().
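
To illustrate the difference in selection behaviour, here is a minimal
userspace sketch. It is not the kernel implementation: the helper names
(first_and, any_and_distribute), the 64-bit mask type, and the
caller-supplied "prev" state are assumptions made for the example. It
only models the idea that a "first" pick always returns the lowest set
bit, while a "distribute" pick remembers the previous result and
continues from there, wrapping around.

```c
#include <stdint.h>

#define NR_CPUS 64u	/* hypothetical limit for this model only */

/*
 * Model of a "first and" pick: return the lowest cpu set in both
 * masks, or NR_CPUS if the intersection is empty.
 */
static unsigned int first_and(uint64_t a, uint64_t b)
{
	uint64_t m = a & b;

	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if (m & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPUS;
}

/*
 * Model of a distributing pick: start searching just after the
 * previously returned cpu (*prev), wrap around, and record the new
 * result. Returns NR_CPUS if the intersection is empty.
 */
static unsigned int any_and_distribute(uint64_t a, uint64_t b,
				       unsigned int *prev)
{
	uint64_t m = a & b;

	if (!m)
		return NR_CPUS;

	for (unsigned int i = 1; i <= NR_CPUS; i++) {
		unsigned int cpu = (*prev + i) % NR_CPUS;

		if (m & (UINT64_C(1) << cpu)) {
			*prev = cpu;
			return cpu;
		}
	}
	return NR_CPUS;	/* not reached when m != 0 */
}
```

With housekeeping cpus 1-3 all online, repeated first_and() calls in
this model keep returning cpu 1, whereas repeated any_and_distribute()
calls (with prev initialised to NR_CPUS - 1) return 1, 2, 3, 1, and so
on, which is the spreading effect this change is after.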

Signed-off-by: Phil Auld <pauld@xxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Reviewed-by: Waiman Long <longman@xxxxxxxxxx>
Reviewed-by: Vishal Chourasia <vishalc@xxxxxxxxxxxxx>
Acked-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
Link: https://lore.kernel.org/r/20250218184618.1331715-1-pauld@xxxxxxxxxx
---
kernel/sched/isolation.c | 2 +-
kernel/sched/topology.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 81bc8b3..93b038d 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -40,7 +40,7 @@ int housekeeping_any_cpu(enum hk_type type)
if (cpu < nr_cpu_ids)
return cpu;

- cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
+ cpu = cpumask_any_and_distribute(housekeeping.cpumasks[type], cpu_online_mask);
if (likely(cpu < nr_cpu_ids))
return cpu;
/*
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b334f25..bbc2fc2 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2098,7 +2098,7 @@ int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
for (i = 0; i < sched_domains_numa_levels; i++) {
if (!masks[i][j])
break;
- cpu = cpumask_any_and(cpus, masks[i][j]);
+ cpu = cpumask_any_and_distribute(cpus, masks[i][j]);
if (cpu < nr_cpu_ids) {
found = cpu;
break;