Re: [RFC PATCH 1/2] NUMA balancing: fix NUMA topology type for memory tiering system

From: Huang, Ying
Date: Fri Jan 28 2022 - 02:30:55 EST


Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx> writes:

> * Huang Ying <ying.huang@xxxxxxxxx> [2022-01-28 10:38:41]:
>
>>
>> One possible fix is to ignore CPU-less nodes when detecting NUMA
>> topology type in init_numa_topology_type(). That works well for the
>> example system. Is it good in general for any system with CPU-less
>> nodes?
>>
>
> A CPUless node at the time online doesn't necessarily mean a CPUless node
> for the entire boot. For example: On PowerVM Lpars, aka powerpc systems,
> some of the nodes may start as CPUless nodes and then CPUS may get
> populated/hotplugged on them.

Got it!

> Hence I am not sure if adding a check for CPUless nodes at node online may
> work for such systems.

How about something like the patch below?
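
The filtering idea, skipping nodes without CPUs while looking for the
farthest node pair, can be sketched in user space as follows. This is
only an illustration under stated assumptions: struct toy_topology,
has_cpu[] and max_cpu_node_distance() are hypothetical names that stand
in for the kernel's node_state(nid, N_CPU) and node_distance(), and the
sketch is not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8

/*
 * Hypothetical user-space stand-in for the kernel's NUMA state.
 * has_cpu[nid] plays the role of node_state(nid, N_CPU), and
 * distance[a][b] plays the role of node_distance(a, b).
 */
struct toy_topology {
	int nr_nodes;
	bool has_cpu[MAX_NODES];
	int distance[MAX_NODES][MAX_NODES];
};

/*
 * Return the largest distance between any two nodes that both have
 * CPUs, or -1 if no node has a CPU. Memory-only nodes are skipped in
 * both loops, mirroring the two checks added to the nested loops in
 * init_numa_topology_type().
 */
static int max_cpu_node_distance(const struct toy_topology *t)
{
	int a, b, max = -1;

	assert(t->nr_nodes >= 0 && t->nr_nodes <= MAX_NODES);

	for (a = 0; a < t->nr_nodes; a++) {
		if (!t->has_cpu[a])
			continue;	/* CPU-less node: ignore */

		for (b = 0; b < t->nr_nodes; b++) {
			if (!t->has_cpu[b])
				continue;	/* CPU-less node: ignore */

			if (t->distance[a][b] > max)
				max = t->distance[a][b];
		}
	}

	return max;
}
```

Setting has_cpu[] for a node at a later point models a CPU being
hotplugged onto a node that started out CPU-less, so the result has to
be recomputed at that time rather than decided once when the node comes
online.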

Best Regards,
Huang, Ying

-----------------------8<-----------------------------

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index d201a7052a29..733e8bd930b4 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1737,7 +1737,13 @@ static void init_numa_topology_type(void)
 	}

 	for_each_online_node(a) {
+		if (!node_state(a, N_CPU))
+			continue;
+
 		for_each_online_node(b) {
+			if (!node_state(b, N_CPU))
+				continue;
+
 			/* Find two nodes furthest removed from each other. */
 			if (node_distance(a, b) < n)
 				continue;
@@ -1849,6 +1855,13 @@ void sched_init_numa(void)

 			sched_domains_numa_masks[i][j] = mask;

+			/*
+			 * The mask will be initialized when the first CPU of
+			 * the node is onlined.
+			 */
+			if (!node_state(j, N_CPU))
+				continue;
+
 			for_each_node(k) {
 				/*
 				 * Distance information can be unreliable for
@@ -1919,8 +1932,10 @@ void sched_init_numa(void)
 		return;

 	bitmap_zero(sched_numa_onlined_nodes, nr_node_ids);
-	for_each_online_node(i)
-		bitmap_set(sched_numa_onlined_nodes, i, 1);
+	for_each_online_node(i) {
+		if (node_state(i, N_CPU))
+			bitmap_set(sched_numa_onlined_nodes, i, 1);
+	}
 }

 static void __sched_domains_numa_masks_set(unsigned int node)
@@ -1928,7 +1943,7 @@ static void __sched_domains_numa_masks_set(unsigned int node)
 	int i, j;

 	/*
-	 * NUMA masks are not built for offline nodes in sched_init_numa().
+	 * NUMA masks are not built for offline/CPU-less nodes in sched_init_numa().
 	 * Thus, when a CPU of a never-onlined-before node gets plugged in,
 	 * adding that new CPU to the right NUMA masks is not sufficient: the
 	 * masks of that CPU's node must also be updated.