Re: [PATCH] LoongArch: Fix cpu hotplug issue

From: maobibo
Date: Mon Oct 14 2024 - 06:01:46 EST




On 2024/10/14 下午5:29, Huacai Chen wrote:
On Mon, Oct 14, 2024 at 5:12 PM maobibo <maobibo@xxxxxxxxxxx> wrote:



On 2024/10/14 下午4:23, Huacai Chen wrote:
On Mon, Oct 14, 2024 at 4:01 PM maobibo <maobibo@xxxxxxxxxxx> wrote:

Huacai,

On 2024/10/14 下午3:39, Huacai Chen wrote:
On Mon, Oct 14, 2024 at 3:21 PM maobibo <maobibo@xxxxxxxxxxx> wrote:

Huacai,

On 2024/10/14 下午3:05, Huacai Chen wrote:
Hi, Bibo,

I'm a little confused, so please correct me if I'm wrong.

On Mon, Oct 14, 2024 at 2:33 PM Bibo Mao <maobibo@xxxxxxxxxxx> wrote:

On LoongArch system, there are two places to set cpu numa node. One
is in arch specified function smp_prepare_boot_cpu(), the other is
in generic function early_numa_node_init(). The latter will overwrite
the numa node information.

However for hot-added cpu, cpu_logical_map() fails to its physical
cpuid at beginning since it is not enabled in ACPI MADT table. So
function early_cpu_to_node() also fails to get its numa node for
hot-added cpu, and generic function early_numa_node_init() will
overwrite incorrect numa node.
For hot-added cpus, we will call acpi_map_cpu() -->
acpi_map_cpu2node() --> set_cpuid_to_node(), and set_cpuid_to_node()
operates on __cpuid_to_node[]. So I think early_cpu_to_node() should
be correct?

__cpuid_to_node[] is correct which is physical cpuid to numa node,
however cpu_logical_map(cpu) is not set. It fails to get physical cpuid
from logic cpu.

int early_cpu_to_node(int cpu)
{
int physid = cpu_logical_map(cpu);

<<<<<<<<<<< Here physid is -1.
early_cpu_to_node() is not supposed to be called after boot, and if it
Which calls early_cpu_to_node() after boot?

is really needed, I think a better solution is:

diff --git a/arch/loongarch/kernel/acpi.c b/arch/loongarch/kernel/acpi.c
index f1a74b80f22c..998cf45fd3b7 100644
--- a/arch/loongarch/kernel/acpi.c
+++ b/arch/loongarch/kernel/acpi.c
@@ -311,6 +311,8 @@ static int __ref acpi_map_cpu2node(acpi_handle
handle, int cpu, int physid)

nid = acpi_get_node(handle);
if (nid != NUMA_NO_NODE) {
+ __cpu_number_map[physid] = cpu;
+ __cpu_logical_map[cpu] = physid;
This does not solve the problem. The above has been done in function
cpu = set_processor_mask(physid, ACPI_MADT_ENABLED);

static int set_processor_mask(u32 id, u32 flags)
{
...
if (flags & ACPI_MADT_ENABLED) {
num_processors++;
set_cpu_present(cpu, true);
__cpu_number_map[cpuid] = cpu;
__cpu_logical_map[cpu] = cpuid;
}

The problem is that
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
<<<<<<<<<<<<<<<<
set_cpu_numa_node() is called in function smp_prepare_boot_cpu()

early_numa_node_init();

static void __init early_numa_node_init(void)
{
#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
#ifndef cpu_to_node
int cpu;

/* The early_cpu_to_node() should be ready here. */
for_each_possible_cpu(cpu)
set_cpu_numa_node(cpu, early_cpu_to_node(cpu));
<<<<<<<<<<<<<<<<
* however here early_cpu_to_node is -1, so that cpu_to_node(cpu) will
always return -1 in late. *, which causes cpu hotadd problem.
Still confused. For ACPI_MADT_ENABLED cpus, everything is right after
early_numa_node_init(). For !ACPI_MADT_ENABLED cpus, cpu_to_node()
returns -1 after early_numa_node_init() and before hot-add, but if
acpi_map_cpu() do things right, cpu_to_node() should still work well
after hot-add.
yes, if "_PXM" information for hot-add cpu handle exist, it works well.

However if "_PXM" information does not exist, it falls back to legacy
method from smp_prepare_boot_cpu(). However cpu_numa_node information is
overwritten with -1 by later function early_numa_node_init().
OK, now I finally get the key point. But no _PXM should be treated as
a BIOS bug, right?
Currently if no numa information is added in qemu command line, there will be no "_PXM" information for hot-added cpu. Such as for this command:
qemu-system-loongarch64 -m 4096 -smp 1,maxcpus=4,sockets=1,cores=4,threads=1

From comments we can see:

* If possible cpus > present cpus here (e.g. some possible
* cpus will be added by cpu-hotplug later), for possible but
* not present cpus, early_cpu_to_node will return NUMA_NO_NODE,
* and we just map them to online nodes in round-robin way.
* Once hotplugged, new correct mapping will be built for them.

This means even with this patch, cpu_to_node() can return a "valid"
node rather than NUMA_NO_NODE, but this round-robin node is still an
incorrect node.
The round-robin node is not standard, may it is copied from x86, I do not know how to use it however. At least SRAT tables provides numa information only that there is not logical cpu allocated in SRAT table parsing. How about something like this?

diff --git a/arch/loongarch/kernel/acpi.c b/arch/loongarch/kernel/acpi.c
index f1a74b80f22c..bb9fdd318998 100644
--- a/arch/loongarch/kernel/acpi.c
+++ b/arch/loongarch/kernel/acpi.c
@@ -310,6 +310,12 @@ static int __ref acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
int nid;

nid = acpi_get_node(handle);
+ /*
+ * Fall back to srat numa node information if _PXM is not provided
+ */
+ if (nid != NUMA_NO_NODE)
+ nid = __cpuid_to_node[physid];
+
if (nid != NUMA_NO_NODE) {
set_cpuid_to_node(physid, nid);
node_set(nid, numa_nodes_parsed);

Regards
Bibo Mao

Huacai


Regards
Bibo Mao

Huacai

Regards
Bibo Mao


set_cpuid_to_node(physid, nid);
node_set(nid, numa_nodes_parsed);
set_cpu_numa_node(cpu, nid);

Huacai


if (physid < 0)
return NUMA_NO_NODE;

return __cpuid_to_node[physid];
}

Regards
Bibo Mao

Huacai


Here static array __cpu_to_node and api set_early_cpu_to_node()
is added, so that early_cpu_to_node is consistent with function
cpu_to_node() for hot-added cpu.

Signed-off-by: Bibo Mao <maobibo@xxxxxxxxxxx>
---
arch/loongarch/include/asm/numa.h | 2 ++
arch/loongarch/kernel/numa.c | 10 +++++++++-
arch/loongarch/kernel/smp.c | 1 +
3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/include/asm/numa.h b/arch/loongarch/include/asm/numa.h
index b5f9de9f102e..e8e6fcfb006a 100644
--- a/arch/loongarch/include/asm/numa.h
+++ b/arch/loongarch/include/asm/numa.h
@@ -50,6 +50,7 @@ static inline void set_cpuid_to_node(int cpuid, s16 node)
}

extern int early_cpu_to_node(int cpu);
+extern void set_early_cpu_to_node(int cpu, s16 node);

#else

@@ -57,6 +58,7 @@ static inline void early_numa_add_cpu(int cpuid, s16 node) { }
static inline void numa_add_cpu(unsigned int cpu) { }
static inline void numa_remove_cpu(unsigned int cpu) { }
static inline void set_cpuid_to_node(int cpuid, s16 node) { }
+static inline void set_early_cpu_to_node(int cpu, s16 node) { }

static inline int early_cpu_to_node(int cpu)
{
diff --git a/arch/loongarch/kernel/numa.c b/arch/loongarch/kernel/numa.c
index 84fe7f854820..62508aace644 100644
--- a/arch/loongarch/kernel/numa.c
+++ b/arch/loongarch/kernel/numa.c
@@ -34,6 +34,9 @@ static struct numa_meminfo numa_meminfo;
cpumask_t cpus_on_node[MAX_NUMNODES];
cpumask_t phys_cpus_on_node[MAX_NUMNODES];
EXPORT_SYMBOL(cpus_on_node);
+static s16 __cpu_to_node[NR_CPUS] = {
+ [0 ... CONFIG_NR_CPUS - 1] = NUMA_NO_NODE
+};

/*
* apicid, cpu, node mappings
@@ -117,11 +120,16 @@ int early_cpu_to_node(int cpu)
int physid = cpu_logical_map(cpu);

if (physid < 0)
- return NUMA_NO_NODE;
+ return __cpu_to_node[cpu];

return __cpuid_to_node[physid];
}

+void set_early_cpu_to_node(int cpu, s16 node)
+{
+ __cpu_to_node[cpu] = node;
+}
+
void __init early_numa_add_cpu(int cpuid, s16 node)
{
int cpu = __cpu_number_map[cpuid];
diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
index 9afc2d8b3414..998668be858c 100644
--- a/arch/loongarch/kernel/smp.c
+++ b/arch/loongarch/kernel/smp.c
@@ -512,6 +512,7 @@ void __init smp_prepare_boot_cpu(void)
set_cpu_numa_node(cpu, node);
else {
set_cpu_numa_node(cpu, rr_node);
+ set_early_cpu_to_node(cpu, rr_node);
rr_node = next_node_in(rr_node, node_online_map);
}
}

base-commit: 6485cf5ea253d40d507cd71253c9568c5470cd27
--
2.39.3