Re: [PATCH v2] NUMA: Early use of cpu_to_node() returns 0 instead of the correct node id
From: Mike Rapoport
Date: Thu Jan 25 2024 - 02:32:36 EST
On Wed, Jan 24, 2024 at 09:19:00AM -0800, Lameter, Christopher wrote:
> On Tue, 23 Jan 2024, Huang Shijie wrote:
>
> > During the kernel booting, the generic cpu_to_node() is called too early in
> > arm64, powerpc and riscv when CONFIG_NUMA is enabled.
> >
> > For arm64/powerpc/riscv, there are at least four places in the common code
> > where the generic cpu_to_node() is called before it is initialized:
> > 1.) early_trace_init() in kernel/trace/trace.c
> > 2.) sched_init() in kernel/sched/core.c
> > 3.) init_sched_fair_class() in kernel/sched/fair.c
> > 4.) workqueue_init_early() in kernel/workqueue.c
> >
> > In order to fix the bug, the patch changes generic cpu_to_node to
> > function pointer, and export it for kernel modules.
> > Introduce smp_prepare_boot_cpu_start() to wrap the original
> > smp_prepare_boot_cpu(), and set cpu_to_node with early_cpu_to_node.
> > Introduce smp_prepare_cpus_done() to wrap the original smp_prepare_cpus(),
> > and set the cpu_to_node to formal _cpu_to_node().
>
> Would you please fix this cleanly without a function pointer?
>
> What I think needs to be done is a patch series.
>
> 1. Instrument cpu_to_node so that some warning is issued if it is used too
> early. Preloading the array with NUMA_NO_NODE would allow us to do that.
>
> 2. Implement early_cpu_to_node on platforms that currently do not have it.
>
> 3. A series of patches that fix each place where cpu_to_node is used too
> early.

I think step 3 can be simplified with a generic function that sets
per_cpu(numa_node) using early_cpu_to_node(). It can be called right after
setup_per_cpu_areas().
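
A rough sketch of the idea, modelled in plain userspace C rather than as
kernel code, so it can be compiled and tried on its own. Everything here is
a stand-in: numa_node[] plays the role of the per-CPU numa_node variable,
fw_node_table[] is made-up data for whatever an architecture's
early_cpu_to_node() reads, and set_numa_nodes_early() is a hypothetical name
for the generic helper. In the kernel the store would presumably go through
set_cpu_numa_node() for each possible CPU. The NUMA_NO_NODE preload and the
warning follow step 1 of Christopher's list.

```c
#include <assert.h>
#include <stdio.h>

#define NR_CPUS      8
#define NUMA_NO_NODE (-1)

/* Made-up firmware data: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static const int fw_node_table[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Stand-in for the per-CPU numa_node variable, preloaded with NUMA_NO_NODE. */
static int numa_node[NR_CPUS] = {
	NUMA_NO_NODE, NUMA_NO_NODE, NUMA_NO_NODE, NUMA_NO_NODE,
	NUMA_NO_NODE, NUMA_NO_NODE, NUMA_NO_NODE, NUMA_NO_NODE,
};

/* Counts lookups that happened before the table was populated. */
static int early_use_warnings;

/* Model of early_cpu_to_node(): reads the firmware-derived map directly. */
static int early_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return fw_node_table[cpu];
}

/* Model of cpu_to_node(): warns when it is called before initialization. */
static int cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (numa_node[cpu] == NUMA_NO_NODE) {
		early_use_warnings++;
		fprintf(stderr, "cpu_to_node(%d) used too early\n", cpu);
	}
	return numa_node[cpu];
}

/*
 * Hypothetical generic helper: copy the early mapping into the per-CPU
 * table. In the kernel this would run right after setup_per_cpu_areas().
 */
static void set_numa_nodes_early(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		numa_node[cpu] = early_cpu_to_node(cpu);
}
```

With something like this in place, callers such as sched_init() or
workqueue_init_early() would see the right node id from cpu_to_node()
without being converted one by one, and the warning would only catch users
that run even earlier than the helper.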
--
Sincerely yours,
Mike.