Re: [PATCH v3] mm: fix panic in __alloc_pages

From: Michal Hocko
Date: Fri Dec 10 2021 - 04:11:19 EST


On Thu 09-12-21 19:01:03, Alexey Makhalov wrote:
>
>
> > On Dec 9, 2021, at 5:29 AM, Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Thu 09-12-21 10:23:52, Alexey Makhalov wrote:
> >>
> >>
> >>> On Dec 9, 2021, at 1:56 AM, Michal Hocko <mhocko@xxxxxxxx> wrote:
> >>>
> >>> On Thu 09-12-21 09:28:55, Alexey Makhalov wrote:
> >>>>
> >>>>
> >>>> [ 0.081777] Node 4 uninitialized by the platform. Please report with boot dmesg.
> >>>> [ 0.081790] Initmem setup node 4 [mem 0x0000000000000000-0x0000000000000000]
> >>>> ...
> >>>> [ 0.086441] Node 127 uninitialized by the platform. Please report with boot dmesg.
> >>>> [ 0.086454] Initmem setup node 127 [mem 0x0000000000000000-0x0000000000000000]
> >>>
> >>> Interesting that only those two didn't get a proper arch specific
> >>> initialization. Could you check why? I assume init_cpu_to_node
> >>> doesn't see any CPU pointing at this node. Wondering why that would be
> >>> the case but that can be a bug in the affinity tables.
> >>
> > >> My bad, I trimmed the log too much. Not just these 2: all possible but not present nodes from 4 to 127
> > >> print this message.
> >
> > Does that mean that your possible (but offline) cpus do not set their
> > affinity?
> >
> Hi Michal,
>
> I didn’t quite get the question here. Do you mean scheduler affinity for offline/not present CPUs?
> From the patch, this message should be printed for every possible offline node:
> 	for_each_node(nid) {
> 		...
> 		if (!node_online(nid)) {
> 			pr_warn("Node %d uninitialized by the platform. Please report with boot dmesg.\n", nid);

Sure, let me expand on this a bit. The x86 initialization code
(init_cpu_to_node()) does
	for_each_possible_cpu(cpu) {
		int node = numa_cpu_node(cpu);

		if (node == NUMA_NO_NODE)
			continue;

		if (!node_online(node))
			init_memory_less_node(node);

		numa_set_node(cpu, node);
	}

which means that a memoryless node is not initialized when either
- your offline CPUs are not listed in the possible CPUs for some
  reason, or
- they do not have any node affinity (numa_cpu_node() returns
  NUMA_NO_NODE).
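To make the two cases concrete, here is a minimal userspace sketch (not kernel code) that replays the same loop over a made-up topology: four possible nodes, only node 0 online with memory, and CPU 3, assumed to belong to node 3, reporting no node affinity. The arrays and model_* helpers are hypothetical stand-ins for kernel state. As with the real init_memory_less_node(), the stand-in marks the node online, which is what lets the later check in the patch skip it.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace model only: these arrays and helpers are hypothetical
 * stand-ins for kernel state, not the real kernel APIs.
 */
#define NUMA_NO_NODE (-1)
#define NR_NODES 4      /* possible nodes 0..3 (assumed topology) */
#define NR_CPUS_MODEL 4 /* possible CPUs 0..3 (assumed topology) */

/* Only node 0 has memory and is online at this point. */
static bool node_is_online[NR_NODES] = { true };

/* Assumed CPU -> node affinity; CPU 3 has no affinity at all. */
static const int cpu_to_node_tbl[NR_CPUS_MODEL] = { 0, 1, 2, NUMA_NO_NODE };

/* Stand-in for init_memory_less_node(): sets the node up and marks it online. */
static void model_init_memory_less_node(int node)
{
	node_is_online[node] = true;
}

/* Same control flow as the x86 init_cpu_to_node() loop quoted above. */
static void model_init_cpu_to_node(void)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		int node = cpu_to_node_tbl[cpu];

		/* No affinity: nothing initializes the node this CPU sits on. */
		if (node == NUMA_NO_NODE)
			continue;

		if (!node_is_online[node])
			model_init_memory_less_node(node);
	}
}

/* Counts (and reports) possible nodes that would hit the warning in the patch. */
static int count_uninitialized_nodes(void)
{
	int n = 0;

	for (int nid = 0; nid < NR_NODES; nid++) {
		if (!node_is_online[nid]) {
			printf("Node %d uninitialized by the platform.\n", nid);
			n++;
		}
	}
	return n;
}
```

Calling model_init_cpu_to_node() and then count_uninitialized_nodes() reports only node 3: nodes 1 and 2 are reached through CPUs 1 and 2, while no CPU points at node 3. Leaving a CPU out of the loop entirely (the first case in the list) has the same effect.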

Could you please check which of these applies in your particular case?

--
Michal Hocko
SUSE Labs