Re: [PATCH] mm: fix panic in __alloc_pages
From: David Hildenbrand
Date: Tue Nov 02 2021 - 05:24:50 EST
>>> In add_memory_resource() we hotplug the new node if required and set it
>>> online. Memory might get onlined later, via online_pages().
>>
>> You are correct. In case of memory hot add, it is true. But in case of adding
>> CPU with memoryless node, try_node_online() will be called only during CPU
>> onlining, see cpu_up().
>>
>> Is there any reason why try_online_node() resides in cpu_up() and not in add_cpu()?
>> I think it would be correct to online node during the CPU hot add to align with
>> memory hot add.
>
> I am not familiar with cpu hotplug, but this doesn't seem to be anything
> new so how come this became problem only now?
So IIUC, the issue is that we have a node
a) That has no memory
b) That is offline
This node will get onlined when onlining the CPU as Alexey says. Yet we
have some code that stumbles over the node and goes ahead trying to use
the pgdat -- that code is broken.
If we take a look at build_zonelists() we indeed skip any
!node_online(node). Any other code should do the same. If the node is
not online, it shall be ignored because we might not even have a pgdat
yet -- see hotadd_new_pgdat(). Without node_online(), the pgdat might be
stale or non-existant.
The node onlining logic when onlining a CPU sounds bogus as well: Let's
take a look at try_offline_node(). It checks that:
1) That no memory is *present*
2) That no CPU is *present*
We should online the node when adding the CPU ("present"), not when
onlining the CPU.
--
Thanks,
David / dhildenb