Re: [PATCH 2/6] mm: handle uninitialized numa nodes gracefully

From: Oscar Salvador
Date: Fri Jan 28 2022 - 01:27:12 EST


On 2022-01-27 09:53, Michal Hocko wrote:
From: Michal Hocko <mhocko@xxxxxxxx>

We have had several reports [1][2][3] that the page allocator blows up
when an allocation from a possible node is requested. The underlying
reason is that NODE_DATA for the specific node is not allocated.

NUMA specific initialization is arch specific and it can vary a lot.
E.g. x86 tries to initialize all nodes that have some cpu affinity (see
init_cpu_to_node) but this can be insufficient because the node might be
cpuless for example.

One way to address this problem would be to check for !node_online nodes
when trying to get a zonelist and silently fall back to another node.
Unfortunately, that adds a branch to the allocator hot path and it
doesn't handle any other potential NODE_DATA users.
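
For illustration only, the fallback check described above can be modelled
in userspace roughly as follows. This is a toy sketch, not kernel code:
struct toy_pgdat, toy_node_data, toy_node_online and
toy_pick_pgdat_with_fallback are hypothetical names standing in for
pg_data_t, NODE_DATA, the node online state and the zonelist lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_NODES 4	/* hypothetical; stands in for MAX_NUMNODES */

/* Toy stand-in for pg_data_t. */
struct toy_pgdat {
	int node_id;
};

static struct toy_pgdat toy_node0 = { 0 };
static struct toy_pgdat toy_node1 = { 1 };

/* Index is the node id. NULL models a possible node without NODE_DATA. */
static struct toy_pgdat *toy_node_data[TOY_MAX_NODES] = {
	&toy_node0, &toy_node1, NULL, NULL
};
static bool toy_node_online[TOY_MAX_NODES] = { true, true, false, false };

/*
 * The check-and-fall-back approach: every caller pays for the extra
 * branch, and only this one lookup is protected.
 */
static struct toy_pgdat *toy_pick_pgdat_with_fallback(int nid, int fallback_nid)
{
	assert(nid >= 0 && nid < TOY_MAX_NODES);

	if (!toy_node_online[nid])	/* branch in the hot path */
		nid = fallback_nid;

	return toy_node_data[nid];
}
```

In this model a request for node 2 is silently served from the fallback
node, but toy_node_data[2] is still NULL for any other user that
dereferences it directly.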

This patch takes a different approach (following a lead of [3]) and it
preallocates pgdat for all possible nodes in arch independent code
- free_area_init. All uninitialized nodes are treated as memoryless
nodes. node_state of the node is not changed because that would lead to
other side effects - e.g. the sysfs representation of such a node, and
from past discussions [4] it is known that some tools might have
problems digesting that.

A newly allocated pgdat only gets a minimal initialization and the rest
of the work is expected to be done by memory hotplug - hotadd_new_pgdat
(renamed to hotadd_init_pgdat).
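
As a rough userspace sketch of the preallocation idea (an assumption-laden
model, not the code from the patch): every possible node that has no pgdat
yet gets one from an early allocator, it is set up as memoryless, and the
online state is left untouched. All names below (struct pgdat_model,
model_early_alloc, model_prealloc_pgdats, ...) are hypothetical, and the
bump allocator only stands in for memblock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_NODES 4	/* hypothetical; stands in for MAX_NUMNODES */

/* Toy stand-in for pg_data_t. */
struct pgdat_model {
	int node_id;
	unsigned long present_pages;	/* 0 means memoryless */
};

/* Node 0 models a node that arch code already initialized. */
static struct pgdat_model model_arch_node0 = { 0, 1000 };

static struct pgdat_model *model_node_data[MODEL_MAX_NODES] = {
	&model_arch_node0
};
static bool model_node_possible[MODEL_MAX_NODES] = { true, true, true, false };
static bool model_node_online[MODEL_MAX_NODES] = { true, false, false, false };

/* Bump allocator standing in for the early boot (memblock) allocator. */
static _Alignas(16) unsigned char model_early_pool[1024];
static size_t model_early_used;

static void *model_early_alloc(size_t size)
{
	size_t aligned = (size + 15) & ~(size_t)15;
	void *p;

	if (model_early_used + aligned > sizeof(model_early_pool))
		return NULL;

	p = model_early_pool + model_early_used;
	model_early_used += aligned;
	memset(p, 0, size);
	return p;
}

/* Give every possible node without a pgdat a minimally initialized one. */
static void model_prealloc_pgdats(void)
{
	for (int nid = 0; nid < MODEL_MAX_NODES; nid++) {
		struct pgdat_model *pgdat;

		if (!model_node_possible[nid] || model_node_data[nid])
			continue;

		pgdat = model_early_alloc(sizeof(*pgdat));
		if (!pgdat)
			continue;

		pgdat->node_id = nid;
		pgdat->present_pages = 0;	/* treated as memoryless */
		model_node_data[nid] = pgdat;
		/* model_node_online[nid] is deliberately not changed. */
	}
}
```

After model_prealloc_pgdats() runs, nodes 1 and 2 have a usable pgdat while
still being reported as offline, node 0 keeps the pgdat it already had, and
the impossible node 3 is left without one.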

generic_alloc_nodedata is changed to use the memblock allocator because
neither page nor slab allocators are available at the stage when all
pgdats are allocated. Hotplug doesn't allocate pgdat anymore so we can
use the early boot allocator. The only arch specific implementation is
ia64 and that is changed to use the early allocator as well.

Reported-by: Alexey Makhalov <amakhalov@xxxxxxxxxx>
Tested-by: Alexey Makhalov <amakhalov@xxxxxxxxxx>
Reported-by: Nico Pache <npache@xxxxxxxxxx>
Acked-by: Rafael Aquini <raquini@xxxxxxxxxx>
Tested-by: Rafael Aquini <raquini@xxxxxxxxxx>
Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>

With the mentioned fixups:

Reviewed-by: Oscar Salvador <osalvador@xxxxxxx>

--
Oscar Salvador
SUSE Labs