Re: [PATCH v3 0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA
From: Baoquan He
Date: Mon Mar 30 2020 - 04:17:15 EST
On 03/30/20 at 09:42am, Michal Hocko wrote:
> On Sat 28-03-20 11:31:17, Hoan Tran wrote:
> > In a NUMA layout where nodes have memory ranges that span across other
> > nodes, the mm driver can detect the memory node id incorrectly.
> >
> > For example, with layout below
> > Node 0 address: 0000 xxxx 0000 xxxx
> > Node 1 address: xxxx 1111 xxxx 1111
> >
> > Note:
> > - Memory from low to high
> > - 0/1: Node id
> > - x: Invalid memory of a node
> >
> > When mm probes the memory map, without CONFIG_NODES_SPAN_OTHER_NODES
> > config, mm only checks the memory validity but not the node id.
> > Because of that, Node 1 also detects the memory from node 0 as below
> > when it scans from the start address to the end address of node 1.
> >
> > Node 0 address: 0000 xxxx xxxx xxxx
> > Node 1 address: xxxx 1111 1111 1111
> >
> > This layout could occur on any architecture. Most of them enable
> > this config by default with CONFIG_NUMA. This patch, by default, enables
> > CONFIG_NODES_SPAN_OTHER_NODES or uses early_pfn_in_nid() for NUMA.
>
> I am not opposed to this at all. It reduces the config space and that is
> a good thing on its own. The history has shown that memory layout might
> be really wild wrt NUMA. The config is only used for early_pfn_in_nid
> which is clearly an overkill.
>
> Your description doesn't really explain why this is safe though. The
> history of this config is somewhat messy. Mike tried to remove it
> in a94b3ab7eab4 ("[PATCH] mm: remove arch independent
> NODES_SPAN_OTHER_NODES"), only for it to be reintroduced by 7516795739bd
> ("[PATCH] Reintroduce NODES_SPAN_OTHER_NODES for powerpc") without any
> reasoning whatsoever. This doesn't make it easy to see whether the
> reasons for reintroduction are still there. Maybe there are some subtle
> dependencies. I do not see any TBH but that might be buried deep in
> arch specific code.
Yeah, since early_pfnnid_cache was added, we do not need to worry about
the performance. But when I read the mem init code on x86 again, I do see
there is code to handle node overlapping, e.g. in numa_cleanup_meminfo(),
when storing the node id into memblock. But the thing is, if we do
encounter node overlapping, we just return early and leave something
uninitialized. I am wondering whether a system with node overlapping can
still run healthily.
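
For reference, the misattribution described in the cover letter can be
reproduced with a small user-space model. This is only a sketch, not the
code in mm/page_alloc.c: struct mem_range, layout[], pfn_to_nid() and
pages_claimed() are hypothetical names made up for illustration, and each
4-character group of the example layout is assumed to be 4 page frames.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one valid memory range [start_pfn, end_pfn). */
struct mem_range {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
	int nid;		/* owning node */
};

/*
 * Interleaved layout from the cover letter:
 *   Node 0: 0000 xxxx 0000 xxxx
 *   Node 1: xxxx 1111 xxxx 1111
 */
static const struct mem_range layout[] = {
	{  0,  4, 0 },
	{  4,  8, 1 },
	{  8, 12, 0 },
	{ 12, 16, 1 },
};

/* Return the owning node of a pfn, or -1 if the pfn is not valid memory. */
static int pfn_to_nid(unsigned long pfn)
{
	size_t i;

	for (i = 0; i < sizeof(layout) / sizeof(layout[0]); i++) {
		if (pfn >= layout[i].start_pfn && pfn < layout[i].end_pfn)
			return layout[i].nid;
	}
	return -1;
}

/*
 * Count the pages a node claims while scanning its own span
 * [start_pfn, end_pfn). With check_nid == 0 only validity is tested,
 * which models a build without the node id check. With check_nid != 0
 * pages owned by another node are skipped.
 */
static unsigned long pages_claimed(int nid, unsigned long start_pfn,
				   unsigned long end_pfn, int check_nid)
{
	unsigned long pfn, count = 0;

	assert(start_pfn <= end_pfn);

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		int owner = pfn_to_nid(pfn);

		if (owner < 0)
			continue;	/* hole, not valid memory */
		if (check_nid && owner != nid)
			continue;	/* valid, but belongs to another node */
		count++;
	}
	return count;
}
```

In this model node 1 spans pfn 4 to 16. Scanning that span with the
validity test alone also picks up the 4 pages of node 0 that sit in the
middle of it, while the node id check limits node 1 to its own 8 pages.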
>
> > v3:
> > * Revise the patch description
> >
> > V2:
> > * Revise the patch description
> >
> > Hoan Tran (5):
> > mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA
> > powerpc: Kconfig: Remove CONFIG_NODES_SPAN_OTHER_NODES
> > x86: Kconfig: Remove CONFIG_NODES_SPAN_OTHER_NODES
> > sparc: Kconfig: Remove CONFIG_NODES_SPAN_OTHER_NODES
> > s390: Kconfig: Remove CONFIG_NODES_SPAN_OTHER_NODES
> >
> > arch/powerpc/Kconfig | 9 ---------
> > arch/s390/Kconfig | 8 --------
> > arch/sparc/Kconfig | 9 ---------
> > arch/x86/Kconfig | 9 ---------
> > mm/page_alloc.c | 2 +-
> > 5 files changed, 1 insertion(+), 36 deletions(-)
> >
> > --
> > 1.8.3.1
> >
>
> --
> Michal Hocko
> SUSE Labs
>