Re: [PATCH x86/urgent 2/2] x86: Implement pfn -> nid mapping granularity check

From: Tejun Heo
Date: Sat Jul 09 2011 - 04:32:16 EST


On Fri, Jul 01, 2011 at 06:23:27PM +0200, Tejun Heo wrote:
> Both SPARSEMEM and DISCONTIGMEM have limited granularity when mapping
> pfn to nid. If NUMA nodes are laid out such that the mapping cannot
> be accurate, boot will fail triggering BUG_ON() in
> mminit_verify_page_links().
>
> On 32bit, it's 512MiB w/ PAE and SPARSEMEM. This seems to have been
> granular enough until commit 2706a0bf7b (x86, NUMA: Enable
> CONFIG_AMD_NUMA on 32bit too). Apparently, there is a machine which
> aligns NUMA nodes to 128MiB and has only AMD NUMA but not SRAT. As
> x86_64 has granularity of 128MiB, NUMA config worked fine on the
> machine; however, the commit enabled AMD NUMA config on 32bit too and
> the 512MiB granularity wasn't enough.
>
> On node 0 totalpages: 2096615
> DMA zone: 32 pages used for memmap
> DMA zone: 0 pages reserved
> DMA zone: 3927 pages, LIFO batch:0
> Normal zone: 1740 pages used for memmap
> Normal zone: 220978 pages, LIFO batch:31
> HighMem zone: 16405 pages used for memmap
> HighMem zone: 1853533 pages, LIFO batch:31
> BUG: Int 6: CR2 (null)
> EDI (null) ESI 00000002 EBP 00000002 ESP c1543ecc
> EBX f2400000 EDX 00000006 ECX (null) EAX 00000001
> err (null) EIP c16209aa CS 00000060 flg 00010002
> Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000
> (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe (null)
> f7200b80 c16395f0 00200a02 f7200a80 (null) 000375fe 00000002 (null)
> Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b #17
> Call Trace:
> [<c136b1e5>] ? early_fault+0x2e/0x2e
> [<c16209aa>] ? mminit_verify_page_links+0x12/0x42
> [<c1620613>] ? memmap_init_zone+0xaf/0x10c
> [<c1620929>] ? free_area_init_node+0x2b9/0x2e3
> [<c1607e99>] ? free_area_init_nodes+0x3f2/0x451
> [<c1601d80>] ? paging_init+0x112/0x118
> [<c15f578d>] ? setup_arch+0x791/0x82f
> [<c15f43d9>] ? start_kernel+0x6a/0x257
>
> This patch implements node_map_pfn_alignment(), which determines the
> maximum internode alignment, and updates numa_register_memblks() to
> reject the NUMA configuration if the alignment exceeds the pfn -> nid
> mapping granularity of the memory model as determined by
> PAGES_PER_SECTION.
>
> This makes the problematic machine boot w/ flatmem by rejecting the
> NUMA config and provides protection against crazy NUMA configurations.
>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> LKML-Reference: <20110628174613.GP478@xxxxxxxxxxxxxxxxxxxxx>
> Reported-and-Tested-by: Hans Rosenfeld <hans.rosenfeld@xxxxxxx>
> Cc: Conny Seidel <conny.seidel@xxxxxxx>
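
For anyone skimming the thread, the check described above can be
sketched in userspace roughly as follows. This is only an illustration
under stated assumptions, not the code from the patch: struct pfn_range,
internode_pfn_alignment() and numa_alignment_ok() are hypothetical
names, ranges are assumed to be sorted by start pfn, and pages are
assumed to be 4KiB so that 512MiB and 128MiB sections are 131072 and
32768 pages respectively.

```c
#include <stddef.h>

/* Assumed 4KiB pages: section sizes from the description, in pages. */
#define SECTION_PAGES_32BIT_PAE (512UL * 1024 * 1024 / 4096) /* 131072 */
#define SECTION_PAGES_64BIT     (128UL * 1024 * 1024 / 4096) /*  32768 */

/* Hypothetical stand-in for one memory range owned by a NUMA node. */
struct pfn_range {
	unsigned long start_pfn; /* first page frame number, inclusive */
	unsigned long end_pfn;   /* one past the last page frame number */
	int nid;                 /* owning node id */
};

/*
 * Return the coarsest power-of-two granularity (in pages) that still
 * separates every pair of adjacent ranges belonging to different nodes.
 * Ranges must be sorted by start_pfn. Returns 0 when there is no
 * internode boundary, i.e. any granularity works.
 */
static unsigned long internode_pfn_alignment(const struct pfn_range *r,
					     size_t n)
{
	unsigned long min_align = 0;
	size_t i;

	for (i = 1; i < n; i++) {
		unsigned long start = r[i].start_pfn;
		unsigned long prev_end = r[i - 1].end_pfn;
		unsigned long align;

		if (r[i].nid == r[i - 1].nid || start == 0)
			continue;

		/* Largest power of two that divides start. */
		align = start & (~start + 1);

		/*
		 * A hole before this node allows a coarser granularity:
		 * widen while rounding start down does not reach back
		 * into the previous node's range.
		 */
		while ((align << 1) != 0 &&
		       (start & ~((align << 1) - 1)) >= prev_end)
			align <<= 1;

		/* The finest requirement over all boundaries wins. */
		if (min_align == 0 || align < min_align)
			min_align = align;
	}
	return min_align;
}

/* Nonzero if a memory model with the given section size can map it. */
static int numa_alignment_ok(unsigned long align,
			     unsigned long pages_per_section)
{
	return align == 0 || align >= pages_per_section;
}
```

With two nodes meeting at a 128MiB boundary, the sketch returns 32768
pages, which passes against a 128MiB section and fails against a 512MiB
one, matching the behaviour reported on the problematic machine.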

Ping? If the change is too invasive at this stage, we can disable AMD
NUMA on x86_32 for 3.0 and queue these two for 3.1.

Thanks.

--
tejun