Re: [PATCH] Fix spurious BUG_ON() in mark_bootmem()

From: Johannes Weiner
Date: Wed Jul 09 2008 - 18:44:39 EST


Hi Lee,

Lee Schermerhorn <Lee.Schermerhorn@xxxxxx> writes:

> Against: 2.6.26-rc8-mm1
>
> Fixes problem introduced by patches:
>
> bootmem-factor-out-the-marking-of-a-pfn-range.patch
> bootmem-replace-node_boot_start-in-struct-bootmem_data.patch
>
> HP ia64 NUMA platform fails to boot 2.6.26-rc8-mm1, hitting BUG_ON()
> in mm/bootmem.c:mark_bootmem().
>
> After linking all bootmem chunks, the 'bdata_list' on HP ia64 numa
> platforms looks something like this:
>
> node 4: 0x0-0x8000
> node 0: 0x1c008000-0x1c07ec00
> node 1: 0x1c800000-0x1c87f000
> node 2: 0x1d000000-0x1d07f000
> node 3: 0x1d800000-0x1d87f000
>
> [Node 4 is a pseudo-node generated by the platform firmware to
> contain a configurable amount of zero-based, hardware interleaved
> memory. 0x8000 pages or 512M is the minimum that can be configured.]
>
> First call to mark_bootmem() [from free_bootmem()] called with:
>
> start-end: 0x1c008063-0x1c008262, reserve: 0, flags: 0
>
> I.e., NOT in the first chunk of the list.
>
> However, the "if (pos < bdata->node_min_pfn)" in the loop fails
> to test the start address of the argument range [in 'pos'] against
> the end of the chunk. So, it treats the range as being in the node
> 4 chunk. Second time through the loop, pos == 0x8000 is <
> bdata->node_min_pfn and pos != start, so we trip the BUG_ON().
>
> This patch enhances the if condition to skip chunks that do not
> overlap the argument range, allowing 2.6.26-rc8-mm1 to boot on this
> platform.
>
> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@xxxxxx>
>
> mm/bootmem.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.26-rc8-mm1/mm/bootmem.c
> ===================================================================
> --- linux-2.6.26-rc8-mm1.orig/mm/bootmem.c 2008-07-09 16:11:23.000000000 -0400
> +++ linux-2.6.26-rc8-mm1/mm/bootmem.c 2008-07-09 16:13:46.000000000 -0400
> @@ -299,7 +299,8 @@ static int __init mark_bootmem(unsigned
>  		int err;
>  		unsigned long max;
>
> -		if (pos < bdata->node_min_pfn) {
> +		if (pos < bdata->node_min_pfn ||
> +		    pos >= bdata->node_low_pfn) {
>  			BUG_ON(pos != start);
>  			continue;
>  		}

Ah, yeah, this was obviously wrong. Thanks for the fix!
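
For anyone following along, the walk described above can be reproduced
in userspace with a rough sketch like the one below. It is not the
kernel code: struct chunk, walk_chunks() and the WALK_* results are
hypothetical names made up for illustration, the marking call is left
out, and the part of the loop body after the skip test is reconstructed
from the description in the report, so treat it as an assumption. The
chunk table and the start/end pfns are the ones quoted in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two bootmem_data_t fields the loop reads. */
struct chunk {
	unsigned long node_min_pfn;	/* first pfn of the chunk */
	unsigned long node_low_pfn;	/* one past the last pfn of the chunk */
};

/* Hypothetical result codes replacing the kernel's BUG_ON()/BUG(). */
enum walk_result { WALK_OK, WALK_BUG_ON, WALK_BUG };

/* bdata_list from the report, in list order (node 4 comes first). */
static const struct chunk hp_ia64_chunks[] = {
	{ 0x0UL,        0x8000UL     },	/* node 4 */
	{ 0x1c008000UL, 0x1c07ec00UL },	/* node 0 */
	{ 0x1c800000UL, 0x1c87f000UL },	/* node 1 */
	{ 0x1d000000UL, 0x1d07f000UL },	/* node 2 */
	{ 0x1d800000UL, 0x1d87f000UL },	/* node 3 */
};

/*
 * Walk the chunk list the way the report describes mark_bootmem() doing
 * it. With patched == false only the lower bound is tested (the old
 * condition); with patched == true the upper bound is tested as well.
 */
static enum walk_result walk_chunks(const struct chunk *chunks, size_t n,
				    unsigned long start, unsigned long end,
				    bool patched)
{
	unsigned long pos = start;

	assert(start < end);

	for (size_t i = 0; i < n; i++) {
		const struct chunk *bdata = &chunks[i];
		unsigned long max;
		bool skip = pos < bdata->node_min_pfn;

		if (patched)
			skip = skip || pos >= bdata->node_low_pfn;

		if (skip) {
			if (pos != start)	/* BUG_ON(pos != start) */
				return WALK_BUG_ON;
			continue;
		}

		/* Assumed remainder of the loop: clamp, mark, advance. */
		max = bdata->node_low_pfn < end ? bdata->node_low_pfn : end;
		if (max == end)
			return WALK_OK;
		pos = bdata->node_low_pfn;
	}
	return WALK_BUG;	/* range not covered by any chunk */
}
```

Calling walk_chunks(hp_ia64_chunks, 5, 0x1c008063, 0x1c008262, false)
walks into the node 4 chunk, advances pos to 0x8000 and then reports
WALK_BUG_ON at node 0, matching the failure in the report. With
patched set to true, node 4 is skipped and the range is found in node 0.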

Acked-by: Johannes Weiner <hannes@xxxxxxxxxxxx>

Nice to know that it boots otherwise on such a setup :)

Hannes