Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!

From: Christoph Lameter
Date: Fri Oct 20 2006 - 13:35:23 EST


On Fri, 20 Oct 2006, Paul Mackerras wrote:

> What is happening is that all pages are getting their zone id field in
> their page->flags set to point to zone for node 1 by memmap_init_zone
> calling set_page_links (which does set_page_zone). Thus, when those
> pages get freed by free_all_bootmem_node, they all end up in the zone
> for node 1.

Ok. So no memory on node 0? Then my patch to reenable fallback in the slab
should have worked but it did not. Could you retest with the patch that
Will tried? If that is not working the comment out the 3 lines with
__GFP_THISNODE in get_page_from_freelist. That will reenable fallback
globally. If that does not work then I doubt that this is my issue.

> memmap_init_zone is called (as memmap_init, since we don't have
> __HAVE_ARCH_MEMMAP_INIT defined) from init_currently_empty_zone, which
> is called from free_area_init_core. Now the thing is that memmap_init
> and init_currently_empty_zone are called with the node's start PFN and
> size in pages, *including* holes. On the partition I'm using we have
> these PFN ranges for the nodes:
>
> 1: 0 -> 32768
> 0: 32768 -> 278528
> 1: 278528 -> 524288
>
> So node 0's start PFN is 32768 and its size is 245760 pages, and so we
> correctly set pages 32786 to 278527 to be in the zone for node 0.
> Then for node 1, we have the start PFN is 0 and the size is 524288, so
> we then go through and set *all* pages of memory to be in the zone for
> node 1, including the pages which are actually on node 0.

I do not get it. You first mark all pages on node 0 then we run the bootup
code and later we shift those pages into node 0? So the slab bootstrap is
running when all pages are marked as being part of node 1 then later we
switch those pages under it to node 0?

> I don't know this code well enough to know what the correct fix is.
> Clearly memmap_init_zone should only be touching the pages that are
> actually present in the zone, but I don't know exactly what data
> structures it should be using to know what those pages are.

The fix that I posted yesterday should have reenabled fallback in the
slab during bootstrap and should have made the system work. Here it is
again:

Index: linux-2.6.19-rc2-mm1/mm/slab.c
===================================================================
--- linux-2.6.19-rc2-mm1.orig/mm/slab.c 2006-10-19 11:54:24.000000000 -0500
+++ linux-2.6.19-rc2-mm1/mm/slab.c 2006-10-19 16:32:09.454825851 -0500
@@ -1589,7 +1589,10 @@ static void *kmem_getpages(struct kmem_c
* the needed fallback ourselves since we want to serve from our
* per node object lists first for other nodes.
*/
- flags |= cachep->gfpflags | GFP_THISNODE;
+ if (g_cpucache_up != FULL)
+ flags |= cachep->gfpflags & ~__GFP_THISNODE;
+ else
+ flags |= cachep->gfpflags | GFP_THISNODE;

page = alloc_pages_node(nodeid, flags, cachep->gfporder);
if (!page)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/