Re: [bug] SLUB + mm/slab.c boot crash in -rc9

From: Ingo Molnar
Date: Tue Apr 15 2008 - 16:18:18 EST



* Christoph Lameter <clameter@xxxxxxx> wrote:

> > Pretty please, could you pay more than cursory attention to this bug
> > i already spent two full days on and which is blocking the v2.6.25
> > release?
>
> Yeah trying to get to understand how exactly sparsemem works and how
> the 32 bit highmem stuff interacts with it... Sorry not code that I am
> an expert in nor the platform that I am familiar with. Code mods there
> required heavy review from multiple parties with expertise in various
> subjects.

yeah - sorry about that impatient flame. And it could still be anything
from the page allocator to bootmem - or some completely unrelated piece
of code corrupting some key data structure.

sparsemem is supposed to work roughly like this on x86 (32-bit):

- the x86 memory map comes from the BIOS via e820.

- those individual chunks of e820-enumerated memory get
registered with mm/sparse.c's data structures via memory_present()
callbacks. [btw., this should be renamed to register_memory_present()
or register_sparse_range() - something less opaque.]

- there's really just 3 RAM areas that matter on this box, and the last
one is unusable for !PAE, which leaves 2.

- there's a 256 MB PCI aperture hole at 0xf0000000.

- out of the 64 sparse memory chunks the first 60 get filled in (each
is at least partially backed by RAM) - the last 4 [the PCI aperture
hole] remain !present.

- we pass in an array of 3 zones to free_area_init_nodes().

- we free the lowmem pages into the buddy allocator via the usual
generic setup.

- we have a special loop for highmem pages in arch/x86/mm/init_32.c,
set_highmem_pages_init(). This just goes through the PFNs one by one
and does an explicit __free_page() on all RAM pages that are in the
mem_map[] and which are non-reserved.

and that's it roughly.
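
to make the arithmetic above concrete, here is a minimal userspace
sketch, not the kernel code. it assumes 64 MB sections in a 4 GB
physical address space (which gives the 64 chunks), and every toy_*
name and TOY_* constant is made up for illustration. toy_memory_present()
only mimics the idea of memory_present(), and toy_free_highmem() only
mimics the idea of the PFN walk in set_highmem_pages_init(): it counts
the pages it would free instead of calling __free_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry (not taken from the kernel headers): 4 KB pages,
 * 64 MB sections, 32-bit physical addresses, hence 64 sections. */
#define TOY_PAGE_SHIFT        12u
#define TOY_SECTION_SHIFT     26u
#define TOY_NR_SECTIONS       (1u << (32u - TOY_SECTION_SHIFT))
#define TOY_PFN_SECTION_SHIFT (TOY_SECTION_SHIFT - TOY_PAGE_SHIFT)
#define TOY_MAX_PFN           (1ul << (32u - TOY_PAGE_SHIFT))

/* One "present" flag per section; all start out !present. */
static bool toy_section_present[TOY_NR_SECTIONS];

/* Hypothetical stand-in for memory_present(): mark every section that
 * the byte range [start, end) touches, even if only partially. */
static void toy_memory_present(uint64_t start, uint64_t end)
{
    assert(start < end && end <= (UINT64_C(1) << 32));
    uint64_t first = start >> TOY_SECTION_SHIFT;
    uint64_t last = (end - 1) >> TOY_SECTION_SHIFT;
    for (uint64_t s = first; s <= last; s++)
        toy_section_present[s] = true;
}

/* Count how many sections have been marked present so far. */
static unsigned int toy_count_present(void)
{
    unsigned int n = 0;
    for (unsigned int s = 0; s < TOY_NR_SECTIONS; s++)
        n += toy_section_present[s] ? 1u : 0u;
    return n;
}

/* Hypothetical stand-in for the highmem loop: walk the PFNs in
 * [start_pfn, end_pfn) one by one and count those that sit in a
 * present section and are not reserved. is_reserved may be NULL,
 * meaning "no page is reserved". */
static unsigned long toy_free_highmem(unsigned long start_pfn,
                                      unsigned long end_pfn,
                                      bool (*is_reserved)(unsigned long pfn))
{
    unsigned long freed = 0;

    assert(start_pfn <= end_pfn && end_pfn <= TOY_MAX_PFN);
    for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn++) {
        if (!toy_section_present[pfn >> TOY_PFN_SECTION_SHIFT])
            continue;   /* hole, e.g. the PCI aperture */
        if (is_reserved != NULL && is_reserved(pfn))
            continue;   /* reserved pages stay out of the allocator */
        freed++;
    }
    return freed;
}
```

with two made-up RAM ranges, [0, 0x9f000) and [0x100000, 0xf0000000),
registered via toy_memory_present(), toy_count_present() returns 60 and
sections 60..63 stay !present, matching the PCI aperture hole at
0xf0000000. walking from an assumed highmem start of PFN 0x38000
(896 MB) up to PFN 0x100000 with no reserved pages then counts 0xb8000
pages, i.e. everything below the hole.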

my current guess would have been some bootmem regression/interaction
that messes up the buddy bitmaps - but i just reverted to the v2.6.24
version of bootmem.c and that crashes too ...

Ingo