Re: [bug] SLUB + mm/slab.c boot crash in -rc9

From: Ingo Molnar
Date: Tue Apr 15 2008 - 12:16:30 EST

Next message: Johannes Weiner: "[PATCH v2] mm: Fix possible off-by-one in walk_pte_range()"
Previous message: Linus Torvalds: "Re: [PATCH] Replace completions with semaphores"
In reply to: Linus Torvalds: "Re: [bug] SLUB + mm/slab.c boot crash in -rc9"
Next in thread: Linus Torvalds: "Re: [bug] SLUB + mm/slab.c boot crash in -rc9"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Tue, 15 Apr 2008, Ingo Molnar wrote:
> >
> > debug output is:
> >
> > http://redhat.com/~mingo/misc/log-Thu_Apr_10_10_41_16_CEST_2008.bad.rc9
> >
> > so it's probably the first few page allocations (setup_cpu_cache())
> > going wrong already - suggesting some fundamental borkage in SLAB?
>
> Well, I think it suggests some fundamental borkage in the page
> allocator.
>
> That first warn-on is from the "alloc_pages_node()" returning NULL at
> bootup. Sure, it could be that the arguments are bogus, but that
> sounds unlikely since none of that is dependent on any kconfig stuff.
>
> The fact that it happens with both SLUB/SLAB makes that even more
> obvious.
>
> Now, you don't have fault injection on, so it can't be that, and your
> debug entry for *z == NULL didn't trigger in alloc_pages, so it's not
> that one either.
>
> However, if __alloc_pages() failed, I would have expected to see the
> "memory allocation failed" printk. Why didn't it? Is
> printk_ratelimit() broken at boot (last_msg starts out as zero - maybe
> it should start out as a negative number)?

btw., now with a second full day spent on this regression, i have
figured out a workaround the hard way: increasing SECTION_SIZE_BITS in
include/asm-x86/sparsemem.h from 26 to 27 makes it go away. (i.e. we use
section chunks of 128 MB instead of the previous 64 MB.) I've given up
on analyzing the crash site - it seems rather random and uninformative
and just suggests page allocator borkage.
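
For reference, the arithmetic behind that workaround can be sketched in a
few lines of userspace C. This is only an illustration, not kernel code:
the helper names are made up here, and the max_physmem_bits value of 32
for a 32-bit !PAE build is an assumption on my part rather than something
stated above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Size in MB of one sparsemem section for a given SECTION_SIZE_BITS.
 * Hypothetical helper for illustration only.
 */
static inline uint64_t section_size_mb(unsigned int section_size_bits)
{
	assert(section_size_bits >= 20 && section_size_bits < 64);
	return (UINT64_C(1) << section_size_bits) >> 20;
}

/*
 * Number of sections needed to cover the physical address space.
 * max_physmem_bits is an assumption (32 for 32-bit !PAE).
 */
static inline uint64_t nr_sections(unsigned int max_physmem_bits,
				   unsigned int section_size_bits)
{
	assert(max_physmem_bits >= section_size_bits);
	assert(max_physmem_bits - section_size_bits < 64);
	return UINT64_C(1) << (max_physmem_bits - section_size_bits);
}
```

With these, section_size_mb(26) is 64 and section_size_mb(27) is 128, and
under the 32-bit assumption the section count drops from 64 to 32 when the
shift goes from 26 to 27.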

So this seems like general sparsemem borkage. PAE uses a shift of 30
due to the page->flags shortage (which masks this bug); 64-bit uses 27,
which probably masks it too.

Since this is a !NUMA config and !PAE as well, NODES_SHIFT is 0,
ZONES_SHIFT is 2, so the theory of running out of bits in page->flags is
wrong as well.
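
A rough way to check that bit budget, again as a userspace sketch rather
than the kernel's actual layout code: it assumes the section number takes
(max_physmem_bits - section_size_bits) bits of page->flags, that
max_physmem_bits is 32 here, and that the flags word is 32 bits wide. The
function names are hypothetical.

```c
#include <assert.h>

/*
 * Bits of the flags word consumed by the section, node and zone fields.
 * Assumes the section field is (max_physmem_bits - section_size_bits)
 * bits wide; hypothetical helper, not the kernel's own macro.
 */
static inline unsigned int flags_field_bits(unsigned int max_physmem_bits,
					    unsigned int section_size_bits,
					    unsigned int nodes_shift,
					    unsigned int zones_shift)
{
	assert(max_physmem_bits >= section_size_bits);
	return (max_physmem_bits - section_size_bits) + nodes_shift + zones_shift;
}

/* Bits left over in a word_bits-wide flags word for the PG_* flags. */
static inline unsigned int flags_bits_left(unsigned int word_bits,
					   unsigned int max_physmem_bits,
					   unsigned int section_size_bits,
					   unsigned int nodes_shift,
					   unsigned int zones_shift)
{
	unsigned int used = flags_field_bits(max_physmem_bits, section_size_bits,
					     nodes_shift, zones_shift);

	assert(word_bits >= used);
	return word_bits - used;
}
```

Under those assumptions, a shift of 26 with NODES_SHIFT 0 and ZONES_SHIFT 2
uses 8 bits and leaves 24 for flags; a shift of 27 leaves 25. Whether that
is comfortable depends on how many PG_* flags the config defines, but a
one-bit difference is consistent with page->flags exhaustion not being the
cause.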

I also tried a hack to double the size of all sparsemem mem_map
allocations (on the theory of an overflow there) - but it didn't help.

So i think we need to go down further into the page allocator. Perhaps
the buddy bitmaps are wrongly sized somewhere. I'm grasping at straws.

Btw., Mel Gorman has reproduced crashes with my bzImage on his box (and
a hang with my config, using his build), so i think we can eliminate hw
and build environment specialities as a cause.

Ingo