Re: PROBLEM: kernel BUG at mm/page_alloc.c:775

From: Mel Gorman
Date: Mon Feb 01 2010 - 05:29:53 EST


On Fri, Jan 29, 2010 at 11:01:57PM +0100, Michail Bachmann wrote:
> > > On Tue, Jan 12, 2010 at 03:25:23PM -0600, Christoph Lameter wrote:
> > > > On Sat, 9 Jan 2010, Michail Bachmann wrote:
> > > > > [ 48.505381] kernel BUG at mm/page_alloc.c:775!
> > > >
> > > > Somehow nodes got mixed up or the lookup tables for pages / zones are
> > > > not giving the right node numbers.
> > >
> > > Agreed. On this type of machine, I'm not sure how that could happen
> > > short of struct page information being corrupted. The range should
> > > always be aligned to a pageblock boundary and I cannot see how that
> > > would cross a zone boundary on this machine.
> > >
> > > Does this machine pass memtest?
> >
> > I ran one pass with memtest86 without errors before posting this bug, but I
> > can let it run "all tests" for a while just to be sure it is not caused by
> > broken hw.
>
> Please disregard this bug report. After running memtest for more than 10 hours
> it found a memory error.

I'm sorry to hear it, but at least the source of the bug is known.

> The funny thing is, linux found it much faster...
>

It could be that your power supply is slightly too inefficient and the
errors only occur when all cores or all disks are active. Linux can
easily load the machine that way, whereas memtest does not necessarily
stress it enough for the power drop to happen.
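
As a rough illustration of that kind of load, here is a minimal
userspace sketch, not a replacement for memtest. It assumes Python 3 is
available; the names check_buffer and stress, the pattern list, and the
buffer size are all made up for this example. It starts one worker per
core and has each one write and verify bit patterns in its own buffer,
so the CPUs and memory are busy at the same time. It cannot choose
physical addresses, and CPU caches may hide a bad cell, so a clean run
proves nothing.

```python
import multiprocessing as mp

# Hypothetical pattern set chosen for illustration: all zeros, all ones,
# and the two alternating-bit bytes.
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)


def check_buffer(size_bytes):
    """Fill a buffer with each pattern, read it back, count bad bytes."""
    buf = bytearray(size_bytes)
    errors = 0
    for pattern in PATTERNS:
        expected = bytes([pattern]) * size_bytes
        buf[:] = expected          # write the pattern over the whole buffer
        if buf != expected:        # read it back and compare
            errors += sum(a != b for a, b in zip(buf, expected))
    return errors


def stress(workers=None, size_bytes=1 << 20, rounds=1):
    """Run check_buffer in parallel, by default one worker per core."""
    workers = workers or mp.cpu_count()
    with mp.Pool(workers) as pool:
        results = pool.map(check_buffer, [size_bytes] * (workers * rounds))
    return sum(results)


if __name__ == "__main__":
    print("mismatching bytes:", stress())
```

Raising size_bytes and rounds keeps the machine loaded for longer; any
non-zero count would be a reason to go back to a proper memory tester.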

> Thanks for your time.
>

Thanks for testing and getting back to us.

--
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab