Re: 2.6.24 regression: pan hanging unkilleable and un-straceable

From: Nick Piggin
Date: Mon Feb 04 2008 - 18:03:20 EST


On Tuesday 05 February 2008 01:49, Mike Galbraith wrote:
> On Tue, 2008-01-22 at 06:47 +0100, Mike Galbraith wrote:
> > On Tue, 2008-01-22 at 16:25 +1100, Nick Piggin wrote:
> > > On Tuesday 22 January 2008 16:03, Mike Galbraith wrote:
> > > > I've hit same twice recently (not pan, and not repeatable).
> > >
> > > Nasty. The attached patch is something really simple that can sometimes
> > > help. sysrq+p is also an option, if you're on a UP system.
> >
> > SMP (P4/HT imitating real cores)
> >
> > > Any luck getting traces?
> >
> > We'll see. Armed.
>
> Hm. ld just went loopy (but killable) in v2.6.24-6928-g9135f19. During
> kbuild, modpost segfaulted, restart build, ld goes gaga. Third attempt,
> build finished. Not what I hit before, but mentionable.
>
>
> [ 674.589134] modpost[18588]: segfault at 3e8dc42c ip 0804a96d sp af982920
> error 5 in modpost[8048000+9000] [ 674.589211] mm/memory.c:115: bad pgd
> 3e081163.
> [ 674.589214] mm/memory.c:115: bad pgd 3e0d2163.
> [ 674.589217] mm/memory.c:115: bad pgd 3eb01163.

Hmm, this _could_ be bad memory. Or if it is very easy to reproduce with
a particular kernel version, then it is probably a memory scribble from
another part of the kernel :(

First thing I guess would be easy and helpful to run memtest86 for a
while if you have time.

If that's clean, then I don't have another good option except to bisect
the problem. Turning on DEBUG_VM, DEBUG_SLAB, DEBUG_LIST, DEBUG_PAGEALLOC,
DEBUG_STACKOVERFLOW, DEBUG_RODATA might help catch it sooner... SLAB and
PAGEALLOC could slow you down quite a bit though. And if the problem is
quite reproduceable, then obviously don't touch your config ;)

Thanks,
Nick


>
> [ 1407.322144] =======================
> [ 1407.322144] ld R running 0 21963 21962
> [ 1407.322144] db9d7f1c 00200086 c75f9020 b1814300 b0428300 b0428300
> b0428300 c75f9280 [ 1407.322144] b1814300 00000001 db9d7000 00000000
> d08c2f90 dba4f300 00000002 00000000 [ 1407.322144] b1810120 dba4f334
> 00200046 ffffffff db9d7000 c75f9020 db9d7f30 b02f333f [ 1407.322144] Call
> Trace:
> [ 1407.322144] [<b02f333f>] preempt_schedule_irq+0x45/0x5b
> [ 1407.322144] [<b0117a10>] ? do_page_fault+0x0/0x470
> [ 1407.322144] [<b0104d6e>] need_resched+0x1f/0x21
> [ 1407.322144] [<b0117a10>] ? do_page_fault+0x0/0x470
> [ 1407.322144] [<b0117a5c>] ? do_page_fault+0x4c/0x470
> [ 1407.322144] [<b0117a10>] ? do_page_fault+0x0/0x470
> [ 1407.322144] [<b02f4a3a>] ? error_code+0x72/0x78
> [ 1407.322144] [<b02f0000>] ? init_transmeta+0xcf/0x22f <== zzt P4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/