Re: Fw: Slab coruption and oops with 2.6.1-mm4
From: Gerd Knorr
Date: Tue Jan 20 2004 - 07:01:49 EST
caszonyi@xxxxxxxxxx writes:
> yes
> bug is reproduceable with preempt turned off
Ok. Makes a locking flaw less likely as those tend to trigger with
preemp or smp only.
> MCE: The hardware reports a non fatal, correctable incident occurred on
> CPU 0.
> Bank 1: 9400000000000151
That pretty much looks like it is really a hardware issue.
> > > > Slab corruption: start=c57c2000, len=4096
> > ^^^^^^^^
> > Who is this? Is this allocated by bttv? Or someone else corrupts
> > memory here?
> [ bttv load messages ]
> btcx: riscmem alloc size=2320 [2]
That isn't a fresh booted box, is it? Please reboot the machine after
every oops and before continuing testing. With known-corrupted memory
it can oops basically everythere and those oops reports don't help
much.
> btcx: skips line 0-9999:
> btcx: riscmem free [1]
> vbuf: init user [0x43267008+0x6c000 => 109 pages]
> btcx: riscmem alloc size=3184 [2]
> btcx: riscmem free [1]
> btcx: riscmem alloc size=2320 [2]
> btcx: skips line 0-9999:
> btcx: riscmem free [1]
> vbuf: init user [0x43267008+0x6c000 => 109 pages]
> btcx: riscmem alloc size=3184 [2]
> btcx: riscmem free [1]
That was xawtv I guess? Now transcode starting?
> vbuf: mmap setup: 32 buffers, 2129920 bytes each
> vbuf: mmap c9cfc96c: 422fd000-463fd000 pgoff 00000000 bufs 0-31
> vbuf: init user [0x42505000+0x208000 => 520 pages]
> btcx: riscmem alloc size=7820 [2]
> btcx: riscmem alloc size=7820 [3]
Oh, doesn't print the riscmem addresses. The blocks are two-page
sized through, so the one-page allocation slab complains about above
likely doesn't come from this.
> Unable to handle kernel paging request at virtual address 25262e29
^^^^^^^^
strange value for a kernel address, probably some corrupted pointer.
> EIP is at videobuf_dma_free+0x33/0xc0 [video_buf]
> eax: 00000000 ebx: c45a7000 ecx: 00000208 edx: 25262e29
> esi: 00000000 edi: c817cf54 ebp: d0a35720 esp: c4135c18
in edx. "objdump -Sd video-buf.o" should help finding the instruction
and corrospending source line, but I fear that wouldn't help much as
that isn't the source of the problem but the place where it shows up.
> btcx: riscmem free [64]
> [ ... ]
> btcx: riscmem free [3]
cleanups due to transcode being killed ...
> Unable to handle kernel paging request at virtual address 25262e29
... and here it hits the very same corrupted pointer again.
Gerd
--
"... und auch das ganze Wochenende oll" -- Wetterbericht auf RadioEins
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/