Re: kernel BUG at mm/swap.c:340

From: Nick Piggin
Date: Fri Oct 20 2006 - 11:27:10 EST


Julien Kirmaier wrote:
Dear kernel maintainers,

while processing with gs rather large postscript files (~1Go) the kernel pulled this message in the console:

<<<<<
kernel BUG at mm/swap.c:340!


So it found PageLRU set when it should not have been.

invalid opcode: 0000 [#1]
Modules linked in: parport_serial bsd_comp ppp_deflate zlib_deflate ppp_synctty ppp_generic slhc via_rhine nfs lockd sunrpc smbfs ntfs vfat fat genrtc
CPU: 0
EIP: 0060:[<c01367b2>] Not tainted VLI
EFLAGS: 00010202 (2.6.18 #2) EIP is at __pagevec_release_nonlru+0x26/0x71
eax: ffffffff ebx: c1ad7e24 ecx: 00000007 edx: c158f848
esi: f1d57ca0 edi: f1d57ca0 ebp: c1ad7f2c esp: c1ad7dc0
ds: 007b es: 007b ss: 0068
Process kswapd0 (pid: 79, ti=c1ad6000 task=c19e8570 task.ti=c1ad6000)
Stack: 00000007 00000001 c148c1e0 c15a5c80 c1691b80 c134a800 c1336120 c117c340 c10a5600 00018a8e c10f3040 c10f3040 c0137389 c10f3040 c10f3040 c10f3040 c10f3040 c013764c c1ad7e24 00000001 00000001 0000001c 00000000 c1ad7e1c Call Trace:
[<c0137389>] remove_mapping+0x4a/0x5c
[<c013764c>] shrink_page_list+0x2b1/0x33d
[<c013780c>] shrink_inactive_list+0x78/0x1bb
[<c0137071>] shrink_slab+0x61/0x18c
[<c013718d>] shrink_slab+0x17d/0x18c
[<c0137d37>] shrink_zone+0xa9/0xc6
[<c01380f4>] balance_pgdat+0x1b5/0x2c0
[<c01382f4>] kswapd+0xf5/0xf9

Looks like either the isolated list of pages got messed up, or the page
flag flipped from !PageLRU when we isolated it, to PageLRU when we check
it again here.

If I'm not mistaken, page->flags should be in eax, which which may point
to the list being messed up rather than a flags bitflip.

Either way I'd say you have some memory corruption problems, whether it
is bad memory or a device/driver problem. Hard to help.

You could try running a 2.6.19-rc2 kernel with CONFIG_DEBUG_LIST and
CONFIG_DEBUG_VM turned on and see if that catches anything.

The machine did not halt after this message as it did two weeks ago. I thought at the time that bad RAM was involved so I replaced it; memtest86 did not return anything wrong with the new 1Go RAM module.

I'm running 2.6.18 vanilla kernel on Debian Sarge, raid-1 software, the swap space is also on raid1.

Tell me if you need more informations (or to whom I should address this
message if linux.kernel is not relevant).

linux.kernel is the right place. Thanks.

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com -
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/