Re: PROBLEM: kernel BUG at mm/rmap.c:522!

From: Francesco Ricci
Date: Wed Apr 18 2007 - 06:04:00 EST


>That by itself would suggest a single-bit error, which would point
>you to running memtest86 overnight to check your RAM. Worth a try.

OK,
today I finished a long test of my RAM with memtest86+ 1.65
(as distributed in Debian): no errors.
So I think the problem is somewhere else...

Thanks for your time!

Hugh Dickins <hugh@xxxxxxxxxxx> writes:
>On Fri, 13 Apr 2007, Francesco Ricci wrote:
>>
>> [7.7.] Other information that might be relevant to the problem
>> (please look in /proc and include all information that you
>> think to be relevant):
>
>Thanks for your report, and for your patience in supplying all that
>scarcely relevant info you were asked for.
>
>>
>> from dmesg:
>> Apr 13 11:31:35 localhost kernel: Bad pte = 461b43d4, process = ???, vm_flags = 75, vaddr = b7868000
>
>Oh dear, one of your page tables has got corrupted: the page table
>entry for virtual address b7868000 contains 461b43d4, which is nonsense.
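The following minimal C sketch makes the decoding concrete by splitting a 32-bit page table entry into its page frame number (pfn) and flag bits. It assumes the classic non-PAE i386 layout, where bits 31..12 hold the pfn and bits 11..0 hold the flags. Whether this particular 2.6.18-4-686 kernel used that layout rather than PAE is an assumption. The helpers are illustrative and are not the kernel's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: classic i386 non-PAE PTE layout (4 KiB pages):
 * bits 31..12 = page frame number, bits 11..0 = flags.
 * Illustrative helpers only, not the kernel's own macros. */
#define PTE_PAGE_SHIFT 12u
#define PTE_FLAG_MASK  0xfffu

#define PTE_PRESENT  0x001u /* bit 0: page present         */
#define PTE_RW       0x002u /* bit 1: writable             */
#define PTE_USER     0x004u /* bit 2: user-mode accessible */
#define PTE_ACCESSED 0x020u /* bit 5: accessed             */
#define PTE_DIRTY    0x040u /* bit 6: dirty (written to)   */

/* Page frame number stored in the entry. */
static uint32_t pte_pfn(uint32_t pte)
{
    return pte >> PTE_PAGE_SHIFT;
}

/* Low 12 flag bits of the entry. */
static uint32_t pte_flags(uint32_t pte)
{
    return pte & PTE_FLAG_MASK;
}

/* Physical address of the page the entry points at. */
static uint64_t pte_phys_addr(uint32_t pte)
{
    return (uint64_t)pte_pfn(pte) << PTE_PAGE_SHIFT;
}

/* True if the given flag bit is set in the entry. */
static bool pte_has(uint32_t pte, uint32_t flag)
{
    return (pte & flag) != 0;
}
```

Under that assumption, 461b43d4 decodes to pfn 0x461b4 (physical address 0x461b4000, about 1.1 GiB) with flags 0x3d4, and its present bit (bit 0) is clear. If the machine had less RAM than that, the entry could not refer to a real page.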
>
>> Apr 13 11:31:35 localhost kernel: [<c014b0ea>] vm_normal_page+0x3e/0x53
>> Apr 13 11:31:35 localhost kernel: [<c014b6fa>] unmap_vmas+0x183/0x4af
>> Apr 13 11:31:35 localhost kernel: [<c014de31>] exit_mmap+0x6a/0xd7
>> Apr 13 11:31:35 localhost kernel: [<c011b217>] mmput+0x20/0x76
>> Apr 13 11:31:35 localhost kernel: [<c011fa05>] do_exit+0x193/0x71b
>> Apr 13 11:31:35 localhost kernel: [<c0120003>] sys_exit_group+0x0/0xd
>> Apr 13 11:31:35 localhost kernel: [<c0127a6d>] get_signal_to_deliver+0x395/0x3bc
>> Apr 13 11:31:35 localhost kernel: [<c01023a6>] do_notify_resume+0x71/0x5d7
>> Apr 13 11:31:35 localhost kernel: [<c0117778>] default_wake_function+0x0/0xc
>> Apr 13 11:31:35 localhost kernel: [<c0124657>] do_gettimeofday+0x31/0xce
>> Apr 13 11:31:35 localhost kernel: [<c0131ffc>] sys_futex+0xdc/0xf1
>> Apr 13 11:31:35 localhost kernel: [<c0102d0a>] work_notifysig+0x13/0x19
>> Apr 13 11:31:35 localhost kernel: ------------[ cut here ]------------
>> Apr 13 11:31:35 localhost kernel: kernel BUG at mm/rmap.c:522!
>> Apr 13 11:31:35 localhost kernel: invalid opcode: 0000 [#1]
>> Apr 13 11:31:35 localhost kernel: SMP
>> Apr 13 11:31:35 localhost kernel: Modules linked in: smbfs ext3 jbd
>> mbcache mga drm ppdev lp button ac battery ipv6 fuse dm_snapshot dm_mirror
>> dm_mod loop tsdev snd_via82xx gameport snd_ac97_codec snd_ac97_bus
>> snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_mpu401_uart
>> snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_event snd_seq
>> i2c_viapro i2c_core snd_timer snd_rawmidi snd_seq_device via_agp
>> parport_pc parport via_ircc psmouse serio_raw floppy snd soundcore pcspkr
>> rtc agpgart shpchp pci_hotplug irda crc_ccitt evdev reiserfs ide_cd cdrom
>> ide_disk generic via_rhine mii ehci_hcd uhci_hcd via82cxxx ide_core
>> usbcore thermal processor fan
>> Apr 13 11:31:35 localhost kernel: CPU: 0
>> Apr 13 11:31:35 localhost kernel: EIP: 0060:[<c01506c9>] Not tainted VLI
>> Apr 13 11:31:35 localhost kernel: EFLAGS: 00210286 (2.6.18-4-686 #1)
>> Apr 13 11:31:35 localhost kernel: EIP is at page_remove_rmap+0x14/0x2d
>> Apr 13 11:31:35 localhost kernel: eax: ffffffff ebx: c1000000 ecx: c1000000 edx: 00000000
>> Apr 13 11:31:35 localhost kernel: esi: b7869000 edi: 00000000 ebp: d08011a4 esp: c9be9e14
>> Apr 13 11:31:35 localhost kernel: ds: 007b es: 007b ss: 0068
>> Apr 13 11:31:35 localhost kernel: Process iceape-bin (pid: 20500, ti=c9be8000 task=c2128aa0 task.ti=c9be8000)
>> Apr 13 11:31:35 localhost kernel: Stack: c014b7d5 00000000 df10d278 c9be9e7c 00000000 00000001 b7884000 c4bc7b78
>> Apr 13 11:31:35 localhost kernel: e80fbe40 c16058a0 00000000 ffffffe2 c121002c c4bc7b78 0011ed07 b7884000
>> Apr 13 11:31:35 localhost kernel: 00000000 c9be9e7c df4c7710 e80fbe40 c9be9eb8 c014de31 ffffffff c9be9e78
>
>And the next page table entry, for virtual address b7869000 (in esi), has
>also got corrupted: I'm rather guessing, but I believe the c1000000 in
>ebx and ecx implies it's looking at pfn 0, so the page table entry in
>question would be that 00000001 seen on the stack (1 for present).
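A little arithmetic checks the pfn 0 reading. This sketch assumes, purely for illustration, that the array of `struct page` (`mem_map`) starts at 0xc1000000 on this kernel and that each `struct page` is 32 bytes. The report states neither value. Under those assumptions, a page pointer of c1000000 is pfn 0, and a present entry for pfn 0 in the non-PAE i386 layout is exactly 00000001.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL values for illustration only: the mem_map base
 * address and sizeof(struct page) are assumed, not taken from
 * the report. */
#define ASSUMED_MEM_MAP_BASE   0xc1000000u
#define ASSUMED_STRUCT_PAGE_SZ 32u

/* Convert a struct page address to its page frame number:
 * the index of that struct page within the mem_map array. */
static uint32_t page_addr_to_pfn(uint32_t page_addr)
{
    assert(page_addr >= ASSUMED_MEM_MAP_BASE);
    assert((page_addr - ASSUMED_MEM_MAP_BASE) % ASSUMED_STRUCT_PAGE_SZ == 0);
    return (page_addr - ASSUMED_MEM_MAP_BASE) / ASSUMED_STRUCT_PAGE_SZ;
}

/* Build a 32-bit non-PAE i386 PTE from a pfn and its low flag bits
 * (bit 0 = present). */
static uint32_t make_pte(uint32_t pfn, uint32_t flags)
{
    return (pfn << 12) | (flags & 0xfffu);
}
```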
>
>That by itself would suggest a single-bit error, which would point
>you to running memtest86 overnight to check your RAM. Worth a try.
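The single-bit argument amounts to a Hamming-distance check: count the bits in which the observed entry differs from the value it should have held. This sketch assumes, hypothetically, that the entry at b7869000 should have been empty (0). The report does not give the original value.

```c
#include <stdint.h>

/* Count set bits portably (Kernighan's method: each step clears
 * the lowest set bit). */
static unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Number of bit positions in which two 32-bit words differ. */
static unsigned hamming32(uint32_t expected, uint32_t observed)
{
    return popcount32(expected ^ observed);
}

/* A difference of exactly one bit is the signature of a single-bit flip. */
static int looks_like_single_bit_flip(uint32_t expected, uint32_t observed)
{
    return hamming32(expected, observed) == 1;
}
```

Under that assumption, 00000001 is one bit away from an empty entry. By contrast, 461b43d4 differs from an empty entry in 14 bits, so it does not look like a simple flip of an empty pte. Its intended value is unknown.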
>
>But the 461b43d4 before it suggests corruption from elsewhere in
>the kernel: sorry, I've no clue on that. Just wait and see if this
>happens again, and whether a pattern emerges - unless someone else
>can suggest something better.
>
>Hugh
>
>> Apr 13 11:31:35 localhost kernel: Call Trace:
>> Apr 13 11:31:35 localhost kernel: [<c014b7d5>] unmap_vmas+0x25e/0x4af
>> Apr 13 11:31:35 localhost kernel: [<c014de31>] exit_mmap+0x6a/0xd7
>> Apr 13 11:31:35 localhost kernel: [<c011b217>] mmput+0x20/0x76
>> Apr 13 11:31:35 localhost kernel: [<c011fa05>] do_exit+0x193/0x71b
>> Apr 13 11:31:35 localhost kernel: [<c0120003>] sys_exit_group+0x0/0xd
>> Apr 13 11:31:35 localhost kernel: [<c0127a6d>] get_signal_to_deliver+0x395/0x3bc
>> Apr 13 11:31:35 localhost kernel: [<c01023a6>] do_notify_resume+0x71/0x5d7
>> Apr 13 11:31:35 localhost kernel: [<c0117778>] default_wake_function+0x0/0xc
>> Apr 13 11:31:35 localhost kernel: [<c0124657>] do_gettimeofday+0x31/0xce
>> Apr 13 11:31:35 localhost kernel: [<c0131ffc>] sys_futex+0xdc/0xf1
>> Apr 13 11:31:35 localhost kernel: [<c0102d0a>] work_notifysig+0x13/0x19
>> Apr 13 11:31:35 localhost kernel: Code: ff ff 85 c0 89 c6 75 c9 b0 01 86
>> 43 28 83 c4 20 89 e8 5b 5e 5f 5d c3 89 c1 90 83 40 08 ff 0f 98 c0 84 c0 74
>> 1e 8b 41 08 40 79 08 <0f> 0b 0a 02 40 ab 29 c0 8b 51 10 89 c8 83 f2 01 83
>> e2 01 e9 97
>> Apr 13 11:31:35 localhost kernel: EIP: [<c01506c9>] page_remove_rmap+0x14/0x2d SS:ESP 0068:c9be9e14
>> Apr 13 11:31:35 localhost kernel: <1>Fixing recursive fault but reboot is needed!


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/