On Thu, 25 Mar 2010 11:29:25 +0800
AmÃrico Wang <xiyou.wangcong@xxxxxxxxx> wrote:
(Cc'ing linux-mm)Hmm..here is summary of corruption (from log), but no idea.
==
process's address pte pnf->pte->page
00000037b4008000 2bf1e025 -> PG_reserved
00000037b400a000 d900000000 -> bad swap
00000037b400c000 2bfe8025 -> PG_reserved
00000037b400d000 12bfe9025 -> belongs to some other files' page cache
00000037b400e000 ff00000000 -> bad swap
00000037b400f000 5400000000 -> bad swap
...
00000037b4019000 ff00000000 -> bad swap
==
All ptes are on the same pmd 1535b5067.
.
I doubt some kind of buffer overflow bug overwrites page table...
Because ptes for adddress of 00000037b4008000...00000037b400f000 are on head of
a page (used for pmd), some data on page [0x1535b4000..0x1535b5000) caused buffer
overflow and broke page table in [0x1535b5000...0x1535b6000)
Is this bug found from 2.6.28.10 ?
If I investigate this issue, I'll check the owner of page 0x1535b4000 by
crash dump.
Thanks,
-Kame
2010/3/25 Janos Haar <janos.haar@xxxxxxxxxxxx>:
> Dear developers,
>
> This is one of my productive servers, wich suddenly starts to freeze > (crash)
> some weeks before.
> I have done all what i can, (i think) please somebody give to me some
> suggestion:
>
> Mar 24 19:22:28 alfa kernel: BUG: Bad page map in process httpd > pte:2bf1e025
> pmd:1535b5067
> Mar 24 19:22:28 alfa kernel: page:ffffea0000f1b250 > flags:4000000000000404
> count:1 mapcount:-1 mapping:(null) index:0
> Mar 24 19:22:28 alfa kernel: addr:00000037b4008000 vm_flags:08000875
> anon_vma:(null) mapping:ffff88022b5d25a8 index:8
> Mar 24 19:22:28 alfa kernel: vma->vm_ops->fault: > filemap_fault+0x0/0x34d
> Mar 24 19:22:28 alfa kernel: vma->vm_file->f_op->mmap:
> xfs_file_mmap+0x0/0x33
> Mar 24 19:22:28 alfa kernel: Pid: 7512, comm: httpd Not tainted > 2.6.32.10 #2
> Mar 24 19:22:28 alfa kernel: Call Trace:
> Mar 24 19:22:28 alfa kernel: [<ffffffff810c2ea3>] > print_bad_pte+0x210/0x229
> Mar 24 19:22:28 alfa kernel: [<ffffffff810c3c98>] > unmap_vmas+0x44b/0x787
> Mar 24 19:22:28 alfa kernel: [<ffffffff810c81d5>] exit_mmap+0xb0/0x133
> Mar 24 19:22:28 alfa kernel: [<ffffffff81041f83>] mmput+0x48/0xb9
> Mar 24 19:22:28 alfa kernel: [<ffffffff810463b0>] exit_mm+0x105/0x110
> Mar 24 19:22:28 alfa kernel: [<ffffffff81371287>] ?
> tty_audit_exit+0x28/0x85
> Mar 24 19:22:28 alfa kernel: [<ffffffff810477a0>] do_exit+0x1e9/0x6d2
> Mar 24 19:22:28 alfa kernel: [<ffffffff81053c37>] ?
> __dequeue_signal+0xf1/0x127
> Mar 24 19:22:28 alfa kernel: [<ffffffff81047d00>] > do_group_exit+0x77/0xa1
> Mar 24 19:22:28 alfa kernel: [<ffffffff810560f7>]
> get_signal_to_deliver+0x32c/0x37f
> Mar 24 19:22:28 alfa kernel: [<ffffffff8100a484>]
> do_notify_resume+0x90/0x740
> Mar 24 19:22:28 alfa kernel: [<ffffffff8102724b>] ?
> __bad_area_nosemaphore+0x178/0x1a2
> Mar 24 19:22:28 alfa kernel: [<ffffffff810272b9>] ? > __bad_area+0x44/0x4d
> Mar 24 19:22:28 alfa kernel: [<ffffffff8100bba2>] > retint_signal+0x46/0x84
> Mar 24 19:22:28 alfa kernel: Disabling lock debugging due to kernel > taint
> Mar 24 19:22:28 alfa kernel: swap_free: Bad swap file entry 6c800000
> Mar 24 19:22:28 alfa kernel: BUG: Bad page map in process httpd
> pte:d900000000 pmd:1535b5067
> Mar 24 19:22:28 alfa kernel: addr:00000037b400a000 vm_flags:08000875
> anon_vma:(null) mapping:ffff88022b5d25a8 index:a
> Mar 24 19:22:28 alfa kernel: vma->vm_ops->fault: > filemap_fault+0x0/0x34d
> Mar 24 19:22:28 alfa kernel: vma->vm_file->f_op->mmap:
> xfs_file_mmap+0x0/0x33
> Mar 24 19:22:28 alfa kernel: Pid: 7512, comm: httpd Tainted: G B
> 2.6.32.10 #2
> Mar 24 19:22:28 alfa kernel: Call Trace:
> Mar 24 19:22:28 alfa kernel: [<ffffffff81044551>] ? add_taint+0x32/0x3e
> Mar 24 19:22:28 alfa kernel: [<ffffffff810c2ea3>] > print_bad_pte+0x210/0x229
> Mar 24 19:22:28 alfa kernel: [<ffffffff810c3d47>] > unmap_vmas+0x4fa/0x787
> Mar 24 19:22:28 alfa kernel: [<ffffffff810c81d5>] exit_mmap+0xb0/0x133
> Mar 24 19:22:28 alfa kernel: [<ffffffff81041f83>] mmput+0x48/0xb9
> Mar 24 19:22:28 alfa kernel: [<ffffffff810463b0>] exit_mm+0x105/0x110
> Mar 24 19:22:28 alfa kernel: [<ffffffff81371287>] ?
> tty_audit_exit+0x28/0x85
> Mar 24 19:22:28 alfa kernel: [<ffffffff810477a0>] do_exit+0x1e9/0x6d2
> Mar 24 19:22:28 alfa kernel: [<ffffffff81053c37>] ?
> __dequeue_signal+0xf1/0x127
> Mar 24 19:22:28 alfa kernel: [<ffffffff81047d00>] > do_group_exit+0x77/0xa1
> Mar 24 19:22:28 alfa kernel: [<ffffffff810560f7>]
> get_signal_to_deliver+0x32c/0x37f
> Mar 24 19:22:28 alfa kernel: [<ffffffff8100a484>]
> do_notify_resume+0x90/0x740
> Mar 24 19:22:28 alfa kernel: [<ffffffff8102724b>] ?
> __bad_area_nosemaphore+0x178/0x1a2
> Mar 24 19:22:28 alfa kernel: [<ffffffff810272b9>] ? > __bad_area+0x44/0x4d
> Mar 24 19:22:28 alfa kernel: [<ffffffff8100bba2>] > retint_signal+0x46/0x84
> Mar 24 19:22:28 alfa kernel: BUG: Bad page map in process httpd > pte:2bfe8025
> pmd:1535b5067
> Mar 24 19:22:28 alfa kernel: page:ffffea0000f1f7c0 > flags:4000000000000404
> count:1 mapcount:-1 mapping:(null) index:0
> Mar 24 19:22:28 alfa kernel: addr:00000037b400c000 vm_flags:08000875
> anon_vma:(null) mapping:ffff88022b5d25a8 index:c
> Mar 24 19:22:28 alfa kernel: vma->vm_ops->fault: > filemap_fault+0x0/0x34d
> Mar 24 19:22:28 alfa kernel: vma->vm_file->f_op->mmap:
> xfs_file_mmap+0x0/0x33
> Mar 24 19:22:28 alfa kernel: Pid: 7512, comm: httpd Tainted: G B
> 2.6.32.10 #2
> Mar 24 19:22:28 alfa kernel: Call Trace:
> Mar 24 19:22:28 alfa kernel: [<ffffffff81044551>] ? add_taint+0x32/0x3e
> Mar 24 19:22:28 alfa kernel: [<ffffffff810c2ea3>] > print_bad_pte+0x210/0x229
> Mar 24 19:22:28 alfa kernel: [<ffffffff810c3c98>] > unmap_vmas+0x44b/0x787
> Mar 24 19:22:28 alfa kernel: [<ffffffff810c81d5>] exit_mmap+0xb0/0x133
> Mar 24 19:22:28 alfa kernel: [<ffffffff81041f83>] mmput+0x48/0xb9
> Mar 24 19:22:28 alfa kernel: [<ffffffff810463b0>] exit_mm+0x105/0x110
> .....
>
> The entire log is here:
> http://download.netcenter.hu/bughunt/20100324/messages
>
> The actual kernel is 2.6.32.10, but the crash-series started @ > 2.6.28.10.
>
> I have forwarded the tasks to another server, removed this from the > room,
> and the hw survived memtest86 in >7 days continously + i have tested > the
> HDDs one by one with badblocks -vvw, all is good.
> For me looks like this is not a hw problem.
>
> Somebody have any idea?
>
> Thanks a lot,
> Janos Haar
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" > in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/