Re: Fw: 2.6.17 oops, possibly ntfs/mmap related

From: Anton Altaparmakov
Date: Thu Sep 21 2006 - 05:56:04 EST


Hi,

On Tue, 2006-09-12 at 20:56 -0700, Andrew Morton wrote:

Andrew, thanks for forwarding me the message...

> Begin forwarded message:
>
> Date: Wed, 13 Sep 2006 10:05:42 +0930 (CST)
> From: Jonathan Woithe <jwoithe@xxxxxxxxxxxxxxxxxxxxxxx>
> To: linux-kernel@xxxxxxxxxxxxxxx
> Cc: jwoithe@xxxxxxxxxxxxxxxxxxxxxxx (Jonathan Woithe)
> Subject: 2.6.17 oops, possibly ntfs/mmap related
>
>
> We have a machine which is currently making heavy use of a usb hard disc
> formatted with ntfs. There have been two occasions where the kernel has
> oopsed while this disc was being accessed heavily. Before adding this HDD
> the machine in question was rock solid which leads me to think that it
> might be related to ntfs. USB drives formatted with other filesystems do
> not appear to suffer from this problem.
>
> Unfortunately bogofilter considers the oops reports as spam so I cannot post
> them to the list. I have instead put the full text of my original post
> regarding this topic on the web at
>
> http://www.atrad.com.au/~jwoithe/kernel/oopses-20060913.txt
>
> I'm happy to try things to narrow down the cause if it will help.
>
> Please CC me on reply.

These were the oopses from above text url:

> The first oops caused the machine to totally lock up:
>
> BUG: unable to handle kernel paging request at virtual address e4004de0
> printing eip:
> c012de0c
> *pde = 00000000
> Oops: 0000 [#1]
> Modules linked in: ntfs 8139too via_agp agpgart usb_storage ehci_hcd uhci_hcd usbcore
> CPU: 0
> EIP: 0060:[<c012de0c>] Not tainted VLI
> EFLAGS: 00010082 (2.6.17 #2)
> EIP is at find_get_page+0x11/0x22
> eax: e4004de0 ebx: c02f01a8 ecx: e4004de0 edx: e4004de0
> esi: 00000000 edi: 00000066 ebp: cfc20574 esp: c770bee8
> ds: 007b es: 007b ss: 0068
> Process sh (pid: 10467, threadinfo=c770a000 task=cfa495c0)
> Stack: c012ea09 00000002 00000000 cfc204d8 cff882a4 cff88260 c770bf30 c5962544
> c02f01a8 00000000 ceec10a0 080aef10 c01385d1 00000000 cfc20574 080aef10
> c5962544 ceec10a0 00000002 c7aeb080 080aef10 ceec10a0 080aef10 c013886a
> Call Trace:
> <c012ea09> filemap_nopage+0x98/0x2b2 <c01385d1> do_no_page+0x6d/0x1e1
> <c013886a> __handle_mm_fault+0xc4/0x162 <c0112190> do_page_fault+0x23e/0x56b
> <c01c43c1> copy_to_user+0x41/0x49 <c0111f52> do_page_fault+0x0/0x56b
> <c010342f> error_code+0x4f/0x54
> Code: a0 fe ff ff 89 ea b9 e2 d7 12 c0 6a 02 e8 5f ec 15 00 83 c4 44 5b 5e 5f 5d c3 fa 83 c0 04 e8 2c 3f 09 00 85 c0 89 c1 74 0f 89 c2 <8b> 00 f6 c4 40 74 03 8b 51 0c ff 42 04 fb 89 c8 c3 fa 83 c0 04
>
>
> In the case of the second oops the machine was still partially usable and a
> clean shutdown was possible. However, services such as sshd were no longer
> responding.
>
> BUG: unable to handle kernel paging request at virtual address 0010c744
> printing eip:
> c013be50
> *pde = 00000000
> Oops: 0002 [#1]
> Modules linked in: ntfs 8139too via_agp agpgart usb_storage ehci_hcd uhci_hcd usbcore
> CPU: 0
> EIP: 0060:[<c013be50>] Tainted: G M VLI
> EFLAGS: 00010282 (2.6.17 #2)
> EIP is at anon_vma_unlink+0x16/0x3c
> eax: 0010c740 ebx: cf1070cc ecx: cf107104 edx: cf8bc740
> esi: cf8bc740 edi: b7e82000 ebp: 00000000 esp: cdad7f58
> ds: 007b es: 007b ss: 0068
> Process sh (pid: 20272, threadinfo=cdad6000 task=c0d8d580)
> Stack: cf1070cc cf61f3e4 c0136b5f cdad7f80 c4084b74 cf8b5860 00000001 00000000
> c013ab92 00000000 c0371b7c 000000b9 cf8b5860 c0d8d580 c01145dd cdad6000
> c0118187 cdad6000 00000000 00000000 cdad6000 c0118380 00000000 b7f9968c
> Call Trace:
> <c0136b5f> free_pgtables+0x41/0x82 <c013ab92> exit_mmap+0x6a/0xb8
> <c01145dd> mmput+0x1b/0x5e <c0118187> do_exit+0x14e/0x2d1
> <c0118380> sys_exit_group+0x0/0xd <c010299b> syscall_call+0x7/0xb
> Code: c9 74 10 8b 11 8d 40 38 89 42 04 89 53 38 89 48 04 89 01 5b c3 56 53 8b 70 40 89 c3 85 f6 74 2e 8d 48 38 8b 40 38 8b 51 04 89 02 <89> 50 04 c7 43 38 00 01 10 00 39 36 c7 41 04 00 02 20 00 75 0e
> EIP: [<c013be50>] anon_vma_unlink+0x16/0x3c SS:ESP 0068:cdad7f58
> <1>Fixing recursive fault but reboot is needed!
>
> I'm not entirely sure why the kernel considered itself tainted in the second
> oops and not in the first - the setup hadn't changed and precisely the same
> kernel modules were loaded. This machine does not have any external (ie:
> out-of-tree) modules installed.

Weird. The traces do not include NTFS at all in them so I have no idea
if NTFS has anything to do with this or not...

Anyone have any idea what these oopses mean?!?

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://www.linux-ntfs.org/ & http://www-stu.christs.cam.ac.uk/~aia21/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/