Re: Oops report 2.2.12 w/ikd patch

Nicholas R LeRoy (nick.leroy@norland.com)
Fri, 24 Sep 1999 08:36:40 -0500


Alan & folks..

On Sep 20, 7:21pm, Alan Cox wrote:
> Subject: Re: Oops report 2.2.12 w/ikd patch
> > > Have you tried running other large memory hogging applications instead of
> > > X ?
> >
> > Yeah... I wrote a program which allocates large chunks of memory and pages
> > through the buffers to keep them circulating out to swap. Ran straight
> > for 48+ hours without a hitch, but the system did page A LOT.
>
> Ok
>
> > > My first suspicion is the Xserver. It's privileged enough to crash the box.
> > > Can you duplicate the problem if you run X with acceleration disabled ?
> >
> > I've tried several different X servers.. I've run xsvga, xs3 and xs3v.
> > Do you think that running w/o acceleration would really make a difference?
>
> It's worth trying
>
> > the fact that I've tried different servers AND different hardware to me
> > *indicates* that the problem is elsewhere, but I suppose that they all
> > share a large chunk of code, too.
>
> They do share a lot of code. It's also 1Mb of material I'd like to eliminate
> from enquiries if possible 8)
>
> > on several occasions. If you need me to do something to help diagnose,
> > I'd be more than willing. :-)
>
> The only other one I can think of is to hack your x setup so that you
> use localhost:0 for all the X clients - ie so X isn't using unix domain
> sockets

Tried this. Still crashed. See oops #1 below.

A bit more info for ya. I got two oopses last night. I haven't been
able to build 2.2.13pre11 yet, but will do that this weekend.
This is all still with 2.2.12 w/IKD patch.

Below are my notes files and my traces. #2 is a completely new oops
for me. It oopsed while fsck'ing at bootup, something I've never
seen before.

-Nick

-------------------------------------------------------------------------
#1:

# cat oops.12.notes
1 - DISPLAY=localhost:0
It died anyway (see oops).

2 - After the initial oops, I used gpm to cut the oops from one virtual
console and paste it into a file (cat > oops.12 {paste}^d). I then
tried to run ksymoops. I typed
'ksymoops < oops.12 > oops.12{tab}' (trying to let bash fill in the
name oops.12.trace -- which didn't exist). The console quit
responding for some time. I could switch consoles via alt-F<n>, but
that console was not responding. I tried to log in on another console:
typed 'root{enter}', but did not get prompted for a password. I switched
back to the ksymoops console and tried '^Q', etc. to free it up,
when the second oops was reported. I was then able to run ksymoops
as normal, and the system seems to be behaving normally (as it does
after X/IKD oopses). I have to reboot to run X again. The X server
process is dead, but it's still holding the port open, etc.

# cat oops.12.trace
Options used: -V (default)
-o /lib/modules/2.2.12/ (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-m /usr/src/linux/System.map (default)
-c 1 (default)

You did not tell me where to find symbol information. I will assume
that the log matches the kernel and modules that are running right now
and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Oops: 0002
CPU: 0
EIP: 0010:[<c013e598>]
EFLAGS: 00010206
eax: 00000001 ebx: c17aa000 ecx: 00000005 edx: 00000036
esi: c16c803c edi: c16c8000 ebp: c17abefc esp: c17abefc
ds: 0018 es: 0018 ss: 0018
Process X (pid: 1130, process nr: 8, stackpage=c17ab000)
Stack: c17abf08 c0128d4e c16c8040 c17abf24 c0131c64 c15591a0 00000000 00001000
c16c8000 00000287 c17abf58 c0131fe6 c16c8000 0000009f c317d610 c317d640
00000104 0000000d c17aa000 7fffffff 00000001 00000000 c16c8000 c17abfbc
Call Trace: [<c0128d4e>] [<c0131c64>] [<c0131fe6>] [<c01323ba>] [<c01089b1>]
Code: c6 05 00 00 00 00 00 8b 5d fc 89 ec 5d c3 89 f6 55 89 e5 53

>>EIP: c013e598 <mcount+a0/b0>
Trace: c0128d4e <fput+e/54>
Trace: c0131c64 <free_wait+60/88>
Trace: c0131fe6 <do_select+1f6/210>
Trace: c01323ba <sys_select+3ba/4e8>
Trace: c01089b1 <system_call+41/50>
Code: c013e598 <mcount+a0/b0> 00000000 <_EIP>: <===
Code: c013e598 <mcount+a0/b0> 0: c6 05 00 00 00 00
movb $0x0,0x0 <===
Code: c013e59e <mcount+a6/b0> 6: 00
Code: c013e59f <mcount+a7/b0> 7: 8b 5d fc
movl 0xfffffffc(%ebp),%ebx
Code: c013e5a2 <mcount+aa/b0> a: 89 ec
movl %ebp,%esp
Code: c013e5a4 <mcount+ac/b0> c: 5d
popl %ebp
Code: c013e5a5 <mcount+ad/b0> d: c3
ret
Code: c013e5a6 <mcount+ae/b0> e: 89 f6
movl %esi,%esi
Code: c013e5a8 <mcount_internal+0/84> 10: 55
pushl %ebp
Code: c013e5a9 <mcount_internal+1/84> 11: 89 e5
movl %esp,%ebp
Code: c013e5ab <mcount_internal+3/84> 13: 53
pushl %ebx

Deadlock threshold exceeded, forcing Oops.
Unable to handle kernel NULL pointer dereference at virtual address 00000000
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010:[<c013e598>]
EFLAGS: 00010246
eax: 00000001 ebx: c5fd6000 ecx: 00000005 edx: 00002040
esi: c5c5f63c edi: c5c5f600 ebp: c5fd7f84 esp: c5fd7f84
ds: 0018 es: 0018 ss: 0018
Process kupdate (pid: 3, process nr: 3, stackpage=c5fd7000)
Stack: c5fd7f8c c0142ab1 c5fd7fac c0134ff1 c1a78880 c5fd6000 c01b0b6b c5fd61c2
c5c5f600 00006000 c5fd7fd8 c012aae1 00000000 00000000 c5fd6000 c01b0b6b
c5fd61c2 000ef280 c5fd6000 c0112328 c5fd7fec c5fd7fec c012aff0 00000f00
Call Trace: [<c0142ab1>] [<c0134ff1>] [<c01b0b6b>] [<c012aae1>] [<c01b0b6b>]
[<c0112328>] [<c012aff0>]
[<c0106000>] [<c0107415>]
Code: c6 05 00 00 00 00 00 8b 5d fc 89 ec 5d c3 89 f6 55 89 e5 53

>>EIP: c013e598 <mcount+a0/b0>
Trace: c0142ab1 <ext2_write_inode+d/1c>
Trace: c0134ff1 <sync_inodes+b1/f8>
Trace: c01b0b6b <tvecs+2cab/32e0>
Trace: c012aae1 <sync_old_buffers+21/1bc>
Trace: c01b0b6b <tvecs+2cab/32e0>
Trace: c0112328 <process_timeout+0/1c>
Trace: c012aff0 <kupdate+80/88>
Trace: c0106000 <empty_zero_page+1000/1004>
Code: c013e598 <mcount+a0/b0> 00000000 <_EIP>: <===
Code: c013e598 <mcount+a0/b0> 0: c6 05 00 00 00 00
movb $0x0,0x0 <===
Code: c013e59e <mcount+a6/b0> 6: 00
Code: c013e59f <mcount+a7/b0> 7: 8b 5d fc
movl 0xfffffffc(%ebp),%ebx
Code: c013e5a2 <mcount+aa/b0> a: 89 ec
movl %ebp,%esp
Code: c013e5a4 <mcount+ac/b0> c: 5d
popl %ebp
Code: c013e5a5 <mcount+ad/b0> d: c3
ret
Code: c013e5a6 <mcount+ae/b0> e: 89 f6
movl %esi,%esi
Code: c013e5a8 <mcount_internal+0/84> 10: 55
pushl %ebp
Code: c013e5a9 <mcount_internal+1/84> 11: 89 e5
movl %esp,%ebp
Code: c013e5ab <mcount_internal+3/84> 13: 53
pushl %ebx

1 warning issued. Results may not be reliable.
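
For anyone reading the trace lines above: my understanding is that an annotation like <fput+e/54> means "offset 0xe into fput, which is 0x54 bytes long", so subtracting the offset from the address gives the start of the function. Here is a minimal sketch that pulls those fields out of a ksymoops line. The regular expression and the name parse_trace_line are my own for illustration and are not part of ksymoops.

```python
import re

# Assumed annotation shape: "<address> <symbol+offset/size>", all numbers in hex,
# e.g. "c0128d4e <fput+e/54>". This pattern is a guess at the format, not a spec.
SYMBOL_RE = re.compile(
    r"([0-9a-f]{8}) <([A-Za-z_][A-Za-z0-9_]*)\+([0-9a-f]+)/([0-9a-f]+)>"
)


def parse_trace_line(line):
    """Return a dict describing the first annotated address, or None."""
    match = SYMBOL_RE.search(line)
    if match is None:
        return None
    address = int(match.group(1), 16)
    offset = int(match.group(3), 16)
    size = int(match.group(4), 16)
    return {
        "address": address,
        "symbol": match.group(2),
        "offset": offset,
        "size": size,
        # Start of the function = reported address minus offset into it.
        "start": address - offset,
    }


if __name__ == "__main__":
    for text in (">>EIP: c013e598 <mcount+a0/b0>", "Trace: c0128d4e <fput+e/54>"):
        info = parse_trace_line(text)
        print("%s: start=%08x offset=0x%x size=0x%x"
              % (info["symbol"], info["start"], info["offset"], info["size"]))
```

The computed start address can be compared against the matching entry in System.map to check that the map really belongs to the kernel that oopsed.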

------------------------------------------------------------------------------
#2:

# cat oops.13.notes
This oops occurred during a boot after #12. #12 left the system in
a state such that I was not able to sync/unmount the file systems
successfully. The system was performing its fsck's.
The last message before the oops was that it was checking /dev/hda9.

After the oops, it appeared that the system finished all fsck's, but
then never continued after that. The system was quite idle when I
found it -- I had been busy doing something else for a while.

The alt-sysrq keys were functioning, but other than that the system
was completely dead. I used alt-sysrq-b to reboot.

Upon booting, it forced a check of /dev/hda11, which apparently hadn't
completed checking for some reason (certainly related to the oops).

Note also that I had to copy this oops by hand to paper and then type
it back in, so there might be a typo or two, but I was pretty careful
and double-checked it all, too.

# cat oops.13.trace
Options used: -V (default)
-o /lib/modules/2.2.12/ (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-m /usr/src/linux/System.map (default)
-c 1 (default)

You did not tell me where to find symbol information. I will assume
that the log matches the kernel and modules that are running right now
and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Unable to handle kernel NULL pointer dereference at virtual address 00000014
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c012a782>]
EFLAGS: 00010203
eax: 00000000 ebx: c0264c10 ecx: 0000251a edx: c0264c10
esi: 00000000 edi: c25006e0 ebp: c5fd3fa0 esp: c5fd3f9c
ds: 0018 es: 0018 ss: 0018
Process kswapd (pid: 5, process nr: 5, stackpage=c5fd3000)
Stack: 00000030 c5fd3fb4 c011f2ae c0264c10 00000002 00000006 c5fd3fd0 c01241c6
00000006 00000030 00000000 c5fd2000 c5fd21c1 c5fd3fec c012430b 00000030
00000f00 c5febfc0 c0106000 c5fd2000 c5febfc8 c0107415 00000000 c0000f00
Call Trace: [<c011f2ae>] [<c01241c6>] [<c012430b>] [<c0106000>] [<c0107415>]
Code: 8b 76 14 83 78 20 75 06 f6 40 18 46 74 0b 6a 00 e8 a0 02

>>EIP: c012a782 <try_to_free_buffers+1a/8c>
Trace: c011f2ae <shrink_mmap+ea/140>
Trace: c01241c6 <do_try_to_free_pages+32/88>
Trace: c012430b <kswapd+6b/e4>
Trace: c0106000 <empty_zero_page+1000/1004>
Trace: c0107415 <kernel_thread+2d/40>
Code: c012a782 <try_to_free_buffers+1a/8c> 00000000 <_EIP>: <===
Code: c012a782 <try_to_free_buffers+1a/8c> 0: 8b 76 14
movl 0x14(%esi),%esi <===
Code: c012a785 <try_to_free_buffers+1d/8c> 3: 83 78 20 75
cmpl $0x75,0x20(%eax)
Code: c012a789 <try_to_free_buffers+21/8c> 7: 06
pushl %es
Code: c012a78a <try_to_free_buffers+22/8c> 8: f6 40 18 46
testb $0x46,0x18(%eax)
Code: c012a78e <try_to_free_buffers+26/8c> c: 74 0b
je c012a79b <try_to_free_buffers+33/8c>
Code: c012a790 <try_to_free_buffers+28/8c> e: 6a 00
pushl $0x0
Code: c012a792 <try_to_free_buffers+2a/8c> 10: e8 a0 02 00 00
call c012aa37 <buffer_init+12f/130>

1 warning issued. Results may not be reliable.
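
A note on the "Oops: 0002" and "Oops: 0000" values in the traces: assuming these are the usual i386 page-fault error codes, bit 0 distinguishes a protection fault from a missing page, bit 1 a write from a read, and bit 2 user mode from kernel mode. The sketch below decodes them that way. The function name decode_oops_code is hypothetical, and the whole thing only applies if the oops really came from the page-fault handler.

```python
def decode_oops_code(code):
    """Decode an i386 page-fault error code into its three low bits.

    Assumed layout: bit 0 = protection fault (else page not present),
    bit 1 = write (else read), bit 2 = user mode (else kernel mode).
    """
    return {
        "cause": "protection fault" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }


if __name__ == "__main__":
    # The two codes that appear in the traces, parsed as hex.
    for text in ("0002", "0000"):
        fields = decode_oops_code(int(text, 16))
        print("Oops: %s -> %s, %s, %s mode"
              % (text, fields["cause"], fields["access"], fields["mode"]))
```

Read this way, 0002 is a kernel-mode write to an unmapped page, which matches the movb $0x0,0x0 at the EIP in #1, and 0000 is a kernel-mode read of an unmapped page, which matches the movl 0x14(%esi),%esi with esi = 0 in #2.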

--
+-------------------------------+--------------------------------------------+
| /`--_   Nicholas R LeRoy      | In a world without fences, Who needs Gates?|
|{     }/ Norland Corporation   |        ---- Experience Linux! ----         |
| \ *  / W6340 Hackbarth Rd     | http://www.linux.org | http://www.ssc.com  |
| |___| Fort Atkinson, WI 53538 +--------------------------------------------+
|      nick.leroy@norland.com   | #include <disclaimer.h>                    |
|http://www3.norland.com/~nleroy| These are my own ideas, not my employer's. |
+----------------------------------------------------------------------------+

---End of forwarded mail from "Nicholas R LeRoy" <nick.leroy@norland.com>
