Re: VM: killing... (and Oops!)

From: Greg Baker (greg.baker@amd.com)
Date: Fri Apr 21 2000 - 16:17:32 EST


I'd like to thank everyone for their comments and give a status update:

According to memtest (http://reality.sgi.com/cbrady_denver/memtest86/),
it seems I do have some bad memory. I had originally suspected this and
ran the VA Linux burn-in software
(ftp://ftp.varesearch.com/pub/software/Cerberus) to test, but it
reported no errors. I guess I shouldn't trust www.crucial.com for
memory. Should have stuck with Mushkin.

Unfortunately I'll be in Washington DC for a week starting today, so I
won't be able to get hands-on with the machine for further debugging
until I'm back.

Thanks for pointing out memtest. Hopefully I can isolate at least one
good chip so I can do my benchmarks/testing. If anybody else has run
large Mentor Calibre jobs on Linux (process size over roughly 150MB,
run time of 4+ hours), please let me know about successes, failures,
and caveats.

Thanks,

--Greg

FYI, under the heavy load I introduced today I also got 4 Oopses!

OOPS 1:
Apr 21 12:26:48 case kernel: Unable to handle kernel paging request at
virtual address 00ddff28
Apr 21 12:26:48 case kernel: current->tss.cr3 = 2bddb000, %cr3 =
2bddb000
Apr 21 12:26:48 case kernel: *pde = 00000000
Apr 21 12:26:48 case kernel: Oops: 0000
Apr 21 12:26:48 case kernel: CPU: 0
Apr 21 12:26:48 case kernel: EIP: 0010:[del_timer+10/59]
Apr 21 12:26:48 case kernel: EFLAGS: 00010046
Apr 21 12:26:48 case kernel: eax: 2bddb000 ebx: 00000246
ecx: 00ddff24 edx: ecd249c0
Apr 21 12:26:48 case kernel: esi: 000250bf edi: 00000007
ebp: ebddff0c esp: ebddff10
Apr 21 12:26:48 case kernel: ds: 0018 es: 0018 ss: 0018
Apr 21 12:26:48 case kernel: Process sshd (pid: 977, process nr: 165,
stackpage=ebddf000)
Apr 21 12:26:48 case kernel: Stack: c0111294 00ddff24 ebddff24
00000000 00000040 00000000 00000000 000250bf
Apr 21 12:26:48 case kernel: ebdde000 c0110f04 00000000
c012e480 00000004 00000026 00000007 ed4fefa8
Apr 21 12:26:48 case kernel: 00000104 00000007 ebdde000
00000001 00000000 d5b73000 c012e927 00000007
Apr 21 12:26:48 case kernel: Call Trace: [schedule_timeout+108/134]
[process_timeout+0/15] [do_select+154/529] [sys_select+816/1134]
[system_call+52/56]
Apr 21 12:26:48 case kernel: Code: 8b 51 04 85 d2 74 12 8b 01 89 02 85
c0 74 03 89 50 04 b8 01
Apr 21 12:49:10 case kernel: md: md1: sync done.
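
A quick arithmetic check on the OOPS 1 numbers, as a rough sketch rather
than a diagnosis. It assumes the leading Code: bytes 8b 51 04 decode as
the 32-bit x86 load `mov edx, [ecx+4]`, and it compares ecx with the
similar-looking third word on the Stack: line. The variable names are
made up for illustration; the values are copied from the dump above.

```python
# Values copied from the OOPS 1 dump.
fault_addr = 0x00ddff28   # "virtual address 00ddff28"
ecx        = 0x00ddff24   # ecx from the register dump
stack_word = 0xebddff24   # third word on the "Stack:" line
esp        = 0xebddff10   # esp from the register dump

# Assumption: 8b 51 04 is `mov edx, [ecx+4]`, so the load address is ecx + 4.
disp = 0x04
print(hex(ecx + disp), ecx + disp == fault_addr)

# Which bits differ between ecx and the stack word that looks like it?
diff = ecx ^ stack_word
print(hex(diff), bin(diff).count("1"), "bits differ")

# Does the stack word point into the same 4 KiB page as the stack pointer?
same_page = (stack_word & ~0xfff) == (esp & ~0xfff)
print("stack_word in same page as esp:", same_page)
```

The faulting address equals ecx + 4, and ecx differs from the stack word
only in its top byte. That might be consistent with a damaged pointer,
but the dump alone does not show where the value was altered.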

OOPS 2:

Apr 21 14:57:32 case kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000350
Apr 21 14:57:32 case kernel: current->tss.cr3 = 00101000, %cr3 =
00101000
Apr 21 14:57:32 case kernel: *pde = 00000000
Apr 21 14:57:32 case kernel: Oops: 0002
Apr 21 14:57:32 case kernel: CPU: 0
Apr 21 14:57:32 case kernel: EIP: 0010:[kmem_cache_free+205/368]
Apr 21 14:57:32 case kernel: EFLAGS: 00010046
Apr 21 14:57:32 case kernel: eax: 00000340 ebx: e4983fd0
ecx: dc008fe0 edx: 00000340
Apr 21 14:57:32 case kernel: esi: efeff740 edi: 00000286
ebp: 00000031 esp: efed9f74
Apr 21 14:57:32 case kernel: ds: 0018 es: 0018 ss: 0018
Apr 21 14:57:32 case kernel: Process kswapd (pid: 5, process nr: 5,
stackpage=efed9000)
Apr 21 14:57:32 case kernel: Stack: dc008f90 c06d1f98 dc008fdc
efed9fac c0129069 efeff740 dc008f90 dc008f90
Apr 21 14:57:32 case kernel: dc008f90 c0129dab dc008f90
dc008f90 c06d1f98 00000bfb 00000030 00000008
Apr 21 14:57:32 case kernel: c011e2b2 c06d1f98 00000010
00000006 c012365a 00000006 00000030 efed8000
Apr 21 14:57:32 case kernel: Call
Trace: [put_unused_buffer_head+33/76] [try_to_free_buffers+71/128]
[shrink_mmap+218/300] [do_try_to_free_pages+42/124] [tvecs+7278/13856]
[kswapd+107/164] [get_options+0/112]
Apr 21 14:57:32 case kernel: [kernel_thread+35/48]
Apr 21 14:57:32 case kernel: Code: 89 48 10 89 0e eb 9c 8d 74 26 00 57
9d 56 53 68 67 5f 1e c0
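
The "Oops: 0000" and "Oops: 0002" values are the x86 page-fault error
codes, which can be decoded bit by bit. The helper below is a minimal
sketch: decode_fault_code is a hypothetical name, not a kernel function.
The last check assumes the leading Code: bytes 89 48 10 in OOPS 2 decode
as the store `mov [eax+0x10], ecx`.

```python
def decode_fault_code(code: int) -> dict:
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "protection_violation": bool(code & 0x1),  # False: page not present
        "write": bool(code & 0x2),                 # False: read access
        "user_mode": bool(code & 0x4),             # False: kernel mode
    }

# Error codes as printed in the two dumps (hexadecimal).
oops1 = decode_fault_code(int("0000", 16))
oops2 = decode_fault_code(int("0002", 16))
print("OOPS 1:", oops1)
print("OOPS 2:", oops2)

# OOPS 2: eax = 00000340; assuming a store to [eax+0x10], the target is:
eax = 0x00000340
print(hex(eax + 0x10))  # compare with the reported address 00000350
```

Read this way, OOPS 1 is a kernel-mode read of a page that is not
present, and OOPS 2 is a kernel-mode write to one, at a small offset from
a near-NULL value in eax.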

OOPS 3 & 4 crashed the system, and I didn't copy the screen dump
down.

On Fri, 21 Apr 2000, bert hubert wrote:

|I took the liberty to forward your message to linux-raid:
|
|----- Forwarded message from "Georg P. Israel" <georg@web0.redwave.net> -----
|
|Date: Fri, 21 Apr 2000 22:00:35 +0200
|From: "Georg P. Israel" <georg@web0.redwave.net>
|To: linux-raid@vger.rutgers.edu, vince@digex.net
|Subject: Re: [gbaker@hendrix.amd.com: Re: VM: killing...]
|
|Vince,
|
|I'm pretty sure you have some bad memory modules in your machine.
|Make a mem test e.g. memtest86
|to be sure that your memory is ok.
|
|
|Georg
|<g.israel@ieee.org>



