SMP Instability

Martin Imrisek (modulus@orpheus)
Wed, 5 Nov 1997 22:23:43 -0500 (EST)


I just sent an incomplete message to the list. I meant to cancel it, but
I sent it instead. Apologies.

I might as well continue it here. I've recently begun to use the SMP
kernels 2.0.31 and 2.1.62 on my Tyan Tomcat IV (dual P75). I've been
experiencing rampant instability: spontaneous reboots, deadlocks, and
halts with garbage on the screen. I've eliminated the reboots, which
appeared to be caused by hardware misconfiguration.

Currently I am running 2.1.62 without any patches and I am experiencing
the following:

Nov 5 22:04:15 orpheus kernel: d_alloc: 3650 unused, pruning dcache
Nov 5 22:04:15 orpheus kernel: d_alloc: 3650 unused, pruning dcache
Nov 5 22:04:15 orpheus kernel: d_alloc: 3649 unused, pruning dcache

These messages keep appearing, but only while compiling glibc. During
this time the compile slows to a crawl and the machine eventually
freezes. Kernel compiles work flawlessly?!

Running several high CPU usage programs seems to result in a deadlock.
More often than not, running an OpenGL xlock results in a frozen machine
after some time (like overnight).
Deadlocks also seem to happen, though not often, when running several
gimp filters on large images, or when compiling the 2.1.62 kernel with
'make -j'. On several occasions I've ended up with the garbage string
'>000:0000' repeated all over the screen. This seems to be a phenomenon
with the 2.1.xx kernels only. I've never seen it under 2.0.31.

Also, recently I've been getting these kinds of messages once in a while:

Nov 5 21:58:35 orpheus kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000199
Nov 5 21:58:35 orpheus kernel: current->tss.cr3 = 01aa8000, (r3 =01aa8000
Nov 5 21:58:35 orpheus kernel: *pde = 00000000
Nov 5 21:58:35 orpheus kernel: Oops: 0000
Nov 5 21:58:35 orpheus kernel: CPU: 0
Nov 5 21:58:35 orpheus kernel: EIP: 0010:[<c01402af>]
Nov 5 21:58:35 orpheus kernel: EFLAGS: 00010292
Nov 5 21:58:35 orpheus kernel: eax: c1f40000 ebx: 00000125 ecx:
c2720150 edx: c2720150
Nov 5 21:58:35 orpheus kernel: esi: 000000fe edi: c021a310 ebp:
00000000 esp: c1f41e70
Nov 5 21:58:35 orpheus kernel: ds: 0018 es: 0018 ss: 0018
Nov 5 21:58:35 orpheus kernel: Process pine (pid: 20942, process nr: 68,
stackpage=c1f41000)
Nov 5 21:58:35 orpheus kernel: Stack: 00000000 c0140030 c0f7d1e0 00000003
c01118c9 c1f41eac 00000000 00000000
Nov 5 21:58:35 orpheus kernel: c021a310 000001b1 000000fe 0003f800
c0203d58 c2720120 ffffffe4 0000013d
Nov 5 21:58:35 orpheus kernel: c2ed4f80 c0f7d1e0 c0203d58 000000dc
c0129f7f c2ed4f80 00000400 c0203d58
Nov 5 21:58:35 orpheus kernel: Call Trace: [<c0140030>] [<c01118c9>]
[<c0129f7f>] [<c014054e>] [<c0127f2e>] [<c0140030>] [<c0128152>]
Nov 5 21:58:35 orpheus kernel: [<c010970a>]
Nov 5 21:58:35 orpheus kernel: Code: 8b 4b 74 25 8b 51 18 f6 c2 01 75 25
85 c9 74 09 51 e8 97 9c
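
The bracketed addresses in the EIP and Call Trace lines only become
meaningful once they are matched against the System.map of the kernel
that produced them. What follows is a minimal sketch of that lookup, not
a replacement for a proper oops decoder. The function names are made up
for illustration, and it assumes the usual three-column System.map
layout (address, type, symbol). Any symbol names fed to it in a test
would be placeholders, not symbols from the 2.1.62 kernel above.

```python
import bisect
import re


def parse_system_map(text):
    """Return a sorted list of (address, symbol) pairs.

    Assumes System.map-style lines: "<hex address> <type> <symbol>".
    Lines that do not have exactly three fields are skipped.
    """
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            addr, _sym_type, name = parts
            symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def trace_addresses(oops_text):
    """Extract bracketed addresses such as [<c0140030>], in order."""
    found = re.findall(r"\[<([0-9a-fA-F]{8})>\]", oops_text)
    return [int(addr, 16) for addr in found]


def resolve(addr, symbols):
    """Map an address to "symbol+0xoffset" using the nearest symbol
    at or below it. Returns None if the address precedes every symbol."""
    starts = [start for start, _name in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return f"{name}+{addr - base:#x}"
```

To use it, read the System.map that matches the running kernel, pass its
contents to parse_system_map(), and call resolve() on each address that
trace_addresses() pulls out of the log. The result is only as good as
the map: a System.map from a different build will give wrong names.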

This has also happened when doing a 'cat /dev/fd0 |less' to look at some
raw data. Under 2.0.3x this would at worst mess up my terminal.

Sound on 2.0.31 either works or causes spontaneous deadlocks (MAD 16
card). On 2.1.62 sound works, but there is periodic 'clicking',
static-like noise when playing CDs.

Has anyone reported problems like this?

--------------------------------------------------------------
Martin Imrisek "I've done . . . questionable things.
imrisek@interlog.com Nothing the God of biomechanics
wouldn't let you into heaven for."