Kernel oops - help interpreting partial trace info

From: Samson Yeung
Date: Thu Dec 30 2010 - 11:35:37 EST


Hello,

(Sent to the addresses output by ./scripts/get_maintainer.pl -f
kernel/sched_fair.c - please CC me as I am not on-list.)


Some of our customers' systems, running a 2.6.32.11 kernel.org kernel,
crash and leave the attached partial oops on screen. I doubt it is a
hardware problem since we've already replaced one system and the
problem came back.

I ran the values after "Code:" from the bottom of the oops through
linux-2.6/scripts/decodecode as described in
Documentation/oops-tracing.txt, and these are the first few lines of
output:

Code: 00 44 8b 15 23 76 68 00 45 85 d2 74 32 48 8b 50 08 8b 5a 18 48
8b 90 90 06 00 00 48 8b 4a 50 48 85 c9 74 1b 48 63 db 48 8b 51 20 <48>
03 14 dd 00 fb 74 81 4c 01 32 48 8b 49 58 48 85 c9 75 e8 48
All code
========
   0:   00 44 8b 15             add    %al,0x15(%rbx,%rcx,4)
   4:   23 76 68                and    0x68(%rsi),%esi
   7:   00 45 85                add    %al,-0x7b(%rbp)
   a:   d2                      (bad)
...
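
For anyone who wants to check the bytes by hand, here is a minimal
Python sketch (not part of decodecode; the function and variable names
are made up for illustration). It parses the "Code:" line quoted above
and reports the offset of the byte wrapped in <>, which is the byte at
RIP. The listing above starts at offset 0 of the dump, some way before
that byte.

```python
import re

# "Code:" line copied from the oops above, with the mail line breaks removed.
CODE_LINE = (
    "Code: 00 44 8b 15 23 76 68 00 45 85 d2 74 32 48 8b 50 08 8b 5a 18 48 "
    "8b 90 90 06 00 00 48 8b 4a 50 48 85 c9 74 1b 48 63 db 48 8b 51 20 <48> "
    "03 14 dd 00 fb 74 81 4c 01 32 48 8b 49 58 48 85 c9 75 e8 48"
)


def parse_code_line(line):
    """Return (raw bytes, offset of the <..> byte), offset is None if unmarked."""
    tokens = line.split()
    if tokens and tokens[0] == "Code:":
        tokens = tokens[1:]
    data = bytearray()
    marked = None
    for tok in tokens:
        m = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if m:
            marked = len(data)  # offset of the byte at RIP
            tok = m.group(1)
        data.append(int(tok, 16))
    return bytes(data), marked


code, rip_offset = parse_code_line(CODE_LINE)
print(f"{len(code)} bytes dumped, marked byte at offset {rip_offset:#x}")
print("bytes from the marked byte on:", code[rip_offset:].hex(" "))
```

The slice printed on the last line is the part of the dump that begins
at the byte RIP points to.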

Should (bad) ever be output by decodecode?

I dumped the region that RIP indicates using gdb against the matching
kernel and got different values.
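
The comparison can be sketched roughly as below, assuming both dumps
start at the same address. diff_bytes is a hypothetical helper, and
image_bytes is a made-up placeholder, NOT the values gdb printed; only
oops_bytes comes from the "Code:" line above.

```python
def diff_bytes(oops_bytes, image_bytes):
    """List (offset, oops value, image value) wherever the two dumps differ."""
    n = min(len(oops_bytes), len(image_bytes))
    return [
        (i, oops_bytes[i], image_bytes[i])
        for i in range(n)
        if oops_bytes[i] != image_bytes[i]
    ]


# The ten bytes starting at the <48> marker in the oops.
oops_bytes = bytes.fromhex("48 03 14 dd 00 fb 74 81 4c 01")

# PLACEHOLDER ONLY: invented values standing in for a gdb dump of the
# same address range, so that the sketch has something to compare.
image_bytes = bytes.fromhex("48 03 14 dd 00 00 00 00 4c 01")

for offset, seen, expected in diff_bytes(oops_bytes, image_bytes):
    print(f"+{offset:#04x}: oops {seen:02x}  image {expected:02x}")
```

Listing the differing offsets shows whether the two dumps disagree
everywhere or only in a few bytes.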

What could cause this?

I'm working on getting a full dump via the serial port, but the
customers are on different continents than I.

Is there any more information I can get from the screenshot other than
probable stack corruption?


-Samson

Attachment: dump2.jpg
Description: JPEG image