Kernel Panic: 2.2.16smp, aic7xxx problem?

From: Brian Marsden (marsden@scripps.edu)
Date: Mon Jul 17 2000 - 12:59:57 EST


This is my first posting to the kernel mailing list. If I have missed
anything out of importance, please let me know.

[1.]
Production filestore system (Intel, smp) produces kernel panic

[2.]
Filestore, dual 600MHz Pentium III. Output of lspci:

00:00.0 Host bridge: Intel Corporation 440GX - 82443GX Host bridge
00:01.0 PCI bridge: Intel Corporation 440GX - 82443GX AGP bridge
00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
00:0d.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100 (rev 08)
00:0e.0 SCSI storage controller: Adaptec AHA-2940U2/W / 7890
00:0f.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
00:12.0 PCI bridge: Intel Corporation 80960RP [i960 RP Microprocessor/Bridge] (rev 05)
00:12.1 SCSI storage controller: Intel Corporation 80960RP [i960RP Microprocessor] (rev 05)
01:00.0 VGA compatible controller: ATI Technologies Inc 3D Rage IIC AGP (rev 7a)

Connected to AMI Megaraid controller and 100Gb Raid system.

Being used as an NFS filestore serving SGIs and Linux boxes. Also has a
Sony AIT SDX-500C AIT tape drive which is controlled by Sony-supplied
kernel module (see [7.3]). Output of /proc/scsi/scsi is:

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM Model: DNES-309170W Rev: SAH0
  Type: Direct-Access ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: IBM Model: DNES-309170W Rev: SAH0
  Type: Direct-Access ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: SONY Model: SDX-500C Rev: 0107
  Type: Sequential-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 03 Id: 00 Lun: 00
  Vendor: MegaRAID Model: LD0 RAID5 05006R Rev: GH8E
  Type: Direct-Access ANSI SCSI revision: 02

Upon heavy use of RAID and/or tape drive (but not clear which), kernel
panic occurs.

[3.]
kernel, aic7xxx, RAID, tape

[4.]
Linux version 2.2.16-3smp (root@hestia) (gcc version egcs-2.91.66
19990314/Linux (egcs-1.1.2 release)) #1 SMP Wed Jun 28 06:07:12 PDT 2000

[5.]
The Ooops was dumped to screen, so this is the copied version (output from
ksymoops follows after this):

Unable to handle kernel NULL pointer derefernce at virtual address 00000000
current -> tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Ooops: 0000
CPU: 0
EIP: 0010:[<c01f961f>]
EFLAGS: 00010046
eax: 00000000 ebx: f7f7c024 ecx: 00000000 edx: 00000000
esi: f7fc30dc edi: f7fc5078 ebp: f7fc50cc esp: c02abf28
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, process nr: 0, stackpage = c02ab000)
Stack: c02abfa8 00000000a f7fc50a8 00000000 c0201587 f7fc5-78 f7fc5078 f7ff89e0
04000001 00000000 0000000a c010c122 0000000a f7fc5078 c02abfa8 00000140
f7ff89e0 0000000a c02abfa0 c0110f4a 0000000a c02abfa8 f7ff89e0 c02abfa8
Call Trace: [<c0201587>] [<c010c122>] [<c0110f4a>] [<c010c293>] [<c010b298>] [<c0108a4a>] [<c0106000>] [<c0106000>] [<c01001ae>]
Code: 8a 48 40 8a 40 42 25 ff 00 00 00 c1 e0 03 09 c1 80 7b 14 00

Output from ksymoops upon entry of above:

>>EIP: c01f961f <aic7xxx_run_waiting_queues+37/2b8>
Trace: c0201587 <do_aic7xxx_isr+73/94>
Trace: c010c122 <handle_IRQ_event+5a/8c>
Trace: c0110f4a <do_level_ioapic_IRQ+62/a0>
Trace: c010c293 <do_IRQ+3b/5c>
Trace: ffffffff <END_OF_CODE+6dc0ac3/????>
Trace: c0108a4a <cpu_idle+3a/50>
Trace: ffffffff <END_OF_CODE+6dc0ac3/????>
Trace: c0106000 <get_options+0/74>
Trace: c01001ae <L6+0/2>
Trace: c0201587 <do_aic7xxx_isr+73/94>
Trace: c010c122 <handle_IRQ_event+5a/8c>
Trace: c0110f4a <do_level_ioapic_IRQ+62/a0>
Trace: c010c293 <do_IRQ+3b/5c>
Trace: c010b298 <common_interrupt+18/20>
Trace: c0108a4a <cpu_idle+3a/50>
Trace: c0106000 <get_options+0/74>
Trace: c0106000 <get_options+0/74>
Trace: c01001ae <L6+0/2>
Code: c01f961f <aic7xxx_run_waiting_queues+37/2b8> 00000000 <_EIP>: <===
Code: c01f961f <aic7xxx_run_waiting_queues+37/2b8> 0: 8a 48 40 mov 0x40(%eax),%cl <===
Code: c01f9622 <aic7xxx_run_waiting_queues+3a/2b8> 3: 8a 40 42 mov 0x42(%eax),%al
Code: c01f9625 <aic7xxx_run_waiting_queues+3d/2b8> 6: 25 ff 00 00 00 and $0xff,%eax
Code: c01f962a <aic7xxx_run_waiting_queues+42/2b8> b: c1 e0 03 shl $0x3,%eax
Code: c01f962d <aic7xxx_run_waiting_queues+45/2b8> e: 09 c1 or %eax,%ecx
Code: c01f962f <aic7xxx_run_waiting_queues+47/2b8> 10: 80 7b 14 00 cmpb $0x0,0x14(%ebx)
 
[6.]
No shell script or anything available to make this happen. All I can think
of is that it may be large amounts of tape activity.

[7]
[7.1]
-- Versions installed: (if some fields are empty or looks
-- unusual then possibly you have very old versions)
Linux hestia 2.2.16-3smp #1 SMP Wed Jun 28 06:07:12 PDT 2000 i686 unknown
Kernel modules 2.3.10-pre1
Gnu C egcs-2.91.66
Binutils 2.9.5.0.22
Linux C Library 2.1.3
Dynamic linker ldd (GNU libc) 2.1.3
Procps 2.0.6
Mount 2.10f
Net-tools 1.54
Console-tools 0.3.3
Sh-utils 2.0
Modules Loaded sonytape st

[7.2]
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 3
cpu MHz : 601.381
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr xmm
bogomips : 1199.31

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 3
cpu MHz : 601.381
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr xmm
bogomips : 1202.59

[7.3]
Note that these modules are generated from the sonytape-1.1-1.i386.rpm
file available from the Sony web site:

sonytape 26300 0
st 25228 0 (unused)

[7.4]
See above

[7.5]
/var/log/messages has been getting alot of:

ISR called reentrantly!

... which, if I've done my research correctly, is something from the
megaraid driver and is known about...

Thanks for your time and patience.

Brian Marsden

-- 
---------------------------------------------------------------------
 Brian Marsden        			  Email: marsden@scripps.edu
 TSRI, San Diego, USA.  Phone: +1 858 784 8698  Fax: +1 858 784 8299
---------------------------------------------------------------------

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Jul 23 2000 - 21:00:09 EST