lockups with 2.4.20 (tg3? net/core/dev.c|deliver_to_old_ones)

From: James Bourne (jbourne@mtroyal.ab.ca)
Date: Fri Feb 14 2003 - 15:39:24 EST


Hi,
Since sometime in December, two systems we have on site using P4 HT (one
Dell 2650 and one Dell 4600, both dual CPU, both ht/mce capable) have been
locking up without any kernel output and without sysrq keys working (the
keyboard is locked solid). I've dropped the 4600 back to 2.4.19 but the
2650, not yet in production, is still running 2.4.20 to troubleshoot the
problem...

Using nmi_watchdog I've managed to get a stack trace and ran ksymoops over
it (attached). Also attached is the .config file used to build the kernel.
The lockup is reproducible, although this is the first time I've managed
to get any feedback from the kernel on the problem. 2.4.19 with the
same patches, but without tg3, does not lock up...

Thanks in advance for any help that can be given.

Here's more information about the system the oops was captured on:

(kernel compiler)
bash# gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.2 2.96-112.7.2)

(Additional patches)
(at http://www.hardrock.org/kernel/2.4.20)
linux-2.4.20-mrc-base.patch: big UID quotas
linux-2.4.20-VFS-lock patch: VFS lock patch for ext3 and lvm
linux-2.4.20-ext3.patch: Andrew Morton's ext3 patches for 2.4.20
irqbalance-2.4.20-MRC.patch: IRQ load balancing patch for the P4 ServerWorks
        (Ingo Molnar <mingo@redhat.com>) brought forward from 2.4.17

(lspci output)
00:00.0 Host bridge: ServerWorks: Unknown device 0012 (rev 13)
00:00.1 Host bridge: ServerWorks: Unknown device 0012
00:00.2 Host bridge: ServerWorks: Unknown device 0000
00:04.0 Class ff00: Dell Computer Corporation Embedded Systems Management Device 4
00:04.1 Class ff00: Dell Computer Corporation PowerEdge Expandable RAID Controller 3/Di
00:04.2 Class 0c07: Dell Computer Corporation: Unknown device 000d
00:0e.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 USB Controller (rev 05)
00:0f.3 ISA bridge: ServerWorks GCHE CSB5 South Bridge
00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
03:06.0 Ethernet controller: BROADCOM Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
03:08.0 Ethernet controller: BROADCOM Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
04:08.0 PCI bridge: Intel Corp. 80960RP [i960 RP Microprocessor/Bridge] (rev 01)
04:08.1 RAID bus controller: Dell Computer Corporation PowerEdge Expandable RAID Controller 3/Di (rev 01)
05:06.0 SCSI storage controller: Adaptec RAID subsystem HBA (rev 01)
05:06.1 SCSI storage controller: Adaptec RAID subsystem HBA (rev 01)

(contents of /proc/cpuinfo)
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) XEON(TM) CPU 1.80GHz
stepping : 4
cpu MHz : 1794.248
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 3578.26

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) XEON(TM) CPU 1.80GHz
stepping : 4
cpu MHz : 1794.248
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 3578.26

processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) XEON(TM) CPU 1.80GHz
stepping : 4
cpu MHz : 1794.248
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 3578.26

processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) XEON(TM) CPU 1.80GHz
stepping : 4
cpu MHz : 1794.248
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 3578.26
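
As a quick cross-check of the listing above (two physical Xeons showing up
as four logical processors, each carrying the ht flag), here is a minimal
Python sketch that counts the processor entries in /proc/cpuinfo-style text
and checks for the ht flag. The function name summarize_cpuinfo and the
shortened sample text are illustrative assumptions, not part of the original
report.

```python
def summarize_cpuinfo(text):
    """Return (logical_cpu_count, ht_on_all) for /proc/cpuinfo-style text.

    Entries are assumed to be separated by blank lines, with one
    "key : value" pair per line, as in the listing above.
    """
    cpus = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines that have no "key : value" shape
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            cpus.append(fields)
    # True only if at least one CPU was found and every one lists "ht".
    ht_on_all = bool(cpus) and all(
        "ht" in cpu.get("flags", "").split() for cpu in cpus
    )
    return len(cpus), ht_on_all


# Hypothetical, shortened sample with two entries; on a live system the
# text would come from open("/proc/cpuinfo").read() instead.
sample = """processor : 0
flags : fpu vme sse2 ss ht tm

processor : 1
flags : fpu vme sse2 ss ht tm
"""

print(summarize_cpuinfo(sample))
```

Run against the full listing above, the count would be the number of
logical processors (entries 0 through 3), not the number of physical
packages.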

Regards
James Bourne

-- 
James Bourne, Supervisor Data Centre Operations
Mount Royal College, Calgary, AB, CA
www.mtroyal.ab.ca

"There are only 10 types of people in this world: those who understand binary and those who don't."