PROBLEM: 2.2.5 smp kernel locks up when transferring large files

Brimhall, GeoffreyX L (geoffreyx.l.brimhall@intel.com)
Wed, 14 Apr 1999 11:19:49 -0700


I'm consistently running into a bug with the 2.2.5 smp kernel where the
system locks up when transferring large files (~10 Megs or larger) via ftp
on a local lan. I can boot my system with a 2.1.125 smp kernel and can't
reproduce the bug at all. Also, I've used a couple different ftp client
programs (regular ftp, and lftp) and still get the same results. There is a
strange quirckiness in the hardware setup needed to reproduce the bug that
I'll detail below.

When the system locks up, after a few minutes of waiting the following
appears on the main system terminal (the system is still locked up though
after this):

irq: 2 [0 2]
bh: 1 [1 0]

<[c010a8e9]> <[c010a82a]> <[c010aafb]> <[c0154e19]> <[c0150275]>
<[c015da85]> <[c015dd9a]>

Here's the hardware setup:
1. Dual P-Pro 180's (256k cache), with 128 Megs EDO memory.
2. Super Micro motherboard, model P6DNF.
3. Adaptec 2940UW pci card, with a 4gig scsi UW harddrive, scsi cdrom.
4. 3com 3c509b ISA PnP ethernet card (though the PnP has been disabled).
5. Soundblaster Awe32 (isa)
6. Matrox Millinium II video card.

Hardware Quirckiness:
The local LAN consists of the the server machine and a laptop connected to
the same T-base 10 router (it's a generic cheap-ass 8-port router). The
cable length between the two is large - the server's cable is 50 feet, the
laptop is 40 feet.

What is strange, and why I'm mentioning the cable length, is that when I had
a third machine running on the lan (it was running windows 98), the bug
would not show up. The cable length to this third machine is 10 feet. Don't
know if cable length could be an issue.

All machines have static IPs (they're using the 192.168.1.x allocations).

If no one is able to repro the bug, and it's not to hard to explain how to
debug the kernel, let me know and I'll try my hand at debugging (though up
till now I've always dealt with userland development stuff).

Also, I'm not subscribed to linux-kernel@vger.rutgers.edu, so if you could
cc or reply me at brimhall@pobox.com I'd appreciate it.

Here's more info (using REPORTING-BUGS template). It is gleamed *after*
rebooting the locked system.

4) Linux version 2.2.5 (root@Waterfall) (gcc version 2.7.2.3) #1 SMP Tue Apr
13 21:14:30 PDT 1999

7.1)
Linux Waterfall 2.2.5 #1 SMP Tue Apr 13 21:14:30 PDT 1999 i686 unknown
Kernel modules 2.2.5
Gnu C 2.7.2.3
Binutils 2.9.1.0.19
Linux C Library 2.0.7
Dynamic linker ldd: version 1.9.10
Procps 1.2.9
Mount 2.9g
Net-tools (1998-03-02)
Kbd 0.96
Sh-utils 1.16
Modules Loaded parport_probe lp parport vfat fat sg 3c509

7.2)
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 1
model name : Pentium Pro
stepping : 7
cpu MHz : 179.633252
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov
bogomips : 179.00

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 1
model name : Pentium Pro
stepping : 7
cpu MHz : 179.633252
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov
bogomips : 179.40

7.3)
parport_probe 2848 0 (autoclean) (unused)
lp 4800 0 (autoclean) (unused)
parport 7088 0 (autoclean) [parport_probe lp]
vfat 11456 1
fat 24424 1 [vfat]
sg 3752 0 (unused)
3c509 5600 1

7.4)
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: QUANTUM Model: XP34550W Rev: LXY4
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 04 Lun: 00
Vendor: iomega Model: jaz 1GB Rev: J^77
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 06 Lun: 00
Vendor: TEAC Model: CD-ROM CD-516S Rev: 1.0D
Type: CD-ROM ANSI SCSI revision: 02

7.5)
/proc/interrupts:
CPU0 CPU1
0: 43794 0 XT-PIC timer
1: 0 2 IO-APIC-edge keyboard
2: 0 0 XT-PIC cascade
3: 22 6 IO-APIC-edge serial
8: 1 1 IO-APIC-edge rtc
9: 3990 4465 IO-APIC-edge eth0
13: 1 0 XT-PIC fpu
14: 3 4 IO-APIC-edge ide0
18: 1511 1507 IO-APIC-level aic7xxx
NMI: 0
ERR: 0

/proc/locks:
: FLOCK ADVISORY WRITE 0 08:03:370703 0 2147483647 c0009da0 00000000
c0009e00 00000000 00000000
2: POSIX ADVISORY WRITE 255 08:03:370702 0 2147483647 c0009e00 c0009da0
c0009f20 00000000 00000000
3: POSIX ADVISORY WRITE 228 08:03:180435 0 2147483647 c0009f20 c0009e00
00000000 00000000 00000000

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/