PROBLEM: Sustained network traffic dies after a few seconds

Hans (dehartog@telekabel.nl)
Mon, 28 Jun 1999 02:45:22 +0200


This is a multi-part message in MIME format.
--------------124BE441F195E83F9F935333
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


--------------124BE441F195E83F9F935333
Content-Type: text/plain; charset=us-ascii;
name="bug_report"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="bug_report"

Dear kernel-hackers,

Here's a problem I struggled with for the last three weeks.
What follows complies with the format as described in
/usr/src/linux/REPORTING-BUGS

1. Sustained network traffic dies after a few seconds

2. When a transfer is started over the (ether)network, it stops after a few
seconds. Communications between the two hosts can only re-established
by doing "ifconfig ... down" and "ifconfig ... up" (on BOTH sides) or
rebooting (in case of a VXT2000). No error-messages. ifconfig simply
reports some collisions/errors/drops.
This is very reproducible by doing rcp, ftp or tftp a file that is larger
than 1Mb or so. An even quicker way to reproduce it, is a "ping -f ...".
This happens between 2 linux-PC's but also on a linux-PC that pumps an
image (1.2Mb) to an X-terminal (VXT2000) with tftp.
This problem appeared when I upgraded from RedHat 5.2 (kernel 2.0.36) to
SuSE 6.1 (kernel 2.2.5). I also tried the SuSE 6.1 upgrade (kernel 2.2.7)
and after somebody told me the 2.2.7 is buggy, I tried 2.2.9 (straight
from www.linux.org). Still the same problem. Then I perused THOUSANDS
of articles in comp.os.linux.networking. Two cases looked similar to this
problem and the solution was: upgrade to 2.2.10. Well, 2.2.10 still had
this problem.
I checked my cabling and everything that might be causing this and can
not find a solution to this problem. Note that when I boot my old 2.0.36
kernel, THE PROBLEM DOES NOT OCCUR (which tells me that it is no hardware
problem).
Other things I tried, but did not solve the problem:
- switching off masquerading
- not loading the vmmon and vmnet modules from vmware
- not enabling my second network card (to the internet via a cable modem)
- disable ip-accounting (with ipchains)
- switching off "persistent DMA" in the sound configuration

Becoming desparate by now, I've taken the liberty to post this to you.

3. Keyword(s): Networking

4. cat /proc/version

Linux version 2.2.10 (root@n320) (gcc version egcs-2.91.66 19990314
(egcs-1.1.2 release)) #1 Sun Jun 27 16:29:56 MEST 1999

5. oops (not applicable)

6. how to reproduce: ping -f <any host on this thinwire ethernet> and the
screen fills with dots after a few seconds

7. Environment

7.1 Output of /usr/src/linx/scripts/ver_linux:

-- Versions installed: (if some fields are empty or looks
-- unusual then possibly you have very old versions)
Linux n320 2.2.10 #1 Sun Jun 27 16:29:56 MEST 1999 i686 unknown
Kernel modules 2.2.2-pre6
Gnu C egcs-2.91.66
Binutils 2.9.1.0.22
Linux C Library x 1 root root 2475225 Apr 15 01:57 /lib/libc.so.6
Dynamic linker ldd (GNU libc) 2.0.7
Procps 1.2.11
Mount 2.9i
Net-tools 1.50
Kbd 0.99
Sh-utils 1.12
Modules Loaded ip_masq_user

7.2 cat /proc/cpuinfo

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 5
model name : Pentium II (Deschutes)
stepping : 2
cpu MHz : 350.803505
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx osfxsr
bogomips : 349.80

7.3 cat /proc/modules (almost everything, including sound, scsi and networking
is compiled directly into the kernel and not as modules)

ip_masq_user 2312 0 (autoclean)

7.4 cat /proc/scsi/scsi

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: IBM Model: DDRS-39130D Rev: DC1B
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: DEC Model: RZ26L (C) DEC Rev: 440C
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 02 Lun: 00
Vendor: IBM Model: DDRS-39130D Rev: DC1B
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 03 Lun: 00
Vendor: PLEXTOR Model: CD-R PX-R412C Rev: 1.05
Type: CD-ROM ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 04 Lun: 00
Vendor: PLEXTOR Model: CD-ROM PX-32TS Rev: 1.03
Type: CD-ROM ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 06 Lun: 00
Vendor: DEC Model: TLZ06 (C)DEC Rev: 4BQE
Type: Sequential-Access ANSI SCSI revision: 02

7.5 Whatever might be relevant. Well, I guess /var/log/boot.msg contains
most info I did not include sofar, so here it is:

Inspecting /boot/System.map
Loaded 7677 symbols from /boot/System.map.
Symbols match kernel version 2.2.10.
No module symbols loaded.
klogd 1.3-3, log source = /proc/kmsg started.
<4>Linux version 2.2.10 (root@n320) (gcc version egcs-2.91.66 19990314 (egcs-1.1.2 release)) #1 Sun Jun 27 16:29:56 MEST 1999
<4>Detected 350803505 Hz processor.
<4>Console: colour VGA+ 80x25
<4>Calibrating delay loop... 349.80 BogoMIPS
<4>Memory: 127752k/131008k available (1072k kernel code, 412k reserved, 1708k data, 64k init)
<4>CPU: Intel Pentium II (Deschutes) stepping 02
<6>Checking 386/387 coupling... OK, FPU using exception 16 error reporting.
<6>Checking 'hlt' instruction... OK.
<4>POSIX conformance testing by UNIFIX
<4>mtrr: v1.35 (19990512) Richard Gooch (rgooch@atnf.csiro.au)
<4>PCI: PCI BIOS revision 2.10 entry at 0xf0720
<4>PCI: Using configuration type 1
<4>PCI: Probing PCI hardware
<6>Linux NET4.0 for Linux 2.2
<6>Based upon Swansea University Computer Society NET3.039
<6>NET4: Unix domain sockets 1.0 for Linux NET4.0.
<6>NET4: Linux TCP/IP 1.0 for NET4.0
<6>IP Protocols: ICMP, UDP, TCP
<4>Starting kswapd v 1.5
<6>parport0: PC-style at 0x378 [SPP,ECP,ECPEPP,ECPPS2]
<4>parport0: detected irq 7; use procfs to enable interrupt-driven operation.
<6>Detected PS/2 Mouse Port.
<6>Serial driver version 4.27 with no serial options enabled
<6>ttyS00 at 0x03f8 (irq = 4) is a 16550A
<6>ttyS01 at 0x02f8 (irq = 3) is a 16550A
<4>pty: 256 Unix98 ptys configured
<6>lp0: using parport0 (polling).
<6>Real Time Clock Driver v1.09
<7>Sound initialization started
<4><Sound Blaster 16 (4.12)> at 0x220 irq 5 dma 1,5
<4><Sound Blaster 16> at 0x330 irq 5 dma 0
<4><Yamaha OPL3> at 0x388
<4><SoundBlaster EMU8000 (RAM2048k)>
<7>Sound initialization complete
<4>RAM disk driver initialized: 16 RAM disks of 4096K size
<4>PIIX4: IDE controller on PCI bus 00 dev 21
<4>PIIX4: not 100% native mode: will probe irqs later
<4> ide0: BM-DMA at 0xd800-0xd807, BIOS settings: hda:DMA, hdb:pio
<4>hda: ATAPI CD-ROM DRIVE 36X MAXIMUM, ATAPI CDROM drive
<4>hdb: IOMEGA ZIP 100 ATAPI, ATAPI FLOPPY drive
<4>ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
<4>hda: ATAPI 32X CD-ROM drive, 128kB Cache
<6>Uniform CDROM driver Revision: 2.55
<6>hdb: 98304kB, 196608 blocks, 512 sector size
<6>hdb: 98304kB, 96/64/32 CHS, 4096 kBps, 512 sector size, 2941 rpm
<6>Floppy drive(s): fd0 is 1.44M
<6>FDC 0 is a post-1991 82077
<6>ncr53c8xx: at PCI bus 0, device 12, function 0
<6>ncr53c8xx: 53c875J detected with Symbios NVRAM
<6>ncr53c875J-0: rev=0x04, base=0xe1000000, io_port=0xb400, irq=9
<6>ncr53c875J-0: Symbios format NVRAM, ID 7, Fast-20, Parity Checking
<6>ncr53c875J-0: initial SCNTL3/DMODE/DCNTL/CTEST3/4/5 = (hex) 05/4e/80/01/00/24
<6>ncr53c875J-0: final SCNTL3/DMODE/DCNTL/CTEST3/4/5 = (hex) 05/46/a0/00/08/24
<6>ncr53c875J-0: on-chip RAM at 0xe0800000
<4>ncr53c875J-0: resetting, command processing suspended for 2 seconds
<6>ncr53c875J-0: restart (scsi reset).
<4>ncr53c875J-0: enabling clock multiplier
<4>ncr53c875J-0: Downloading SCSI SCRIPTS.
<4>scsi0 : ncr53c8xx - version 3.2
<4>scsi : 1 host.
<4>ncr53c875J-0: command processing resumed
<4> Vendor: IBM Model: DDRS-39130D Rev: DC1B
<4> Type: Direct-Access ANSI SCSI revision: 02
<4>Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
<4> Vendor: DEC Model: RZ26L (C) DEC Rev: 440C
<4> Type: Direct-Access ANSI SCSI revision: 02
<4>Detected scsi disk sdb at scsi0, channel 0, id 1, lun 0
<4> Vendor: IBM Model: DDRS-39130D Rev: DC1B
<4> Type: Direct-Access ANSI SCSI revision: 02
<4>Detected scsi disk sdc at scsi0, channel 0, id 2, lun 0
<4> Vendor: PLEXTOR Model: CD-R PX-R412C Rev: 1.05
<4> Type: CD-ROM ANSI SCSI revision: 02
<4>Detected scsi CD-ROM sr0 at scsi0, channel 0, id 3, lun 0
<4> Vendor: PLEXTOR Model: CD-ROM PX-32TS Rev: 1.03
<4> Type: CD-ROM ANSI SCSI revision: 02
<4>Detected scsi CD-ROM sr1 at scsi0, channel 0, id 4, lun 0
<4> Vendor: DEC Model: TLZ06 (C)DEC Rev: 4BQE
<4> Type: Sequential-Access ANSI SCSI revision: 02
<4>Detected scsi tape st0 at scsi0, channel 0, id 6, lun 0
Kernel logging (proc) stopped.
Kernel log daemon terminating.

And the relevant stuff from /var/log/messages (the networkcards):

Jun 27 16:39:32 n320 kernel: ne2k-pci.c:v0.99L 2/7/98 D. Becker/P. Gortmaker http://cesdis.gsfc.nasa.gov/linux/drivers/ne2k-pci.html
Jun 27 16:39:32 n320 kernel: ne2k-pci.c: PCI NE2000 clone 'RealTek RTL-8029' at I/O 0xd000, IRQ 11.
Jun 27 16:39:32 n320 kernel: eth0: PCI NE2000 found at 0xd000, IRQ 11, 00:00:E8:55:39:1C.
Jun 27 16:39:32 n320 kernel: ne2k-pci.c: PCI NE2000 clone 'RealTek RTL-8029' at I/O 0xb800, IRQ 10.
Jun 27 16:39:32 n320 kernel: eth1: PCI NE2000 found at 0xb800, IRQ 10, 00:00:E8:55:2E:02.
Jun 27 16:39:32 n320 kernel: Partition check:
Jun 27 16:39:32 n320 kernel: sda: sda1 sda2 sda3 sda4
Jun 27 16:39:32 n320 kernel: sdb: sdb1 sdb2
Jun 27 16:39:32 n320 kernel: sdc: sdc1 sdc2 sdc3 sdc4
Jun 27 16:39:32 n320 kernel: hdb: hdb4
Jun 27 16:39:32 n320 kernel: VFS: Mounted root (ext2 filesystem) readonly.
Jun 27 16:39:32 n320 kernel: Freeing unused kernel memory: 64k freed
Jun 27 16:39:32 n320 kernel: Adding Swap: 125996k swap-space (priority -1)
Jun 27 16:39:32 n320 named[120]: starting
Jun 27 16:39:32 n320 named[120]: cache zone "" loaded (serial 0)
Jun 27 16:39:32 n320 named[120]: primary zone "0.0.127.in-addr.arpa" loaded (serial 1997022700)
Jun 27 16:39:32 n320 named[123]: Ready to answer queries.
Jun 27 16:39:33 n320 lpd[148]: restarted
Jun 27 16:39:36 n320 sshd2[160]: Listener created on port 22.
Jun 27 16:39:36 n320 sshd2[165]: Daemon is running.
Jun 27 16:39:36 n320 ntpdate[164]: step time server 193.78.241.12 offset -0.017820 sec
Jun 27 16:39:36 n320 xntpd[169]: ntpd 4.0.92c Thu Apr 15 05:46:04 GMT 1999 (1)
Jun 27 16:39:36 n320 xntpd[169]: precision = 14 usec
Jun 27 16:39:36 n320 xntpd[169]: using kernel phase-lock loop 0041
Jun 27 16:39:36 n320 xntpd[169]: frequency initialized -4.306 from /etc/ntp.drift
Jun 27 16:39:36 n320 xntpd[169]: using kernel phase-lock loop 0041

Wel, that's it. I would be very grateful if you can help me. I invested quite
some time in my transition from 2.0.36 to 2.2.* and I'm willing to help you
to solve this problem by providing any information you need.

I'm looking forward to your reaction and appreciate any time and effort you
will spend on this problem,

Yours sincerely,
Hans de Hartog.

--------------124BE441F195E83F9F935333--

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/