Problems with Tulip under heavy load

Oren Laadan (orenl@cs.huji.ac.il)
Wed, 4 Nov 1998 19:53:43 +0200 (IST)


Hi

We're getting frequent network lockups with kernels 2.1.120 -- 2.1.126.
All of a sudden the machine stops responding to the network AT ALL, as
if the cable were cut. The machine itself is still fully alive and
everything works inside (scheduling and all).

Using 'tcpdump' I could see that the machine still sends packets (many ARP
requests, and during the first 10 minutes after the lockup also some IP
packets).
Something is also still happening at the network level, beyond packets
going out: for example, I verified that icmp_rcv() is called (by
ip_local_deliver()), and that icmp_unreach() is called as well.

It looks as if packets only go out, but when they come back in they are
either dropped, corrupted or mistreated.

After rebooting, I could see one message in the log file:
"eth0: Too much work at interrupt, csr5=0xfc230040"

After it, more messages appear describing problems with the network
(NFS server not responding, etc.). This is fairly consistent, and has
happened repeatedly on more than 20 different nodes!
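
The csr5 value in the log line above can be unpacked with a few lines of
script. This is only a rough sketch: the flag names and the positions of the
state fields are assumptions based on the commonly documented layout of the
21140 status register (CSR5), and should be checked against the datasheet or
the driver source before drawing conclusions from them. decode_csr5() is a
hypothetical helper, not something taken from the driver.

```python
# Assumed CSR5 bit layout for the 21140; verify against the datasheet.
CSR5_FLAGS = {
    0x00000001: "TxIntr",
    0x00000002: "TxDied",
    0x00000004: "TxNoBuf",
    0x00000008: "TxJabber",
    0x00000020: "TxFIFOUnderflow",
    0x00000040: "RxIntr",
    0x00000080: "RxNoBuf",
    0x00000100: "RxDied",
    0x00000200: "RxJabber",
    0x00000800: "TimerInt",
    0x00002000: "SystemError",
    0x00008000: "AbnormalIntr",
    0x00010000: "NormalIntr",
}


def decode_csr5(value):
    """Split a CSR5 value into named flags and the raw state fields."""
    flags = [name for bit, name in sorted(CSR5_FLAGS.items()) if value & bit]
    return {
        "flags": flags,
        # Assumed field positions: bits 19:17, 22:20 and 25:23.
        "rx_state": (value >> 17) & 0x7,
        "tx_state": (value >> 20) & 0x7,
        "error_bits": (value >> 23) & 0x7,
    }


if __name__ == "__main__":
    # Value taken from the "Too much work at interrupt" log message.
    print(decode_csr5(0xFC230040))
```

Under those assumptions, 0xfc230040 decodes to the RxIntr and NormalIntr
flags, receive state 1, transmit state 2 and no error bits, i.e. a pending
receive interrupt with no abnormal condition flagged by the chip.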

Lastly - the hardware was detected at boot as:

Nov 4 12:51:21 mos61 kernel: eth0: Digital DS21140 Tulip at 0x8000, 00 00 c0 43 18 e7, IRQ 19.
Nov 4 12:51:21 mos61 kernel: eth0: Old format EEPROM on 'SMC9332DST' board. Using substitute media control info.
Nov 4 12:51:21 mos61 kernel: eth0: EEPROM default media type Autosense.
Nov 4 12:51:21 mos61 kernel: eth0: Index #0 - Media 10baseT (#0) described by a 21140 non-MII (0) block.
Nov 4 12:51:21 mos61 kernel: eth0: Index #1 - Media 100baseTx (#3) described by a 21140 non-MII (0) block.

The machine is a P-II 300 on a TYAN S1680 (a dual-CPU board, but with a
single CPU installed).

Any ideas?

Oren.

Oren Laadan (orenl@cs.huji.ac.il)
MOSIX Development Group
The Hebrew University of Jerusalem, Israel

http://www.mosix.cs.huji.ac.il
