I am getting "e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang" using stock 2.6.17.11, 2.6.17.5 or 2.6.17.4 kernels on centos 4.3.
The server is a Tyan GS10 and is connected to a Netgear GS724T Gig switch. I can easily reproduce the problem by trying to do a large ftp transfer to the server. It does not happen if the server is connected to a dummy 100 Mb switch, only when is connected to the Gig switch.
I have also tried the options line below disabling tso, tx and rx in the modprobe.conf without any luck.
options e1000 XsumRX=0 Speed=1000 Duplex=2 InterruptThrottleRate=0 FlowControl=3 RxDescriptors=4096 TxDescriptors=4096 RxIntDelay=0 TxIntDelay=0
in /var/log/kernel I get the following...
Sep 1 23:53:01 www kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Sep 1 23:53:01 www kernel: Tx Queue <0>
Sep 1 23:53:01 www kernel: TDH <4c4>
Sep 1 23:53:01 www kernel: TDT <4c9>
Sep 1 23:53:01 www kernel: next_to_use <4c9>
Sep 1 23:53:01 www kernel: next_to_clean <4c4>
Sep 1 23:53:01 www kernel: buffer_info[next_to_clean]
Sep 1 23:53:01 www kernel: time_stamp <ffff9c60>
Sep 1 23:53:01 www kernel: next_to_watch <4c4>
Sep 1 23:53:01 www kernel: jiffies <ffff9d96>
Sep 1 23:53:01 www kernel: next_to_watch.status <0>
.
repeats the same as above a few times....
.
Sep 1 23:53:10 www kernel: NETDEV WATCHDOG: eth0: transmit timed out
Sep 1 23:53:13 www kernel: e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
then the server locks up, no response from the keyboard at all and must be forced down with a power kill.
Here is my driver info,
driver: e1000
version: 7.0.33-k2-NAPI
firmware-version: N/A
bus-info: 0000:02:01.0
What else could I check?