tg3 and/or PCI-E bug

From: Christensen Tom
Date: Mon Oct 03 2005 - 00:20:07 EST


I have 3 Supermicro systems based on the X6DAL-TB2 motherboard, which has built-in Broadcom 5721 gig-e PCI-E NICs. eth0 on these boxes fails whenever a decent amount of data is pushed across it (decent being ~100Mb). So far, all I can say is that when it fails I get these error messages in /var/log/messages:
Oct 2 19:08:53 office kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct 2 19:08:53 office kernel: tg3: eth0: transmit timed out, resetting
Oct 2 19:08:53 office kernel: tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
Oct 2 19:08:53 office kernel: tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
Oct 2 19:08:53 office kernel: tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
Oct 2 19:08:53 office kernel: tg3: eth0: Link is down.

I made a cron job to log ifconfig output to a file every minute. It shows that the NIC resets itself at least every couple of minutes while data is being passed: the TX/RX stats in ifconfig reset to 0. The messages above do not show up in /var/log/messages every time the NIC resets like this. My guess is that the NIC is resetting because of some bug, and that sometimes the reset itself fails and locks up the NIC, producing the messages above. That only happens once or twice a day; the other NIC resets happen, as I said, every 2-3 minutes.

Is there any information that would be helpful in debugging this problem? Let me know what to run and I'll do it. eth1 never has this problem: I have pushed 5GB+ onto the box over eth1 and it doesn't blink.
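
The counter check described above could be automated roughly as follows. This is only a sketch, not the cron job that was actually used: it assumes Python is available on the box, reads /proc/net/dev instead of parsing ifconfig output, and the names parse_net_dev, counters_reset and watch are made up for illustration. It treats any byte counter that goes backwards as a reset, so a 32-bit counter wrapping around would look the same.

```python
import time


def parse_net_dev(text):
    """Return {iface: (rx_bytes, tx_bytes)} from /proc/net/dev-style text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 16:
            continue  # not a full interface line
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def counters_reset(prev, cur):
    """True if either byte counter went backwards between two samples."""
    return cur[0] < prev[0] or cur[1] < prev[1]


def watch(iface="eth0", interval=60, path="/proc/net/dev"):
    """Sample the counters every `interval` seconds and report resets."""
    prev = None
    while True:
        with open(path) as f:
            cur = parse_net_dev(f.read()).get(iface)
        if cur is not None:
            if prev is not None and counters_reset(prev, cur):
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                print("%s %s counters went backwards: %s -> %s"
                      % (stamp, iface, prev, cur))
            prev = cur
        time.sleep(interval)
```

Calling watch("eth0") in one terminal and watch("eth1") in another while pushing data would give timestamps that can be lined up against /var/log/messages.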
Tom

