Re: via-rhine: NETDEV WATCHDOG: eth0: transmit timed out

From: Urban Widmark (urban@svenskatest.se)
Date: Wed Jun 07 2000 - 14:16:18 EST


(Patch for testing at the end. It shouldn't make anything worse, so please
 test it if you are getting these errors.)

On Wed, 7 Jun 2000, Marco Colombo wrote:

> On Tue, 6 Jun 2000, Urban Widmark wrote:
[snip]
> > and try and decode it (the first 2 lines are the current descriptors 0x00
> > - 0x1f is rx, 0x20 - 0x3f is tx, same format as the rx_desc/tx_desc

(This is of course wrong; it should be:
0x20 current rx
0x30 next rx
0x40 current tx
0x50 next tx)

> VIA VT86C100A Rhine 10/100 chip registers at 0xa400
> 0x000: c1ba5000 806c93e8 0000085a 4eff0000 00000000 00000000 075a7000 075a7120
> 0x020: 80000400 00000600 0759d810 075a7010 80000000 00000600 015ff010 075a7020
> 0x040: 80000000 00e085ea 01758c00 075a7130 80000000 00e082c6 01759200 075a7140
          ^^^^^^^^
> 0x060: 063e0878 017591c4 00000000 00061008 782d0100 00000080 00070000 00000000

80000000 means that the "owner" bit is set, i.e. this descriptor is owned
by the card (a transmit has been started). The next descriptor also has
this bit set, so unless you were sending a lot, this is probably the one
with the problem.

If all of them were "unsent" because of collisions, there should be an
interrupt status bit set, but there isn't. Hmm, in your report from 2.2
you wrote that you got:
  eth0: Something Wicked happened! 001a.
  last message repeated 2 times
is that just before it stops working?

0x001a is: transmit buffer underflow, packet transmission aborted because
of excessive collisions, and packet transmitted with no errors; that is,
IntrTxDone | IntrTxAbort | IntrTxUnderrun.

With debug > 1 you should get "Transmitter underrun" messages too. Do you?

> Output "B" is the same, but for registers:
>
> 0x000: c1ba5000 206c93e8 0000081a 4eff0000 00000000 00000000 075a7000 075a7100
> 0x020: 80000400 00000600 0641f810 075a7010 80000000 00000600 0641f010 075a7020
> 0x040: 00000000 00e08000 badf00d0 075a7140 00000000 00e08000 badf00d0 075a7140
                            ^^^^^^^^
> 0x060: 0778b16c 01758e4e 00000000 00061008 782d0100 00000080 00070000 00000000

I don't know what has written badf00d0 here as the buffer pointer (the
driver writes that value into the rx ring in netdev_close/via_rhine_close),
but it shouldn't matter since this descriptor isn't being used. I believe
'next' looks like 'current' when the chip is idle, so that's OK too.

> I believe "A" is from a stopped state, "B" a working one, but I can't
> tell for sure. More to come when the K7V is up again.

You could also try this, or some variant of it. The tx_timeout function
has a few "to do"s.

--- linux-2.4.0-test1/drivers/net/via-rhine.c Sat May 27 12:20:05 2000
+++ linux/drivers/net/via-rhine.c Wed Jun 7 21:01:26 2000
@@ -816,12 +816,20 @@
 
         /* XXX Perhaps we should reinitialize the hardware here. */
         dev->if_port = 0;
+        writew(CmdReset, ioaddr + ChipCmd);
+
+        np->chip_cmd = CmdStart|CmdTxOn|CmdRxOn|CmdNoTxPoll;
+        if (np->duplex_lock)
+                np->chip_cmd |= CmdFDuplex;
+        writew(np->chip_cmd, ioaddr + ChipCmd);
+
 
         /* Stop and restart the chip's Tx processes . */
         /* XXX to do */
 
         /* Trigger an immediate transmit demand. */
-        /* XXX to do */
+        writew(CmdTxDemand | np->chip_cmd, dev->base_addr + ChipCmd);
+
 
         dev->trans_start = jiffies;
         np->stats.tx_errors++;

If this fails, you could try adding other parts of the initialization that
is done at startup. The tx_timeout function is almost the same in the 2.2
driver, so you could test it there too (but I don't think the patch will
apply cleanly).

/Urban
