Re: Sun GEM PPC32 Bug?

From: Benjamin Herrenschmidt
Date: Sat Feb 05 2011 - 18:39:47 EST


On Sat, 2011-02-05 at 19:35 +0100, R. Herbst wrote:
> > I think we're simply not resetting enough when the RX FIFO overflow
> > happens.
> >
> > Just for fun I checked the OpenBSD GEM driver to see what they do.
> > When an overflow occurs, they bump the statistic, record the current
> > read and write fifo pointer registers, and schedule a watchdog timer
> > for 400ms into the future.
> >
> > If the watchdog timer sees that the RX FIFO overflow bit is still set
> > in the RX status register, and the RX FIFO read and write pointers
> > have not changed, it resets the entire chip.
> >
> > We unconditionally reset the RX MAC when an overflow occurs, that may
> > simply not be enough to unwedge this thing.

Right. It would be quite easy for us to do the same thing. Interestingly
enough, I have never observed this behaviour on any of my machines (a
wide range of 32-bit and 64-bit Apple machines).
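
For illustration, the OpenBSD-style check quoted above could be sketched
roughly as below. This is only a standalone model of the decision logic:
the structures and function names (rx_snapshot, rx_overflow_watch,
rx_overflow_seen, rx_watchdog_needs_full_reset) are made up for this mail
and are not taken from either driver, and the timer scheduling itself is
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the RX state; not the real driver structures. */
struct rx_snapshot {
	bool     overflow_set;  /* RX FIFO overflow bit in the RX status register */
	uint32_t fifo_rd_ptr;   /* RX FIFO read pointer register */
	uint32_t fifo_wr_ptr;   /* RX FIFO write pointer register */
};

struct rx_overflow_watch {
	struct rx_snapshot at_overflow; /* pointers recorded when the overflow fired */
	unsigned long overflows;        /* statistic bumped on each overflow */
	bool armed;                     /* watchdog has been scheduled (400ms out) */
};

/* Overflow interrupt: bump the statistic, record the pointers, arm the watchdog. */
static void rx_overflow_seen(struct rx_overflow_watch *w,
			     const struct rx_snapshot *now)
{
	assert(w != NULL && now != NULL);
	w->overflows++;
	w->at_overflow = *now;
	w->armed = true;
}

/*
 * Watchdog expiry: returns true when the whole chip should be reset, i.e.
 * the overflow bit is still set and neither FIFO pointer has moved.
 */
static bool rx_watchdog_needs_full_reset(struct rx_overflow_watch *w,
					 const struct rx_snapshot *now)
{
	assert(w != NULL && now != NULL);
	if (!w->armed)
		return false;
	w->armed = false;
	return now->overflow_set &&
	       now->fifo_rd_ptr == w->at_overflow.fifo_rd_ptr &&
	       now->fifo_wr_ptr == w->at_overflow.fifo_wr_ptr;
}
```

If the pointers have moved by the time the watchdog fires, the FIFO is
draining on its own and nothing more needs to be done.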

Also, Apple's own driver does things differently. On an overflow
interrupt, it seems to only bump some statistics. However, it has a
timeout: if no packets have been received for a while (5 seconds) and the
Rx FIFO overflow bit is set, it restarts the receiver (and the receiver
only).
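
As a rough sketch of that condition (my reading of the behaviour, not
Apple's code; the function name, the millisecond timestamps and the
constant are assumptions made for this example):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "A while (5 seconds)" as described above, expressed in milliseconds. */
#define RX_STALL_TIMEOUT_MS 5000u

/*
 * Hypothetical helper: returns true when the receiver (and only the
 * receiver) should be restarted. Timestamps are monotonic milliseconds.
 */
static bool rx_should_restart(uint64_t now_ms, uint64_t last_rx_ms,
			      bool overflow_set)
{
	assert(now_ms >= last_rx_ms);
	return overflow_set && (now_ms - last_rx_ms) >= RX_STALL_TIMEOUT_MS;
}
```

The overflow bit alone is not enough to trigger anything here; the
receiver also has to have been quiet for the whole timeout.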

Their sequence for restarting the receiver is, however, a tad different
from ours (mostly a slightly different ordering of things). It's hard to
tell whether that's relevant or not, but some of it does make sense, such
as stopping the DMA before resetting the MAC and restarting it after
re-enabling the MAC.
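
To make the ordering explicit, here is a minimal model of it. It only
records the order of the four steps mentioned above; the step names and
the trace structure are invented for this example, and a real restart
sequence involves more register programming than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step names; the real sequence has more steps than these. */
enum rx_step { RX_DMA_STOP, RX_MAC_RESET, RX_MAC_ENABLE, RX_DMA_START };

#define RX_TRACE_MAX 8

/* Records the order in which steps were performed. */
struct rx_trace {
	enum rx_step steps[RX_TRACE_MAX];
	size_t count;
};

static void rx_do(struct rx_trace *t, enum rx_step s)
{
	assert(t != NULL && t->count < RX_TRACE_MAX);
	t->steps[t->count++] = s;
}

/* DMA is stopped before the MAC reset and restarted only after the MAC is back up. */
static void rx_restart(struct rx_trace *t)
{
	rx_do(t, RX_DMA_STOP);
	rx_do(t, RX_MAC_RESET);
	rx_do(t, RX_MAC_ENABLE);
	rx_do(t, RX_DMA_START);
}
```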

If I find some time tonight, else tomorrow, I'll whip up a couple of
patches:

- A simpler one that rearranges our Rx reset sequence and adds a test for
the overflow bit at the end, printing out the results, etc...

- One that basically always resets the chip on overflow.

From there we can decide what works and maybe add a bit of a timeout
to the second approach if needed, etc... But how often does that overflow
happen in practice?

Cheers,
Ben.

