CRC errors between mvneta and macb

From: Richard Genoud
Date: Fri Oct 19 2018 - 11:22:40 EST


Hi all,

I've been struggling with a strange behavior between a clearfog-pro
and an at91sam9g35-ek board.

TL;DR: ethernet frames are received with a CRC error on the clearfog
ETH0, but seem perfectly all right. Add a switch between the 2
boards, and the ethernet frames are accepted.


I've got a clearfog pro and an at91sam9g35-ek, both with kernel
4.19-rc8.
An RJ45 cable is plugged between the clearfog (on the solo port (eth0))
and the g35-ek board (100Mb/s).

They are configured with autoneg and a fixed IP address.

I start the 2 boards and, from the clearfog, I ping the g35-ek.
If it succeeds, it keeps succeeding until the g35-ek is rebooted.
If it fails, it keeps failing until the g35-ek is rebooted.

Rebooting the clearfog doesn't change anything.
Resetting the g35-ek PHY (mii-diag -R) doesn't change anything either.

When the ping fails, it's actually because the mvneta returns a CRC
error:
mvneta f1070000.ethernet eth0: bad rx status 0cc10000 (crc error), size=66
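
For reference, here is a small sketch of how that status word can be
decoded. The bit positions are assumptions taken from my reading of
drivers/net/ethernet/marvell/mvneta.c (error summary in bit 16, error
code in bits 17-18, code 0 meaning CRC error); check them against your
kernel tree. decode_rx_status() is only an illustrative helper, not a
driver function.

```python
# Assumed layout of the mvneta RX descriptor status word; verify against
# drivers/net/ethernet/marvell/mvneta.c before relying on it.
ERR_SUMMARY = 1 << 16                  # set when the frame has an error
ERR_CODE_MASK = (1 << 17) | (1 << 18)  # 2-bit error code

ERR_CODES = {
    0: "crc error",
    1 << 17: "overrun error",
    1 << 18: "max frame length error",
    (1 << 17) | (1 << 18): "resource error",
}


def decode_rx_status(status):
    """Return a short description of the RX status word (illustrative only)."""
    if not status & ERR_SUMMARY:
        return "ok"
    return ERR_CODES[status & ERR_CODE_MASK]


print(decode_rx_status(0x0CC10000))
```

With the value from the log line above, the error summary bit is set and
the error code bits are both clear, which matches the "crc error" text
printed by the driver.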

And, if I plug the RJ45 cable between the clearfog's matrix and the
g35-ek, everything works well, always.

To ease the debugging, instead of a ping I used:
https://gist.github.com/austinmarton/1922600
from the g35-ek in order to have the same frame every time.
Then I checked the ethernet CRC with the scope (on TXD[0-1] of the
g35-ek PHY, a DM9161A).
And the CRC is all right.
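
For anyone who wants to repeat the check, this is a minimal sketch of
verifying the FCS of a frame once its bytes have been recovered from the
scope capture. It assumes the preamble and SFD have already been
stripped and the di-bits reassembled into bytes (least significant
di-bit first). The MAC addresses, EtherType and payload below are
made-up placeholders, not the frame I actually sent.

```python
import struct
import zlib


def ethernet_fcs(frame_without_fcs):
    """Return the 4 FCS bytes in the order they appear on the wire."""
    # The Ethernet FCS is the standard CRC-32 (same as zlib.crc32),
    # transmitted least significant byte first.
    return struct.pack("<I", zlib.crc32(frame_without_fcs) & 0xFFFFFFFF)


def fcs_ok(frame_with_fcs):
    """Check that the last 4 bytes match the CRC-32 of the rest."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return ethernet_fcs(body) == fcs


# Hypothetical test frame: placeholder MACs, local experimental
# EtherType 0x88B5, payload padded to the 46-byte minimum.
dst = bytes.fromhex("001122334455")
src = bytes.fromhex("66778899aabb")
payload = b"\xde\xad\xbe\xef".ljust(46, b"\x00")
frame = dst + src + struct.pack("!H", 0x88B5) + payload

wire = frame + ethernet_fcs(frame)
corrupted = bytes([wire[0] ^ 0x01]) + wire[1:]  # flip one bit

print(fcs_ok(wire))
print(fcs_ok(corrupted))
```

Sending the same frame every time means the expected FCS is a constant,
so a single reference value can be compared against every capture.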

I also managed to trigger this bug by simply doing:
rmmod macb ; insmod macb.ko on the g35-ek.
After that, frames are either accepted or not.

I checked all PHY/macb register values on the g35-ek, they are the same.

The only thing I could find is related to the TXCLK on the PHY.

When there's a CRC error, the TXCLK has its polarity inverted...
That's a clue !

But this TXCLK (25MHz) is not used on the g35-ek.
Only the REFCLK/XT2 (50MHz) is used to synchronise the PHY and the macb.
So I guess that the TXCLK plays a role in generating TX+/TX-.

And I also guess that when the signal is converted back on the clearfog,
the clock polarity is somehow responsible for the CRC errors.

I was about to put my scope on the clearfog's PHY to see what it
receives, but Marvell's documentation is not as freely available as
Atmel's, so I'm quite stuck at this point.

Any ideas?

NB: I also managed to trigger this with an at91sam9g20-ek (but not with
a sama5d2)


Regards,
Richard