Re: [PATCH stable-4.19.y] net: phy: reschedule state machine if AN has not completed in PHY_AN state

From: Vladimir Oltean
Date: Mon Jun 01 2020 - 16:57:47 EST

On Mon, 1 Jun 2020 at 03:28, Russell King - ARM Linux admin
<linux@xxxxxxxxxxxxxxx> wrote:
> On Mon, Jun 01, 2020 at 12:00:16AM +0300, Vladimir Oltean wrote:

> > This is all relevant because our options for the stable trees boil
> > down to 2 choices:
> > - Revert f62265b53ef34a372b657c99e23d32e95b464316, fix an API misuse
> > and a bug, but lose an (admittedly ad-hoc, but still) useful way of
> > troubleshooting a system misconfiguration (hide the problem that Zefir
> > Kurtisi was seeing).
> Or maybe just allow at803x_aneg_done() to return non-zero but still
> print the warning (preferably identifying the affected PHY) so
> your hard-to-debug problem still gets a useful kernel message pointing
> out what the problem is?


> > - Apply this patch which make the PHY state machine work even with
> > this bent interpretation of the API. It's not as if all phylib users
> > could migrate to phylink in stable trees, and even phylink doesn't
> > catch all possible configuration cases currently.
> I wasn't even proposing that as a solution.
> And yes, I do have some copper SFP modules that have an (inaccessible)
> AR803x PHY on them - Microtik S-RJ01 to be exact. I forget exactly
> which variant it is, and no, I haven't seen any of this "SGMII fails
> to come up" - in fact, the in-band SGMII status is the only way to
> know what the PHY negotiated with its link partner... and this SFP
> module works with phylink with no issues.

See, you should also specify what kernel you're testing on. Since
Heiner did the PHY_AN cleanup, phy_aneg_done is basically dead code
from the state machine's perspective, only a few random drivers call
So I would not be at all surprised that you're not hitting it simply
because at803x_aneg_done is never in your call path.

> --
> RMK's Patch system:
> FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up