Re: [PATCH net-next] net: phy: realtek: check validity of 10GbE link-partner advertisement

From: Daniel Golle
Date: Tue Oct 08 2024 - 19:15:33 EST


Hi Russell,

On Tue, Oct 08, 2024 at 03:27:21PM +0100, Russell King (Oracle) wrote:
> Okay, I think the problem is down to the order in which Realtek is
> doing stuff.
> [...]
> Now, rtl822x_read_status() reads the 10G status, modifying
> phydev->lp_advertising before then going on to call
> rtlgen_read_status(), which then calls genphy_read_status(), which
> in turn will then call genphy_read_lpa().
>
> First, this is the wrong way around. Realtek needs to call
> genphy_read_status() so that phydev->link and phydev->autoneg_complete
> are both updated to the current status.

First of all, thanks a lot for diving down that rabbit hole with me!

>
> Then, it needs to check whether AN is enabled, and whether autoneg
> has completed and deal with both situations.
>
> Afterwards, it then *possibly* needs to read its speed register and
> decode that to phydev->speed, but I don't see the point of that when
> it's (a) not able to also decode the duplex from that register, and
> (b) when we've already resolved it ourselves from the link mode.
> What I'd be worried about is if the PHY does a down-shift to a
> different speed _and_ duplex from what was resolved - and thus
> whether we should even be enabling downshift on this PHY. Maybe
> there's a bit in 0xa43 0x12 that gives us the duplex as well?
>
> In other words:
>
> static int rtl822x_read_status(struct phy_device *phydev)
> {
> 	int lpadv, ret;
>
> 	ret = rtlgen_read_status(phydev);
> 	if (ret < 0)
> 		return ret;
>
> 	if (phydev->autoneg == AUTONEG_DISABLE)
> 		return 0;
>
> 	if (!phydev->autoneg_complete) {
> 		mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, 0);
> 		return 0;
> 	}
>
> 	lpadv = phy_read_paged(phydev, 0xa5d, 0x13);
> 	if (lpadv < 0)
> 		return lpadv;
>
> 	mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, lpadv);
> 	phy_resolve_aneg_linkmode(phydev);
>
> 	return 0;
> }
>
> That should at least get proper behaviour in the link partner
> advertising bitmap rather than the weirdness that Realtek is doing.
> (BTW, other drivers should be audited for the same bug!)

Got it: always call genphy_read_status() first, as that will clear
stale state and update autoneg_complete.
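
To make the ordering concrete, here is a minimal userspace sketch (not
driver code). struct mock_phy, the LP_* bits and the mock_* helpers are
hypothetical stand-ins for phy_device, the link-mode bitmap and
mii_10gbt_stat_mod_linkmode_lpa_t(); only the AN_10GBT_STAT_* values are
taken from the MDIO_AN_10GBT_STAT_* definitions. It assumes the generic
status read has already refreshed autoneg_complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as defined for MDIO_AN_10GBT_STAT_* in <linux/mdio.h>. */
#define AN_10GBT_STAT_LP2_5G	0x0020
#define AN_10GBT_STAT_LP5G	0x0040
#define AN_10GBT_STAT_LP10G	0x0800

/* Hypothetical stand-ins for the kernel's link-mode bitmap entries. */
#define LP_2500BASET	(1u << 0)
#define LP_5000BASET	(1u << 1)
#define LP_10000BASET	(1u << 2)
#define LP_10GBT_MASK	(LP_2500BASET | LP_5000BASET | LP_10000BASET)

/* Hypothetical mock of the few phy_device fields this flow touches. */
struct mock_phy {
	bool autoneg_enabled;
	bool autoneg_complete;	/* assumed refreshed by the generic read */
	uint16_t stat_10gbt;	/* value the 10GBASE-T status read returns */
	unsigned int lp_advertising;
};

/* Replace (not OR in) the multi-gig partner bits from a status value. */
static void mock_mod_lpa_10gbt(unsigned int *lpa, uint16_t stat)
{
	*lpa &= ~LP_10GBT_MASK;
	if (stat & AN_10GBT_STAT_LP2_5G)
		*lpa |= LP_2500BASET;
	if (stat & AN_10GBT_STAT_LP5G)
		*lpa |= LP_5000BASET;
	if (stat & AN_10GBT_STAT_LP10G)
		*lpa |= LP_10000BASET;
}

/* Generic status first (not modelled here), then the 10G partner bits. */
static int mock_read_status(struct mock_phy *phy)
{
	assert(phy != NULL);

	if (!phy->autoneg_enabled)
		return 0;

	if (!phy->autoneg_complete) {
		/* Autoneg still running: drop stale multi-gig partner bits. */
		mock_mod_lpa_10gbt(&phy->lp_advertising, 0);
		return 0;
	}

	mock_mod_lpa_10gbt(&phy->lp_advertising, phy->stat_10gbt);
	return 0;
}
```

With autoneg_complete false, a stale LP_10000BASET bit is cleared even if
the status register still reports it; once autoneg_complete is true, the
bitmap follows the register value.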

Similarly, when dealing with the same PHY in C45 mode, I noticed that
phydev->autoneg_complete never gets set; instead we have to check for
completion via genphy_c45_aneg_done(phydev) and clear the bits set by
mii_stat1000_mod_linkmode_lpa_t().

Doing so for C45 access, and following your suggestion above for C22,
resolves the issue without any need to check MDIO_AN_10GBT_STAT_LOCOK
or MDIO_AN_10GBT_STAT_REMOK.
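
As a rough illustration of the C45 side, again as a userspace sketch
rather than the actual patch: struct mock_c45_phy, its aneg_done field
(standing in for an explicit genphy_c45_aneg_done() query), the gbit_lpa
field and the LP_* bits are all hypothetical. The LPA_1000HALF and
LPA_1000FULL values are the ones from <linux/mii.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* LPA_1000HALF / LPA_1000FULL values from <linux/mii.h>. */
#define LPA_1000HALF	0x0400
#define LPA_1000FULL	0x0800

/* Hypothetical stand-ins for the kernel's link-mode bitmap entries. */
#define LP_1000BASET_HALF	(1u << 0)
#define LP_1000BASET_FULL	(1u << 1)

/* Hypothetical mock; aneg_done models an explicit completion query. */
struct mock_c45_phy {
	bool autoneg_enabled;
	bool aneg_done;
	uint16_t gbit_lpa;	/* register value with the 1000BASE-T LPA bits */
	unsigned int lp_advertising;
};

/* Replace (not OR in) the 1000BASE-T partner bits from a register value. */
static void mock_mod_lpa_1000(unsigned int *lpa, uint16_t val)
{
	*lpa &= ~(LP_1000BASET_HALF | LP_1000BASET_FULL);
	if (val & LPA_1000HALF)
		*lpa |= LP_1000BASET_HALF;
	if (val & LPA_1000FULL)
		*lpa |= LP_1000BASET_FULL;
}

/* Only trust the register once autoneg is enabled and reported done. */
static void mock_c45_update_1000_lpa(struct mock_c45_phy *phy)
{
	uint16_t val = 0;

	assert(phy != NULL);

	if (phy->autoneg_enabled && phy->aneg_done)
		val = phy->gbit_lpa;

	/* val stays 0 otherwise, which clears any stale partner bits. */
	mock_mod_lpa_1000(&phy->lp_advertising, val);
}
```

Passing 0 through the same helper keeps a single code path for both the
"done" and "not done" cases.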

> [...]
> However, if we keep the rtlgen_decode_speed() stuff, and can fix the
> duplex issue, then the phy_resolve_aneg_linkmode() calls should not
> be necessary, and it should be moved _after_ this to ensure that
> phydev->speed (and phydev->duplex) are correctly set.

The PHY Specific Status Register (MMD 31.0xA434) also carries duplex
information in bit 3, along with other useful status bits.
rtlgen_decode_speed() should probably be renamed to
rtlgen_decode_physr() and decode most of that.
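
For reference, a small userspace sketch of how paged register 0xa43 0x12
and MMD 31.0xA434 can name the same register, plus a duplex decode. It
assumes the usual Realtek convention that paged registers 0x10..0x17 map
to (page << 4) + (reg - 0x10) * 2, and it assumes that bit 3 set means
full duplex. PHYSR_DUPLEX_FULL and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; bit 3 as mentioned above, polarity is an assumption. */
#define PHYSR_DUPLEX_FULL	(1u << 3)

/*
 * Translate a paged access (page, reg 0x10..0x17) into the equivalent
 * MMD 31 register address, under the convention assumed above.
 */
static uint16_t paged_to_mmd31(uint16_t page, uint8_t reg)
{
	assert(reg >= 0x10 && reg <= 0x17);
	return (uint16_t)((page << 4) + (reg - 0x10) * 2);
}

/* Report whether a PHYSR value indicates full duplex (assumed polarity). */
static bool physr_full_duplex(uint16_t physr)
{
	return (physr & PHYSR_DUPLEX_FULL) != 0;
}
```

Under that convention, paged_to_mmd31(0xa43, 0x12) yields 0xa434.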

I'll post a series taking care of all of that shortly.


Again, thanks a lot for the extremely insightful lesson!


Cheers


Daniel