Re: [RFC 04/11] net: phy: Handle c22 regs presence better

From: Russell King - ARM Linux admin
Date: Mon May 25 2020 - 19:33:45 EST


On Mon, May 25, 2020 at 06:22:19PM -0500, Jeremy Linton wrote:
> On 5/25/20 6:09 PM, Russell King - ARM Linux admin wrote:
> > On Mon, May 25, 2020 at 05:22:07PM -0500, Jeremy Linton wrote:
> > > On 5/25/20 5:01 PM, Russell King - ARM Linux admin wrote:
> > > > On Mon, May 25, 2020 at 04:51:16PM -0500, Jeremy Linton wrote:
> > > > > Hi,
> > > > >
> > > > > On 5/25/20 5:06 AM, Russell King - ARM Linux admin wrote:
> > > > > > On Sun, May 24, 2020 at 10:34:13PM -0500, Jeremy Linton wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > On 5/23/20 1:37 PM, Russell King - ARM Linux admin wrote:
> > > > > > > > On Fri, May 22, 2020 at 04:30:52PM -0500, Jeremy Linton wrote:
> > > > > > > > > Until this point, we have been sanitizing the c22
> > > > > > > > > regs presence bit out of all the MMD device lists.
> > > > > > > > > This is incorrect as it causes the 0xFFFFFFFF checks
> > > > > > > > > to incorrectly fail. Further, it turns out that we
> > > > > > > > > want to utilize this flag to make a determination that
> > > > > > > > > there is actually a phy at this location and we should
> > > > > > > > > be accessing it using c22.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Jeremy Linton <jeremy.linton@xxxxxxx>
> > > > > > > > > ---
> > > > > > > > > drivers/net/phy/phy_device.c | 16 +++++++++++++---
> > > > > > > > > 1 file changed, 13 insertions(+), 3 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > > > > > > > > index f0761fa5e40b..2d677490ecab 100644
> > > > > > > > > --- a/drivers/net/phy/phy_device.c
> > > > > > > > > +++ b/drivers/net/phy/phy_device.c
> > > > > > > > > @@ -689,9 +689,6 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
> > > > > > > > > return -EIO;
> > > > > > > > > *devices_in_package |= phy_reg;
> > > > > > > > > - /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
> > > > > > > > > - *devices_in_package &= ~BIT(0);
> > > > > > > > > -
> > > > > > > > > return 0;
> > > > > > > > > }
> > > > > > > > > @@ -742,6 +739,8 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > > > > > > > int i;
> > > > > > > > > const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
> > > > > > > > > u32 *devs = &c45_ids->devices_in_package;
> > > > > > > > > + bool c22_present = false;
> > > > > > > > > + bool valid_id = false;
> > > > > > > > > /* Find first non-zero Devices In package. Device zero is reserved
> > > > > > > > > * for 802.3 c45 complied PHYs, so don't probe it at first.
> > > > > > > > > @@ -770,6 +769,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > > > > > > > return 0;
> > > > > > > > > }
> > > > > > > > > + /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
> > > > > > > > > + c22_present = *devs & BIT(0);
> > > > > > > > > + *devs &= ~BIT(0);
> > > > > > > > > +
> > > > > > > > > /* Now probe Device Identifiers for each device present. */
> > > > > > > > > for (i = 1; i < num_ids; i++) {
> > > > > > > > > if (!(c45_ids->devices_in_package & (1 << i)))
> > > > > > > > > @@ -778,6 +781,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > > > > > > > ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
> > > > > > > > > if (ret < 0)
> > > > > > > > > return ret;
> > > > > > > > > + if (valid_phy_id(c45_ids->device_ids[i]))
> > > > > > > > > + valid_id = true;
> > > > > > > >
> > > > > > > > Here you are using your "devices in package" validator to validate the
> > > > > > > > PHY ID value. One of the things it does is mask this value with
> > > > > > > > 0x1fffffff. That means you lose some of the vendor OUI. To me, this
> > > > > > > > looks completely wrong.
> > > > > > >
> > > > > > > I think in this case I was just using it like the comment in
> > > > > > > get_phy_device() "if the phy_id is mostly F's, there is no device here".
> > > > > > >
> > > > > > > My understanding is that the code is trying to avoid the 0xFFFFFFFF returns
> > > > > > > that seem to indicate "bus ok, phy didn't respond".
> > > > > > >
> > > > > > > I just checked the OUI registration, and while there are a couple OUI's
> > > > > > > registered that have a number of FFF's in them, none of those cases seems to
> > > > > > > overlap sufficiently to cause this to throw them out. Plus a phy would also
> > > > > > > have to have model+revision set to 'F's. So while might be possible, if
> > > > > > > unlikely, at the moment I think the OUI registration keeps this from being a
> > > > > > > problem. Particularly, if i'm reading the mapping correctly, the OUI mapping
> > > > > > > guarantees that the field cannot be all '1's due to the OUI having X & M
> > > > > > > bits cleared. It sort of looks like the mapping is trying to lose those
> > > > > > > bits, by tossing bit 1 & 2, but the X & M are in the wrong octet (AFAIK, I
> > > > > > > just read it three times cause it didn't make any sense).
> > > > > >
> > > > > > I should also note that we have at least one supported PHY where one
> > > > > > of the MMDs returns 0xfffe for even numbered registers and 0x0000 for
> > > > > > odd numbered registers in one of the vendor MMDs for addresses 0
> > > > > > through 0xefff - which has a bit set in the devices-in-package.
> > > > > >
> > > > > > It also returns 0x0082 for almost every register in MMD 2, but MMD 2's
> > > > > > devices-in-package bit is clear in most of the valid MMDs, so we
> > > > > > shouldn't touch it.
> > > > > >
> > > > > > These reveal the problem of randomly probing MMDs - they can return
> > > > > > unexpected values and not be as well behaved as we would like them to
> > > > > > be. Using register 8 to detect presence may be beneficial, but that
> > > > > > may also introduce problems as we haven't used that before (and we
> > > > > > don't know whether any PHY that wrong.) I know at least the 88x3310
> > > > > > gets it right for all except the vendor MMDs, where the low addresses
> > > > > > appear non-confromant to the 802.3 specs. Both vendor MMDs are
> > > > > > definitely implemented, just not with anything conforming to 802.3.
> > > > >
> > > > > Yes, we know even for the NXP reference hardware, one of the phy's doesn't
> > > > > probe out correctly because it doesn't respond to the ieee defined
> > > > > registers. I think at this point, there really isn't anything we can do
> > > > > about that unless we involve the (ACPI) firmware in currently nonstandard
> > > > > behaviors.
> > > > >
> > > > > So, my goals here have been to first, not break anything, and then do a
> > > > > slightly better job finding phy's that are (mostly?) responding correctly to
> > > > > the 802.3 spec. So we can say "if you hardware is ACPI conformant, and you
> > > > > have IEEE conformant phy's you should be ok". So, for your example phy, I
> > > > > guess the immediate answer is "use DT" or "find a conformant phy", or even
> > > > > "abstract it in the firmware and use a mailbox interface".
> > > >
> > > > You haven't understood. The PHY does conform for most of the MMDs,
> > > > but there are a number that do not conform.
> > >
> > > Probably...
> > >
> > > Except that i'm not sure how that is a problem at the moment, its still
> > > going to trigger as a found phy, and walk the same mmd list as before
> > > requesting drivers. Those drivers haven't changed their behavior so where is
> > > the problem? If there is a problem its in 7/11 where things are getting
> > > kicked due to seemingly invalid Ids.
> > >
> > > The 1/11 devices=0 case actually appears to be a bug i'm fixing because you
> > > won't get an ID or a MMD list from that (before or after).
> >
> > I think I've just flattened that argument in my immediately preceding
> > reply on the Cortina situation; I think you've grossly misread that
> > through not fully researching the history and then finding the
> > existing users.
> >
> > There is no bug that you are fixing from what I can see.
>
> One of us is missing something,
>
> The "cortina" solution is broken in the current kernel. That is because
> lines 726-742 are dead code due to line 693.
>
> I believe I've understood the problem there, and corrected it in this set
> along with a few others, but its distinctly possible that isn't true.

The code you refer to above is NOT used on the platforms that I have
identified use the Cortina PHY. If this code is not used, it has not
caused any issue, and there is no breakage due to the change you are
referring to.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up