Re: [PATCH net-next v2 7/9] net: phy: introduce ethtool_phy_ops to get and set phy configuration
From: Maxime Chevallier
Date: Tue Oct 08 2024 - 10:58:04 EST
On Tue, 8 Oct 2024 15:00:53 +0200
Andrew Lunn <andrew@xxxxxxx> wrote:
> > > So you have at least regulators under Linux control? Is that what you
> > > mean by power down? Pulling the plug and putting it back again is
> > > somewhat different to isolation. All its state is going to be lost,
> > > meaning phylib needs to completely initialise it again. Or can you
> > > hide this using PM? Just suspend/resume it?
> >
> > Ah no, I wasn't referring to regulators but rather the BMCR PDOWN bit to
> > just shut the PHY down, as in suspend.
>
> Ah! I wonder what 802.3 says about PDOWN? Does it say anything about
> it being equivalent to ISOLATE? That the pins go HI-Z? Are we talking
> about something semi-reliable, or something which just happens to work
> for this PHY?
The spec doesn't say anything about Hi-Z on the MII for power-down; it
simply says (22.2.4.1.5 Power down):
"During the transition to the power-down state and while
in the power-down state, the PHY shall not generate spurious
signals on the MII or GMII"
So my best guess is that it just happens to work for this PHY. I guess
it won't work for SerDes links, for the same reason as isolate mode:
reflections would make it too unstable?
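For reference, power-down and isolate are two separate bits in the
Clause 22 control register (register 0). Below is a minimal userspace
sketch of the difference. The bit values mirror BMCR_ISOLATE and
BMCR_PDOWN from include/uapi/linux/mii.h, but the helper names are
made up for illustration and are not part of this series:

```c
#include <stdint.h>

/* Clause 22 control register (BMCR) bits, same values as linux/mii.h */
#define BMCR_ISOLATE 0x0400 /* electrically isolate the PHY from the MII */
#define BMCR_PDOWN   0x0800 /* put the PHY in its power-down state */

/* Hypothetical helper: BMCR value with the power-down bit set */
static uint16_t bmcr_power_down(uint16_t bmcr)
{
	return (uint16_t)(bmcr | BMCR_PDOWN);
}

/* Hypothetical helper: BMCR value with the power-down bit cleared */
static uint16_t bmcr_power_up(uint16_t bmcr)
{
	return (uint16_t)(bmcr & (uint16_t)~BMCR_PDOWN);
}

/* Hypothetical helper: BMCR value with the isolate bit set */
static uint16_t bmcr_isolate(uint16_t bmcr)
{
	return (uint16_t)(bmcr | BMCR_ISOLATE);
}
```

In phylib, genphy_suspend() sets the same BMCR_PDOWN bit, so the sketch
only shows which bit is involved, not how the MII pins behave.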
> > Indeed the state is lost. The way I'm supporting this is :
> >
> > - If one PHY has the link, it keeps it until link-down
> > - When link-down, I round-robin between the 2 phys:
> >
> > - Attach the PHY to the netdev
> > - See if it can establish link and negotiate with LP
> > - If there's nothing after a given period ( 2 seconds default ), then
> > I detach the PHY, attach the other one, and start again, until one of
> > them has link.
>
> This sounds pretty invasive to the MAC driver. I don't think you need
> to attach/detach each cycle, since you don't need to send/receive any
> packets. You could hide this all in phylib. But that should be
> considered as part of the bigger picture.
Sure, that's what I came up with so far, but that's indeed an
implementation problem.
>
> I assume it is not actually 2 seconds, but some random number in the
> range 1-3 seconds, so when both ends are searching they do eventually
> find each other?
Oleksji pointed that out to me at LPC, that makes sense indeed.
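To make the idea concrete, here is a rough sketch of the round-robin
search with a randomised per-attempt timeout. It is plain C for
illustration only, not code from the series: probe_timeout_ms(),
next_candidate() and select_phy() are hypothetical names, and the
1-3 second window is simply the range suggested above.

```c
#include <stdint.h>

#define PROBE_MIN_MS 1000u
#define PROBE_MAX_MS 3000u

/* Hypothetical: map a random number to a timeout in [1000, 3000] ms,
 * so that two searching link partners eventually overlap.
 */
static unsigned int probe_timeout_ms(uint32_t rnd)
{
	return PROBE_MIN_MS + rnd % (PROBE_MAX_MS - PROBE_MIN_MS + 1u);
}

/* Hypothetical: index of the next PHY to try, wrapping around */
static unsigned int next_candidate(unsigned int cur, unsigned int count)
{
	return (cur + 1u) % count;
}

/* Hypothetical: keep the current PHY while it has link, otherwise
 * move on to the next one once its probe timeout has expired.
 */
static unsigned int select_phy(unsigned int cur, const int *link_up,
			       unsigned int count)
{
	return link_up[cur] ? cur : next_candidate(cur, count);
}
```

A caller would re-arm a timer with probe_timeout_ms() each time
select_phy() moves to a different PHY.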
>
> > > That explains the hardware, but what are the use cases? How did the
> > > hardware designer envision this hardware being used?
> >
> > The use-case is link redundancy, if one PHY loses the link, we hope
> > that we still have link on the other one and switchover. This is one of
> > the things I discussed at netdev 0x17.
>
> > > If you need to power the PHY off, you cannot have dynamic behaviour
> > > where the first to have link wins. But if you can have the media side
> > > functional, you can do some dynamic behaviours.
> >
> > True.
> >
> > > Although, is it wise
> > > for the link to come up, yet to be functionally dead because it has no
> > > MAC connected?
> >
> > Good point. What would you think ? I already deal with the identified
> > issue which is that both PHYs are link-up with LP, both connected to
> > the same switch. When we switch between the active PHYs, we send a
> > gratuitous ARP on the new PHY to refresh the switch's FDB.
>
> It seems odd to me you have redundant cables going to one switch? I
> would have the cables going in opposite directions, to two different
> switches, and have the switches in at a minimum a ring, or ideally a
> mesh.
>
> I don't think the ARP is necessary. The link peer switch should flush
> its tables when the link goes down. But switches further away don't
> see such link events, yet they learn about the new location of the
> host. I would also expect the host sees a loss of carrier and then the
> carrier restored, which probably flushes all its tables, so it is
> going to ARP anyway.
While I would agree with you in theory, during testing we discovered
that sending that ARP was necessary to reliably update the switch's
tables :/
This is also what bonding does in active-backup mode.
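For readers unfamiliar with it, a gratuitous ARP is an ARP request
where the sender and target IP are the same, broadcast so that
switches relearn which port the MAC address lives on. A minimal
userspace sketch of the frame layout follows. build_gratuitous_arp()
is a hypothetical helper written for this illustration, not the
kernel code path used in my tests:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GARP_FRAME_LEN 42 /* 14-byte Ethernet header + 28-byte ARP body */

/* Hypothetical helper: fill buf with a gratuitous ARP request that
 * announces ip (4 bytes, network order) as owned by mac.
 * Returns the number of bytes written.
 */
static size_t build_gratuitous_arp(uint8_t buf[GARP_FRAME_LEN],
				   const uint8_t mac[6],
				   const uint8_t ip[4])
{
	uint8_t *p = buf;

	memset(p, 0xff, 6); p += 6;	/* destination: broadcast */
	memcpy(p, mac, 6);  p += 6;	/* source: our MAC */
	*p++ = 0x08; *p++ = 0x06;	/* EtherType: ARP */

	*p++ = 0x00; *p++ = 0x01;	/* hardware type: Ethernet */
	*p++ = 0x08; *p++ = 0x00;	/* protocol type: IPv4 */
	*p++ = 6;			/* hardware address length */
	*p++ = 4;			/* protocol address length */
	*p++ = 0x00; *p++ = 0x01;	/* operation: request */
	memcpy(p, mac, 6);  p += 6;	/* sender hardware address */
	memcpy(p, ip, 4);   p += 4;	/* sender protocol address */
	memset(p, 0x00, 6); p += 6;	/* target hardware address: unset */
	memcpy(p, ip, 4);   p += 4;	/* target IP == sender IP */

	return (size_t)(p - buf);
}
```

The source MAC in the Ethernet header is what the switch learns from;
the ARP body mostly matters to hosts refreshing their neighbour caches.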
> >
> > Do you see that as being an issue, having the LP see link-up when the
> > link cannot actually convey data ? Besides the energy detect feature
> > you mention, I don't see what other options we can have unfortunately :(
>
> Maybe see what 802.3 says about advertising with no link
> modes. Autoneg should complete, in that the peers exchange messages,
> but the result of the autoneg is that they have no common modes, so
> the link won't come up. Is it clearly defined what should happen in
> this case? But we are in a corner case, similar to ISOLATE, which i
> guess rarely gets tested, so is often broken. I would guess power
> detection would be more reliable when implemented.
I'll need to perform further tests on that, I haven't looked into
energy detect. Let me take a look :)
> > > There are some Marvell Switches which support both internal Copper
> > > PHYs and a SERDES port. The hardware allows first to get link to have
> > > a functional MAC. But in Linux we have not supported that, and we
> > > leave the unused part down so it does not get link.
> >
> > My plan is to support these as well. For the end-user, it makes no
> > difference whether the HW internally has 2 PHYs each with one port, or 1
> > phy with 2 ports. So to me, if we want to support phy_mux, we should
> > also support the case you mention above. I have some code to support
> > this, but that's the part where I'm still getting things ironed-out,
> > this is pretty tricky to represent that properly, especially in DT.
> >
> > >
> > > Maybe we actually want energy detect, not link, to decide which PHY
> > > should get the MAC? But i have no real idea what you can do with
> > > energy detect, and it would also mean building out the read_status()
> > > call to report additional things, etc.
> >
> > Note that I'm trying to support a bigger set of use-cases besides the
> > pure 2-PHY setup. One being that we have a MUX within the SoC on the
> > SERDES lanes, allowing to steer the MII interface between a PHY and an
> > SFP bus (Turris Omnia has such a setup). Is it possible to have an
> > equivalent "energy detect" on all kinds of SFPs ?
>
> The LOS pin, which indicates if there is light entering the SFP.
>
> > As a note, I do see that both Russell and you may think you're being
> > "drip-fed" (I learned that term today) information, that's not my
> > intent at all, I wasn't expecting this discussion now, sorry about that.
>
> It is a difficult set of problems, and you are addressing it from the
> very niche end first using mechanisms which i expect are not reliably
> implemented. So we are going to ask lots of questions.
There's absolutely no problem with that :)
> You probably would have got fewer questions if you had started with the
> use cases for the Turris Omnia and Marvell Ethernet switch, which are
> more mainstream, and then extended it with your niche device. But i
> can understand this order, you probably have a customer with this
> niche device...
Oh but I plan to add support for the marvell switch, mcbin, and turris
first, as these boards are somewhat easily accessible and allow
converging towards a proper kAPI for that without relying on boards
that only I and a few other folks have.
That's another can of worms though :)
Maxime