On Wed, Mar 20, 2024 at 03:25:54PM +0000, John Ernberg wrote:
Hi Russel,
Growl. Hi Peter.
What we really want is the PHY to be suspended on suspend to RAM
regardless of us having had an initial link up or not.
So what you're asking is for the PHY to be suspended when the system
is entering suspend, which is a long time after the system booted and
thus phy_probe() was called, and could be some time before the system
resumes.
I'm not sure what the relevance is of phy_probe() that was brought up
previously then.
This worked prior to 4c0d2e96ba05 ("net: phy: consider that suspend2ram
may cut
off PHY power") which was added in Linux 5.11, and 557d5dc83f68 ("net:
fec: use
mac-managed PHY PM") which was added in Linux 5.12.
Looking at the former commit, that looks to me like it is only
affecting the resume paths, not the suspend paths, so wouldn't have
any impact itself on what happens when suspend happens.
The latter commit states that it is a work around for an issue with a
particular PHY. What happens if you revert just this commit, does your
problem then go away?
Also, please clarify. It seems that you are reporting a regression -
it used to work for you prior to 557d5dc83f68, but 557d5dc83f68 stops
it working for you?
Since FEC requires mac_managed_pm the generic PM suspend-resume paths
are not
taken. The resume sequencing with generic PM has been broken with the
FEC since
generic PM of the mdio bus was added, as the FEC will do phy_start()
(via FEC
resume) and then generic PM runs phy_init_hw() via mdio bus resume
(previously:
less damaging phy_resume()) due to how the FEC IP block works.
That suggests that even with 557d5dc83f68 reverted, it's broken.
Digging into the history, what you're referring to dates from January
2016, so are you reporting a regression that occured 8 _years_ ago,
at which point I'd question why it's taken 8 years.
Given the time that has passed, I don't think reverting commits is
a sane approach. Quite what the right solution is though, I'm not
sure.
From the description and the commits pointed to, I just don't see
that there is anything that could've changed with respect to the first
boot - if that has changed, then I think more research into what caused
it is needed.
If it's the subsequent state after a suspend-resume cycle, then yes,
I would agree that its possible that these changes broke this for you.
Would clearing ndev->phydev->mac_managed_pm just before
phy_disconnect() in fec_enet_close() fix it for you, so the suspend/
resume paths for the PHY get used when the network interface is down?
Maybe, however, that's something that should happen in any case inside
phylib on phy_disconnect() as a matter of course, since the PHY will
at that point be no longer under the control of the network driver for
PM purposes. Could you give this idea a try please?