Re: [PATCH v2 1/2] modpost: file2alias: fixup mdio alias garbled code in modules.alias

From: Russell King (Oracle)
Date: Tue Nov 30 2021 - 06:47:42 EST


On Fri, Nov 26, 2021 at 11:21:03AM +0100, Heiner Kallweit wrote:
> On 26.11.2021 10:45, Yinbo Zhu wrote:
> > After module compilation, module alias mechanism will generate a ugly
> > mdio modules alias configure if ethernet phy was selected, this patch
> > is to fixup mdio alias garbled code.
> >
> > In addition, that ugly alias configure will cause ethernet phy module
> > doens't match udev, phy module auto-load is fail, but add this patch
> > that it is well mdio driver alias configure match phy device uevent.
> >
> I think Andrew asked you for an example already.
> For which PHY's the driver isn't auto-loaded?
>
> In addition your commit descriptions are hard to read, especially the
> one for patch 2. Could you please try to change them to proper English?
> Not being a native speaker myself ..

Let's clear this up. PHY module loading is quite different - it does
_not_ use MODALIAS nor does it use the usual udev approach.

The modalias strings use aliases of the form "mdio:<semi-binary-string>"
with "?" used as a wildcard for each bit not in the mask. This is an
entirely appropriate scheme to use, as it allows matching an ID with
an arbitary mask. There is nothing wrong with this format - it may be
a bit on the long side, but it is an entirely valid solution.

The kernel has never generated a MODALIAS of this form, which is fine,
because we don't use MODALIAS or the uevent/udev approach to loading
the modules.

We instead use phy_request_driver_module() at PHY device creation time
to explicitly request modprobe to load a module of the form
"mdio:<binary-id>" which we know works (I have had the marvell10g and
bcm84881 modules autoloaded as a result of inserting SFPs.)

However, this won't work for PHY devices created _before_ the kernel
has mounted the rootfs, whether or not they end up being used. So,
every PHY mentioned in DT will be created before the rootfs is mounted,
and none of these PHYs will have their modules loaded.

I believe this is the root cause of Yinbo Zhu's issue.

However, changing the modalias format that we use is not a solution -
it _will_ cause DSA module loading to break. We've been here with the
SPI subsystem, where a patch was merged to change the modalias format
allegedly to fix loading of one or two modules, resulting in the
spi-nor driver failing to load (as it had done for years) - and the
resulting change was reverted and the revert backported to all the
stable trees. It created quite a mess. Linus has always been very clear
that if fixing one issue causes regressions, then the fix is wrong and
needs to be reverted. This is exactly what happened in the case of SPI.

This teaches us a lesson: changes to any modalias scheme that has been
in use for years need _extremely_ careful consideration and thorough
testing as they risk causing regressions. Without that, such changes
can result in difficult decisions where no matter what decision is
made, some breakage occurs as a result of sorting out the resulting
mess from not having considered the change carefully enough. It is far
better to avoid boxing oneself into a corner.

We can see that Yinbo Zhu's changes to fix his issue will cause
regressions with DSA, so it is simply an unacceptable fix. Reposting
the same code will _never_ change that fact. So please, Yinbo Zhu, stop
reposting your change. It is provably regression-creating and as such
will never be accepted. You also seem to be resistant to feedback -
I've asked you to separate out the "mdio_bus" change but you still have
not in your version 3 posted today. Therefore, I will assume that you
won't read this email, and in future if I see those patches again, I
will reply with a one-line "NAK" and a reference to this email.

We instead need a different approach to solving this issue. What that
approach is, I'm not sure right now - the tooling is setup to only
permit one MODALIAS published by the kernel, so we can't publish both
a DT based modalias and a mdio: based modalias together. It's one or
the other.

What we _could_ do is review all device trees and PHY drivers to see
whether DT modaliases are ever used for module loading. If they aren't,
then we _could_ make the modalias published by the kernel conditional
on the type of mdio device - continue with the DT approach for non-PHY
devices, and switch to the mdio: scheme for PHY devices. I repeat, this
can only happen if no PHY drivers match using the DT scheme, otherwise
making this change _will_ cause a regression.

The alternative is we simply declare that udev based module auto-loading
of PHY drivers required for PHYs in DT is simply not supported, and is
something we are unable to support. For something like root-NFS or IP
autoconfiguration by the kernel, that is already the case - the PHY
driver modules _must_ be built-in to the kernel in just the same way as
the network driver modules must be.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!