Aw: Re: [RFC PATCH net-next v3 3/8] net: pcs: pcs-mtk-lynxi: add platform driver for MT7988
From: Frank Wunderlich
Date: Fri Oct 04 2024 - 10:18:47 EST
Hi netdev,
we are still stale in adding pcs driver here...can you please show us the way to go??
daniel posted another attempt but this seems also stale.
https://patchwork.kernel.org/project/netdevbpf/patch/ba4e359584a6b3bc4b3470822c42186d5b0856f9.1721910728.git.daniel@xxxxxxxxxxxxxx/
@Angelo/Matthias, can you help solving this?
left full quote below....
regards Frank
> Gesendet: Mittwoch, 10. Juli 2024 um 13:17 Uhr
> Von: "Frank Wunderlich" <frank-w@xxxxxxxxxxxxxxx>
> An: "Russell King (Oracle)" <linux@xxxxxxxxxxxxxxx>
> Cc: "David S. Miller" <davem@xxxxxxxxxxxxx>, "Eric Dumazet" <edumazet@xxxxxxxxxx>, "Jakub Kicinski" <kuba@xxxxxxxxxx>, "Paolo Abeni" <pabeni@xxxxxxxxxx>, "Rob Herring" <robh+dt@xxxxxxxxxx>, "Krzysztof Kozlowski" <krzysztof.kozlowski+dt@xxxxxxxxxx>, "Conor Dooley" <conor+dt@xxxxxxxxxx>, "Chunfeng Yun" <chunfeng.yun@xxxxxxxxxxxx>, "Vinod Koul" <vkoul@xxxxxxxxxx>, "Kishon Vijay Abraham I" <kishon@xxxxxxxxxx>, "Felix Fietkau" <nbd@xxxxxxxx>, "John Crispin" <john@xxxxxxxxxxx>, "Sean Wang" <sean.wang@xxxxxxxxxxxx>, "Mark Lee" <Mark-MC.Lee@xxxxxxxxxxxx>, "Lorenzo Bianconi" <lorenzo@xxxxxxxxxx>, "Matthias Brugger" <matthias.bgg@xxxxxxxxx>, "AngeloGioacchino Del Regno" <angelogioacchino.delregno@xxxxxxxxxxxxx>, "Andrew Lunn" <andrew@xxxxxxx>, "Heiner Kallweit" <hkallweit1@xxxxxxxxx>, "Alexander Couzens" <lynxis@xxxxxxx>, "Qingfang Deng" <dqfext@xxxxxxxxx>, "SkyLake Huang" <SkyLake.Huang@xxxxxxxxxxxx>, "Philipp Zabel" <p.zabel@xxxxxxxxxxxxxx>, netdev@xxxxxxxxxxxxxxx, devicetree@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, linux-arm-kernel@xxxxxxxxxxxxxxxxxxx, linux-mediatek@xxxxxxxxxxxxxxxxxxx, linux-phy@xxxxxxxxxxxxxxxxxxx, "Daniel Golle" <daniel@xxxxxxxxxxxxxx>
> Betreff: Re: [RFC PATCH net-next v3 3/8] net: pcs: pcs-mtk-lynxi: add platform driver for MT7988
>
> Hi Russel / netdev
>
> how can we proceed here? development for mt7988/Bpi-R4 is stalled here
> because it is unclear what's the right way to go...
>
> We want to avoid much work adding a complete new framework for a "small"
> change (imho this affects all SFP-like sub-devices and their linking to
> phylink/pcs driver)...if this really the way to go then some boundary
> conditions will be helpful.
>
> regards Frank
>
> Am 22.04.24 um 18:24 schrieb Daniel Golle:
> > Hi Russell,
> > Hi netdev crew,
> >
> > I haven't received any reply for this email and am still waiting for
> > clarification regarding how inter-driver dependencies should be modelled
> > in that case of mtk_eth_soc. My search for good examples for that in the
> > kernel code was quite frustrating and all I've found are supposedly bugs
> > of that exact cathegory.
> >
> > Please see my questions mentioned in the previous email I've sent and
> > also a summary of suggested solutions inline below:
> >
> > On Wed, Feb 07, 2024 at 01:29:18AM +0000, Daniel Golle wrote:
> >> Hi Russell,
> >>
> >> sorry for the extended time it took me to get back to this patch and
> >> the comments you made for it. Understanding the complete scope of the
> >> problem took a while (plus there were holidays and other fun things),
> >> and also brought up further questions which I hope you can help me
> >> find good answers for, see below:
> >>
> >> On Wed, Dec 13, 2023 at 04:04:12PM +0000, Russell King (Oracle) wrote:
> >>> On Tue, Dec 12, 2023 at 03:47:18AM +0000, Daniel Golle wrote:
> >>>> Introduce a proper platform MFD driver for the LynxI (H)SGMII PCS which
> >>>> is going to initially be used for the MT7988 SoC.
> >>>>
> >>>> Signed-off-by: Daniel Golle <daniel@xxxxxxxxxxxxxx>
> >>> I made some specific suggestions about what I wanted to see for
> >>> "getting" PCS in the previous review, and I'm disappointed that this
> >>> patch set is still inventing its own solution.
> >>>
> >>>> +struct phylink_pcs *mtk_pcs_lynxi_get(struct device *dev, struct device_node *np)
> >>>> +{
> >>>> + struct platform_device *pdev;
> >>>> + struct mtk_pcs_lynxi *mpcs;
> >>>> +
> >>>> + if (!np)
> >>>> + return NULL;
> >>>> +
> >>>> + if (!of_device_is_available(np))
> >>>> + return ERR_PTR(-ENODEV);
> >>>> +
> >>>> + if (!of_match_node(mtk_pcs_lynxi_of_match, np))
> >>>> + return ERR_PTR(-EINVAL);
> >>>> +
> >>>> + pdev = of_find_device_by_node(np);
> >>>> + if (!pdev || !platform_get_drvdata(pdev)) {
> >>> This is racy - as I thought I described before, userspace can unbind
> >>> the device in one thread, while another thread is calling this
> >>> function. With just the right timing, this check succeeds, but...
> >>>
> >>>> + if (pdev)
> >>>> + put_device(&pdev->dev);
> >>>> + return ERR_PTR(-EPROBE_DEFER);
> >>>> + }
> >>>> +
> >>>> + mpcs = platform_get_drvdata(pdev);
> >>> mpcs ends up being read as NULL here. Even if you did manage to get a
> >>> valid pointer, "mpcs" being devm-alloced could be freed from under
> >>> you at this point...
> >>>
> >>>> + device_link_add(dev, mpcs->dev, DL_FLAG_AUTOREMOVE_CONSUMER);
> >>> resulting in this accessing memory which has been freed.
> >>>
> >>> The solution would be either to suppress the bind/unbind attributes
> >>> (provided the underlying struct device can't go away, which probably
> >>> also means ensuring the same of the MDIO bus. Aternatively, adding
> >>> a lock around the remove path and around the checking of
> >>> platform_get_drvdata() down to adding the device link would probably
> >>> solve it.
> >>>
> >>> However, I come back to my general point - this kind of stuff is
> >>> hairy. Do we want N different implementations of it in various drivers
> >>> with subtle bugs, or do we want _one_ implemenatation.
> >>>
> >>> If we go with the one implemenation approach, then we need to think
> >>> about whether we should be using device links or not. The problem
> >>> could be for network interfaces where one struct device is
> >>> associated with multiple network interfaces. Using device links has
> >>> the unfortunate side effect that if the PCS for one of those network
> >>> interfaces is removed, _all_ network interfaces disappear.
> >> I agree, esp. on this MT7988 removal of a PCS which may then not
> >> even be in use also impairs connectivity on the built-in gigE DSA
> >> switch. It would be nice to try to avoid this.
> >>
> >>> My original suggestion was to hook into phylink to cause that to
> >>> take the link down when an in-use PCS gets removed.
> >> I took a deep dive into how this could be done correctly and how
> >> similar things are done for other drivers. Apart from the PCS there
> >> often also is a muxing-PHY involved, eg. MediaTek's XFI T-PHY in this
> >> case (previously often called "pextp" for no apparent reason) or
> >> Marvell's comphy (mvebu-comphy).
> >>
> >> So let's try something simple on an already well-supported platform,
> >> I thought and grabbed Marvell Armada CN9131-DB running vanilla Linux,
> >> and it didn't even take some something racy, but plain:
> >>
> >> ip link set eth6 up
> >> cd /sys/bus/platform/drivers/mvebu-comphy
> >> echo f2120000.phy > unbind
> >> echo f4120000.phy > unbind
> >> echo f6120000.phy > unbind
> >> ip link set eth6 down
> >>
> >>
> >> That was enough. The result is a kernel crash, and the same should
> >> apply on popular platforms like the SolidRun MACCHIATOBin and other
> >> similar boards.
> >>
> >> So this gets me to think that there is a wider problem around
> >> non-phylink-managed resources which may disappear while in use by
> >> network drivers, and I guess that the same applies in many other
> >> places. I don't have a SATA drive connected to that Marvell board, but
> >> I can imagine what happens when suddenly removing the comphy instance
> >> in charge of the SATA link and then a subsequent SATA hotplug event
> >> happens or stuff like that...
> >>
> >> Anyway, back to network subsystem and phylink:
> >>
> >> Do you agree that we need a way to register (and unregister) PCS
> >> providers with phylink, so we don't need *_get_by_of_node implementations
> >> in each driver? If so, can you sketch out what the basic requirements
> >> for an acceptable solution would be?
> >>
> >> Also, do you agree that lack of handling of disappearing resources,
> >> such as PHYs (*not* network PHYs, but PHYs as in drivers/phy/*) or
> >> syscons, is already a problem right now which should be addressed?
> >>
> >> If you imagine phylink to take care of the life-cycle of all link-
> >> resources, what is vision about those things other than classic MDIO-
> >> connected PHYs?
> >>
> >> And well, of course, the easy fix for the example above would be:
> >> diff --git a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
> >> index b0dd133665986..9517c96319595 100644
> >> --- a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
> >> +++ b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
> >> @@ -1099,6 +1099,7 @@ static const struct of_device_id mvebu_comphy_of_match_table[] = {
> >> MODULE_DEVICE_TABLE(of, mvebu_comphy_of_match_table);
> >>
> >> static struct platform_driver mvebu_comphy_driver = {
> >> + .suppress_bind_attrs = true,
> >> .probe = mvebu_comphy_probe,
> >> .driver = {
> >> .name = "mvebu-comphy",
> >>
> >> But that should then apply to every single driver in drivers/phy/...
> >>
> > My remaining questions are:
> > - Is it ok to just use .suppress_bind_attrs = true for PCS and PHY
> > drivers to avoid having to care out hot-removal?
> > Those components are anyway built-into the SoC so removing them
> > is physically not possible.
> >
> > - In case of driver removal (rmmod -f) imho using a device-link would
> > be sufficient to prevent the worst. However, we would have to live
> > with the all-or-nothing nature of that approach, ie. once e.g. the
> > USXGMII driver is being removed, *all* Ethernet interfaces would
> > disappear with it (even those not actually using USXGMII).
> >
> > If the solutions mentioned above do not sound agreeable, please suggest
> > how a good solution would look like, ideally also share an example of
> > any driver in the kernel where this is done in the way you would like
> > to have things done.
> >
> >>>> +
> >>>> + return &mpcs->pcs;
> >>>> +}
> >>>> +EXPORT_SYMBOL(mtk_pcs_lynxi_get);
> >>>> +
> >>>> +void mtk_pcs_lynxi_put(struct phylink_pcs *pcs)
> >>>> +{
> >>>> + struct mtk_pcs_lynxi *cur, *mpcs = NULL;
> >>>> +
> >>>> + if (!pcs)
> >>>> + return;
> >>>> +
> >>>> + mutex_lock(&instance_mutex);
> >>>> + list_for_each_entry(cur, &mtk_pcs_lynxi_instances, node)
> >>>> + if (pcs == &cur->pcs) {
> >>>> + mpcs = cur;
> >>>> + break;
> >>>> + }
> >>>> + mutex_unlock(&instance_mutex);
> >>> I don't see what this loop gains us, other than checking that the "pcs"
> >>> is still on the list and hasn't already been removed. If that is all
> >>> that this is about, then I would suggest:
> >>>
> >>> bool found = false;
> >>>
> >>> if (!pcs)
> >>> return;
> >>>
> >>> mpcs = pcs_to_mtk_pcs_lynxi(pcs);
> >>> mutex_lock(&instance_mutex);
> >>> list_for_each_entry(cur, &mtk_pcs_lynxi_instances, node)
> >>> if (cur == mpcs) {
> >>> found = true;
> >>> break;
> >>> }
> >>> mutex_unlock(&instance_mutex);
> >>>
> >>> if (WARN_ON(!found))
> >>> return;
> >>>
> >>> which makes it more obvious why this exists.
> >> The idea was not only to make sure it hasn't been removed, but also
> >> to be sure that what ever the private data pointer points to has
> >> actually been created by that very driver.
> >>
> >> But yes, doing it in the way you suggest will work in the same way,
> >> just when having my idea in mind it looks more fishy to use
> >> pcs_to_mtk_pcs_lynxi() before we are 100% sure that what we dealing
> >> with has previously been created by this driver.
> >>
> >>
> >> Cheers
> >>
> >>
> >> Daniel
> >>
>