Re: [PATCH v3 3/5] net: Let the active time stamping layer be selectable.

From: Michael Walle
Date: Fri Mar 10 2023 - 08:34:16 EST


> > > From previous discussions, I believe that a device tree property was
> > > added in order to prevent perceived performance regressions when
> > > timestamping support is added to a PHY driver, correct?
> >
> > Yes, i.e. to select the default and better timestamp on a board.
>
> Is there a way to unambiguously determine the "better" timestamping on
> a board?
>
> Is it plausible that over time, when PTP timestamping matures and,
> for example, MDIO devices get support for PTP_SYS_OFFSET_EXTENDED
> (an attempt was here: https://lkml.org/lkml/2019/8/16/638), the
> relationship between PTP clock qualities changes, and so does the
> preference change?
>
> > > I have a dumb question: if updating the device trees is needed in order
> > > to prevent these behavior changes, then how is the regression problem
> > > addressed for those device trees which don't contain this new property
> > > (all device trees)?
> >
> > > In that case there is not really a solution,
>
> If it's not really a solution, then doesn't this fail at its primary
> purpose of preventing regressions?
>
> > > but be aware that CONFIG_NETWORK_PHY_TIMESTAMPING needs to be
> > > enabled to allow timestamping on the PHY. Currently in mainline only
> > > a few (3) defconfigs have it enabled, so it is not really widespread,
>
> Do distribution kernels use the defconfigs from the kernel, or do they
> just enable as many options that sound good as possible?
>
> > > maybe I could add more documentation to prevent further regression
> > > issues when adding timestamping support to a PHY driver.
>
> My opinion is that either the problem was not correctly identified,
> or the proposed solution does not address that problem.
>
> What I believe is the problem is that adding support for PHY
> timestamping to a PHY driver will cause a behavior change for existing
> systems which are deployed with that PHY.
>
> If I had a multi-port NIC where all ports share the same PHC, I would
> want to create a boundary clock with it. I can do that just fine when
> using MAC timestamping. But assume someone adds support for PHY
> timestamping and the kernel switches to using PHY timestamps by
> default. Now I need to keep in sync the PHCs of the PHYs, something
> which was implicit before (all ports shared the same PHC). I have done
> nothing incorrectly, yet my deployment doesn't work anymore. This is
> just an example. It doesn't sound like a good idea in general for new
> features to cause a behavior change by default.
>
> Having identified that as the problem, I guess the solution should be
> to stop doing that (and even though a PHY driver supports timestamping,
> keep using the MAC timestamping by default).
>
> There is a slight inconvenience caused by the fact that there are
> already PHY drivers using PHY timestamping, and those may have been
> introduced into deployments with PHY timestamping. We cannot change the
> default behavior for those either. There are 5 such PHY drivers today
> (I've grepped for mii_timestamper in drivers/net/phy).
>
> I would suggest that the kernel implements a short whitelist of 5
> entries containing PHY driver names, which are compared against
> netdev->phydev->drv->name (with the appropriate NULL pointer checks).
> Matches will default to PHY timestamping. Otherwise, the new default
> will be to keep the behavior as if PHY timestamping doesn't exist
> (MAC still provides the timestamps), and the user needs to select the
> PHY as the timestamping source explicitly.
>
> Thoughts?
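
For illustration, the whitelist lookup proposed above could look roughly
like the sketch below. This is a userspace model, not kernel code: the
structs are simplified stand-ins for net_device, phy_device and
phy_driver, the enum ts_layer and the function default_ts_layer() are
hypothetical names, and the list entries are placeholders rather than the
names of the five existing drivers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel structures named in the mail.
 * Only the fields needed for the lookup are modelled. */
struct phy_driver { const char *name; };
struct phy_device { const struct phy_driver *drv; };
struct net_device { const struct phy_device *phydev; };

/* Hypothetical enum: which layer provides timestamps by default. */
enum ts_layer { TS_LAYER_MAC, TS_LAYER_PHY };

/* Placeholder entries. A real list would hold the drv->name strings of
 * the PHY drivers that already shipped with PHY timestamping. */
static const char *const phy_ts_default_list[] = {
	"example-phy-a",
	"example-phy-b",
};

/* Return the default timestamping layer for a netdev.
 * Anything not on the list, or without a PHY driver, stays on the MAC. */
static enum ts_layer default_ts_layer(const struct net_device *dev)
{
	size_t i;
	size_t n = sizeof(phy_ts_default_list) / sizeof(phy_ts_default_list[0]);

	/* NULL checks along netdev->phydev->drv->name */
	if (!dev || !dev->phydev || !dev->phydev->drv ||
	    !dev->phydev->drv->name)
		return TS_LAYER_MAC;

	for (i = 0; i < n; i++) {
		if (strcmp(dev->phydev->drv->name, phy_ts_default_list[i]) == 0)
			return TS_LAYER_PHY;
	}

	return TS_LAYER_MAC;
}
```

With this shape, a PHY driver that gains timestamping support later is
not on the list, so existing deployments keep MAC timestamps until the
user explicitly selects the PHY.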

> > While I agree in principle (I have suggested to make MAC timestamping
> > the default before), I see a problem with the recent LAN8814 PHY
> > timestamping support, which will likely be released with 6.3. That
> > would now switch the timestamping to PHY timestamping for our board
> > (arch/arm/boot/dts/lan966x-kontron-kswitch-d10-mmt-8g.dts). I could
> > argue that is a regression for our board iff NETWORK_PHY_TIMESTAMPING
> > is enabled. Honestly, I don't know how to proceed here and haven't
> > tried to replicate the regression due to limited time. Assuming,
> > that I can show it is a regression, what would be the solution then,
> > reverting the commit? Horatiu, any ideas?
>
> I don't think reverting the commit is the best approach.

I didn't expect any other answer from the author of the patch ;)

> Because this
> will block adding any timestamp support to any of the existing PHYs.

Right, but if I understand it correctly, that is what has happened to
the Marvell PHY PTP support
https://lore.kernel.org/netdev/Y%2FzKJUHUhEgXjKFG@xxxxxxxxxxxxxxxxxxxxx/

(Or it was NAK'ed before it could even get in. Maybe I'm to blame here,
but I have just so much time to follow all the mainline development).

> Maybe a better solution is to enable or disable NETWORK_PHY_TIMESTAMPING
> depending where you want to do the timestamp.

No, that is not how this can work. I agree with Vladimir, that in
general, you have no control which kernel options are enabled, see
distros.

> > I digress from the original problem a bit. But if there would be such
> > a whitelist, I'd propose that it won't contain the lan8814 driver.
>
> I don't have anything against having a whitelist of the PHY driver names.

Yeah, but my problem right now is, that if this discussion won't find
any good solution, the lan8814 phy timestamping will find its way
into an official kernel and then it is really hard to undo things.

So, I'd really prefer to *first* have a discussion how to proceed
with the PHY timestamping and then add the lan8814 support, so
existing boards don't show regressions.

-michael