Re: [RFC PATCH net-next] net: dsa: sja1105: let phylink help with the replay of link callbacks
From: Russell King (Oracle)
Date: Thu Dec 12 2024 - 11:04:31 EST
On Thu, Oct 03, 2024 at 05:07:53PM +0300, Vladimir Oltean wrote:
> sja1105_static_config_reload() changes major settings in the switch and
> it requires a reset. A use case is to change things like Qdiscs (but see
> sja1105_reset_reasons[] for full list) while PTP synchronization is
> running, and the servo loop must not exit the locked state (s2).
> Therefore, stopping and restarting the phylink instances of all ports is
> not desirable, because that also stops the phylib state machine, and
> retriggers a seconds-long auto-negotiation process that breaks PTP.
However:
> ptp4l[54.552]: master offset 5 s2 freq -931 path delay 764
> ptp4l[55.551]: master offset 22 s2 freq -913 path delay 764
> ptp4l[56.551]: master offset 13 s2 freq -915 path delay 765
> ptp4l[57.552]: master offset 5 s2 freq -919 path delay 765
> ptp4l[58.553]: master offset 13 s2 freq -910 path delay 765
> ptp4l[59.553]: master offset 13 s2 freq -906 path delay 765
> ptp4l[60.553]: master offset 6 s2 freq -909 path delay 765
> ptp4l[61.553]: master offset 6 s2 freq -907 path delay 765
> ptp4l[62.553]: master offset 6 s2 freq -906 path delay 765
> ptp4l[63.553]: master offset 14 s2 freq -896 path delay 765
> $ ip link set br0 type bridge vlan_filtering 1
> [ 63.983283] sja1105 spi2.0 sw0p0: Link is Down
> [ 63.991913] sja1105 spi2.0: Link is Down
> [ 64.009784] sja1105 spi2.0: Reset switch and programmed static config. Reason: VLAN filtering
> [ 64.020217] sja1105 spi2.0 sw0p0: Link is Up - 1Gbps/Full - flow control off
> [ 64.030683] sja1105 spi2.0: Link is Up - 1Gbps/Full - flow control off
> ptp4l[64.554]: master offset 7397 s2 freq +6491 path delay 765
> ptp4l[65.554]: master offset 38 s2 freq +1352 path delay 765
> ptp4l[66.554]: master offset -2225 s2 freq -900 path delay 764
> ptp4l[67.555]: master offset -2226 s2 freq -1569 path delay 765
> ptp4l[68.555]: master offset -1553 s2 freq -1563 path delay 765
> ptp4l[69.555]: master offset -865 s2 freq -1341 path delay 765
> ptp4l[70.555]: master offset -401 s2 freq -1137 path delay 765
> ptp4l[71.556]: master offset -145 s2 freq -1001 path delay 765
doesn't this change in offset and frequency indicate that the PTP clock
was still disrupted, and needed to be re-synchronised? If it was
unaffected, then I would have expected the offset and frequency to
remain similar to before the reset happened.
Nevertheless...
> @@ -1551,7 +1552,8 @@ static void phylink_resolve(struct work_struct *w)
> }
>
> if (mac_config) {
> - if (link_state.interface != pl->link_config.interface) {
> + if (link_state.interface != pl->link_config.interface ||
> + pl->force_major_config) {
> /* The interface has changed, force the link down and
> * then reconfigure.
> */
> @@ -1561,6 +1563,7 @@ static void phylink_resolve(struct work_struct *w)
> }
> phylink_major_config(pl, false, &link_state);
> pl->link_config.interface = link_state.interface;
> + pl->force_major_config = false;
> }
> }
This will delay the major config until the link comes up, as mac_config
only gets set true for fixed-link and PHY when the link is up. For
inband mode, things get less certain, because mac_config will only be
true if there is a PHY present and the PHY link was up. Otherwise,
inband leaves mac_config false, and thus if force_major_config was
true, that would persist indefinitely.
> +/**
> + * phylink_replay_link_begin() - begin replay of link callbacks for driver
> + * which loses state
> + * @pl: a pointer to a &struct phylink returned from phylink_create()
> + *
> + * Helper for MAC drivers which may perform a destructive reset at runtime.
> + * Both the own driver's mac_link_down() method is called, as well as the
> + * pcs_link_down() method of the split PCS (if any).
> + *
> + * This is similar to phylink_stop(), except it does not alter the state of
> + * the phylib PHY (it is assumed that it is not affected by the MAC destructive
> + * reset).
> + */
> +void phylink_replay_link_begin(struct phylink *pl)
> +{
> + ASSERT_RTNL();
> +
> + phylink_run_resolve_and_disable(pl, PHYLINK_DISABLE_STOPPED);
I would prefer this used a different disable flag, so that...
> +}
> +EXPORT_SYMBOL_GPL(phylink_replay_link_begin);
> +
> +/**
> + * phylink_replay_link_end() - end replay of link callbacks for driver
> + * which lost state
> + * @pl: a pointer to a &struct phylink returned from phylink_create()
> + *
> + * Helper for MAC drivers which may perform a destructive reset at runtime.
> + * Both the own driver's mac_config() and mac_link_up() methods, as well as the
> + * pcs_config() and pcs_link_up() method of the split PCS (if any), are called.
> + *
> + * This is similar to phylink_start(), except it does not alter the state of
> + * the phylib PHY.
> + *
> + * One must call this method only within the same rtnl_lock() critical section
> + * as a previous phylink_replay_link_start().
> + */
> +void phylink_replay_link_end(struct phylink *pl)
> +{
> + ASSERT_RTNL();
> +
> + pl->force_major_config = true;
> + phylink_enable_and_run_resolve(pl, PHYLINK_DISABLE_STOPPED);
this can check that phylink_replay_link_begin() was previously called
to catch programming errors. There shouldn't be any conflict with
phylink_start()/phylink_stop() since the RTNL is held, but I think
its still worth checking that phylink_replay_link_begin() was
indeed called previously.
Other than those points, I think for sja1105 this is a better approach,
and as it's lightweight in phylink, I don't think having this will add
much maintenance burden, so I'm happy with the approach.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!