Re: [RFC PATCH net-next] net: dsa: sja1105: let phylink help with the replay of link callbacks

From: Vladimir Oltean
Date: Thu Dec 12 2024 - 12:27:44 EST


On Thu, Dec 12, 2024 at 03:56:37PM +0000, Russell King (Oracle) wrote:
> On Thu, Oct 03, 2024 at 05:07:53PM +0300, Vladimir Oltean wrote:
> > sja1105_static_config_reload() changes major settings in the switch and
> > it requires a reset. A use case is to change things like Qdiscs (but see
> > sja1105_reset_reasons[] for full list) while PTP synchronization is
> > running, and the servo loop must not exit the locked state (s2).
> > Therefore, stopping and restarting the phylink instances of all ports is
> > not desirable, because that also stops the phylib state machine, and
> > retriggers a seconds-long auto-negotiation process that breaks PTP.
>
> However:
>
> > ptp4l[63.553]: master offset 14 s2 freq -896 path delay 765
> > $ ip link set br0 type bridge vlan_filtering 1
> > [ 63.983283] sja1105 spi2.0 sw0p0: Link is Down
> > [ 63.991913] sja1105 spi2.0: Link is Down
> > [ 64.009784] sja1105 spi2.0: Reset switch and programmed static config. Reason: VLAN filtering
> > [ 64.020217] sja1105 spi2.0 sw0p0: Link is Up - 1Gbps/Full - flow control off
> > [ 64.030683] sja1105 spi2.0: Link is Up - 1Gbps/Full - flow control off
> > ptp4l[64.554]: master offset 7397 s2 freq +6491 path delay 765
> > ptp4l[65.554]: master offset 38 s2 freq +1352 path delay 765
> > ptp4l[66.554]: master offset -2225 s2 freq -900 path delay 764
>
> doesn't this change in offset and frequency indicate that the PTP clock
> was still disrupted, and needed to be re-synchronised? If it was
> unaffected, then I would have expected the offset and frequency to
> remain similar to before the reset happened.

Regarding expectations: I cannot avoid having _some_ disruption, as the
hardware doesn't have proper assist for this operation. It's an automotive
switch intended to be used with a static configuration programmed once.
Offloading the dynamic Linux network stack is somewhat outside of its
design intentions.

The driver measures the time elapsed, in the CLOCK_REALTIME domain,
during the switch reset, and then reprograms the switch PTP clock with
the value pre-reset plus that elapsed time. It's still better than
leaving the PTP time in January 1970, which would require the user space
PTP stack to exit the "servo locked" state and perform a clock step.

I also haven't actually shown the post-reset output when it has reached
the steady state again, just a few seconds worth of ouput, that's why
the frequency adjustment is not yet equal to the previous value.

There are ways to further reduce the convergence time in real life systems,
which I didn't bother to apply here, like synchronize CLOCK_REALTIME to
the switch PHC (to better approximate the leap missed during switch reset),
or increasing the Sync packet interval (SJA1105 is frequently used with
gPTP which has a Sync interval of 125 ms, here I tested with 1 s).
It still really doesn't compare at all to the disruption caused by
alternatives such as dropping the link in the PHY, or not restoring the
PTP time at all.

> Nevertheless...
>
> > @@ -1551,7 +1552,8 @@ static void phylink_resolve(struct work_struct *w)
> > }
> >
> > if (mac_config) {
> > - if (link_state.interface != pl->link_config.interface) {
> > + if (link_state.interface != pl->link_config.interface ||
> > + pl->force_major_config) {
> > /* The interface has changed, force the link down and
> > * then reconfigure.
> > */
> > @@ -1561,6 +1563,7 @@ static void phylink_resolve(struct work_struct *w)
> > }
> > phylink_major_config(pl, false, &link_state);
> > pl->link_config.interface = link_state.interface;
> > + pl->force_major_config = false;
> > }
> > }
>
> This will delay the major config until the link comes up, as mac_config
> only gets set true for fixed-link and PHY when the link is up. For
> inband mode, things get less certain, because mac_config will only be
> true if there is a PHY present and the PHY link was up. Otherwise,
> inband leaves mac_config false, and thus if force_major_config was
> true, that would persist indefinitely.

Ok, I certainly wasn't careful enough when analyzing the existing code path.
If I understand you correctly, you're thinking something like this should be
sufficient to avoid depending on bool mac_config? The diff is
incremental over the change posted here.

-- >8 --
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 3dcd1f47093a..893acab0d9bd 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1712,20 +1712,18 @@ static void phylink_resolve(struct work_struct *w)
if (pl->act_link_an_mode != MLO_AN_FIXED)
phylink_apply_manual_flow(pl, &link_state);

- if (mac_config) {
- if (link_state.interface != pl->link_config.interface ||
- pl->force_major_config) {
- /* The interface has changed, force the link down and
- * then reconfigure.
- */
- if (cur_link_state) {
- phylink_link_down(pl);
- cur_link_state = false;
- }
- phylink_major_config(pl, false, &link_state);
- pl->link_config.interface = link_state.interface;
- pl->force_major_config = false;
+ if ((mac_config && link_state.interface != pl->link_config.interface) ||
+ pl->force_major_config) {
+ /* The interface has changed or a forced major configuration
+ * was requested, so force the link down and then reconfigure.
+ */
+ if (cur_link_state) {
+ phylink_link_down(pl);
+ cur_link_state = false;
}
+ phylink_major_config(pl, false, &link_state);
+ pl->link_config.interface = link_state.interface;
+ pl->force_major_config = false;
}

if (link_state.link != cur_link_state) {
-- >8 --

Not worth it, IMO, to complicate the logic with yet one more layer of
"ifs" for the pl->link_config.interface and pl->force_major_config
reassignment. It's ok if they are assigned their current values, when
the code block is entered on a transition of the other variable.

> > +void phylink_replay_link_begin(struct phylink *pl)
> > +{
> > + ASSERT_RTNL();
> > +
> > + phylink_run_resolve_and_disable(pl, PHYLINK_DISABLE_STOPPED);
>
> I would prefer this used a different disable flag, so that...
>
> > +}
> > +
> > +void phylink_replay_link_end(struct phylink *pl)
> > +{
> > + ASSERT_RTNL();
> > +
> > + pl->force_major_config = true;
> > + phylink_enable_and_run_resolve(pl, PHYLINK_DISABLE_STOPPED);
>
> this can check that phylink_replay_link_begin() was previously called
> to catch programming errors. There shouldn't be any conflict with
> phylink_start()/phylink_stop() since the RTNL is held, but I think
> its still worth checking that phylink_replay_link_begin() was
> indeed called previously.

This should be fine, right?
-- >8 --
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 893acab0d9bd..c9fb0bb024d5 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -34,6 +34,7 @@ enum {
PHYLINK_DISABLE_STOPPED,
PHYLINK_DISABLE_LINK,
PHYLINK_DISABLE_MAC_WOL,
+ PHYLINK_DISABLE_REPLAY,

PCS_STATE_DOWN = 0,
PCS_STATE_STARTING,
@@ -4070,7 +4071,7 @@ void phylink_replay_link_begin(struct phylink *pl)
{
ASSERT_RTNL();

- phylink_run_resolve_and_disable(pl, PHYLINK_DISABLE_STOPPED);
+ phylink_run_resolve_and_disable(pl, PHYLINK_DISABLE_REPLAY);
}
EXPORT_SYMBOL_GPL(phylink_replay_link_begin);

@@ -4093,8 +4094,13 @@ void phylink_replay_link_end(struct phylink *pl)
{
ASSERT_RTNL();

+ if (WARN(!test_bit(PHYLINK_DISABLE_REPLAY,
+ &pl->phylink_disable_state),
+ "phylink_replay_link_end() called without a prior phylink_replay_link_begin()\n"))
+ return;
+
pl->force_major_config = true;
- phylink_enable_and_run_resolve(pl, PHYLINK_DISABLE_STOPPED);
+ phylink_enable_and_run_resolve(pl, PHYLINK_DISABLE_REPLAY);
flush_work(&pl->resolve);
}
EXPORT_SYMBOL_GPL(phylink_replay_link_end);
-- >8 --

> Other than those points, I think for sja1105 this is a better approach,
> and as it's lightweight in phylink, I don't think having this will add
> much maintenance burden, so I'm happy with the approach.

Thanks, this is encouraging. I'll continue to work on it, and then clean
up what's left in sja1105, as mentioned in the commit message.

I've retested with both changes, and it still appears to work.