Re: [PATCH net-next v2 2/2] net: phy: Add Marvell PHY PTP support
From: Kory Maincent
Date: Thu Apr 10 2025 - 05:24:52 EST
On Wed, 9 Apr 2025 23:38:00 +0100
"Russell King (Oracle)" <linux@xxxxxxxxxxxxxxx> wrote:
> On Wed, Apr 09, 2025 at 06:34:35PM +0100, Russell King (Oracle) wrote:
> > On Wed, Apr 09, 2025 at 06:04:14PM +0200, Kory Maincent wrote:
> > > On Wed, 9 Apr 2025 14:35:17 +0100
> > > "Russell King (Oracle)" <linux@xxxxxxxxxxxxxxx> wrote:
> > >
> [...]
> [...]
> [...]
> > >
> > > So you are only testing the mvpp2 PTP. It seems there is something broken
> > > with it. I don't think it is related to my work.
> >
> > Yes, and it has worked - but probably was never tested with PTPDv2 but
> > with linuxptp. As it was more than five years ago when I worked on this
> > stuff, I just can't remember the full details of the test setup I used.
> >
> > I think the reason I gave up running PTP on my network is the problems
> > that having the NIC bound into a Linux bridge essentially means that
> > you can't participate in PTP on that machine. That basically means a
> > VM host machine using a bridge device for the guests can't use PTP
> > to time sync itself.
> >
> > Well, it looks like the PHY based timestamping also isn't working -
> > ptp4l says its failing to timestamp transmitted packets, but having
> > added debug, the driver _is_ timestamping them, so the timestamps
> > are getting lost somewhere in the networking layer, or are too late
> > for ptp4l, which only waits 1ms, and the schedule_delayed_work(, 2)
> > will be about 20ms at HZ=100. Increasing the wait in ptp4l to 100ms
> > still doesn't appear to get a timestamp. According to the timestamps
> > on the debug messages, it's only taking 10ms to return the timestamp.
> >
> > So, at the moment, ptp looks entirely non-functional. Or the userspace
> > tools are broken.
>
> Right, got to the bottom of it at last. I hate linuxptp / ptp4l. The
> idea that one looks at the source, sees this:
>
> res = poll(&pfd, 1, sk_tx_timeout);
> if (res < 1) {
> pr_err(res ? "poll for tx timestamp failed: %m" :
> "timed out while polling for tx
> timestamp"); pr_err("increasing tx_timestamp_timeout may correct "
> "this issue, but it is likely caused by a
> driver bug");
>
> finds this in the same file:
>
> int sk_tx_timeout = 1;
>
> So it seemed obvious and logical that increasing that initialiser would
> increase the _default_ timeout... but no, that's not the case, because,
> ptp4l.c does:
>
> sk_tx_timeout = config_get_int(cfg, NULL, "tx_timestamp_timeout");
>
> unconditionally, and config.c has a table of config options along with
> their defaults... meaning that initialiser above for sk_tx_timeout
> means absolutely nothing, and one _has_ to use a config file.
>
> With that fixed, ptp4l's output looks very similar to that with mvpp2 -
> which doesn't inspire much confidence that the ptp stack is operating
> properly with the offset and frequency varying all over the place, and
> the "delay timeout" messages spamming frequently. I'm also getting
> ptp4l going into fault mode - so PHY PTP is proving to be way more
> unreliable than mvpp2 PTP. :(
That's really weird. On my board the Marvell PHY PTP is more reliable than MACB.
Even by disabling the interrupt.
What is the state of the driver you are using?
On my board using the PHY PTP:
# ethtool --set-hwtimestamp-cfg eth0 index 0 qualifier precise
# ptp4l -m -i eth0 -s -2
ptp4l[1111.786]: selected /dev/ptp0 as PTP clock
ptp4l[1111.829]: port 1 (eth0): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[1111.830]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[1111.831]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[1117.896]: selected local clock 2ace59.fffe.e52d8b as best master
ptp4l[1159.456]: port 1 (eth0): new foreign master 0080e1.fffe.4253d0-1
ptp4l[1163.456]: selected best master clock 0080e1.fffe.4253d0
ptp4l[1163.456]: port 1 (eth0): LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[1165.456]: master offset 797590212440464546 s0 freq -0 path delay 1368
ptp4l[1166.456]: master offset 797590212440460565 s1 freq -3981 path delay 1353
ptp4l[1167.456]: master offset -24 s2 freq -4005 path delay 1353
ptp4l[1167.456]: port 1 (eth0): UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[1168.456]: master offset -19 s2 freq -4007 path delay 1353
ptp4l[1169.457]: master offset 821 s2 freq -3173 path delay 528
ptp4l[1170.457]: master offset 2 s2 freq -3745 path delay 528
ptp4l[1171.457]: master offset -295 s2 freq -4042 path delay 578
ptp4l[1172.457]: master offset -262 s2 freq -4097 path delay 578
ptp4l[1173.457]: master offset -147 s2 freq -4061 path delay 553
ptp4l[1174.457]: master offset -42 s2 freq -4000 path delay 525
ptp4l[1175.457]: master offset -38 s2 freq -4008 path delay 525
ptp4l[1176.457]: master offset -10 s2 freq -3992 path delay 522
ptp4l[1177.457]: master offset -29 s2 freq -4014 path delay 525
ptp4l[1178.457]: master offset -23 s2 freq -4017 path delay 525
ptp4l[1179.457]: master offset 0 s2 freq -4000 path delay 522
ptp4l[1180.458]: master offset 4 s2 freq -3996 path delay 523
ptp4l[1181.458]: master offset -8 s2 freq -4007 path delay 523
ptp4l[1182.458]: master offset 10 s2 freq -3992 path delay 521
ptp4l[1183.458]: master offset 9 s2 freq -3990 path delay 521
ptp4l[1184.458]: master offset -7 s2 freq -4003 path delay 523
ptp4l[1185.458]: master offset 4 s2 freq -3994 path delay 523
ptp4l[1186.458]: master offset -15 s2 freq -4012 path delay 525
> Now, the one thing I can't get rid of is the receive timestamp
> overflow warning - this occurs whenever e.g. ptp4l is restarted,
> and is caused by there being no notification that PTP isn't being
> used anymore.
>
> Consequently, we end up with the PHY queuing a timestamp for a Sync
> packet which it sees on the network, but because nothing is wanting
> the packets (because e.g. ptp4l has been stopped) there's no packets
> queued into the receive queue to take this timestamp, so we stop
> polling the PHY for timestamps.
>
> If we continue to rapidly poll the PHY, then we could needlessly
> waste cycles - because nothing tells us "we have no one wanting
> hardware timestamps anymore" which seems to be a glaring hole in
> the PTP design.
Maybe ptp4l should disable the tx timestamp mode and the rx filter mode if it
stops.
> Not setting DISTSOVERWRITE seems like a solution, but that seems to
> lead to issues with timestamps being lost.
>
> Well, having spent much of the afternoon and all evening on this,
> and all I see are problems that don't seem to have solutions.
Maxime is also working on a Macchiatobin, I will ask him to test on his side.
Regards,
--
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com