Re: [RFC] net: dsa: mv88e6xxx: ptp: improve phc2sys precision for mv88e6xxx switch in combination with imx6-fec

From: Vladimir Oltean
Date: Mon Aug 05 2019 - 05:55:05 EST


Hi Hubert,

On Fri, 2 Aug 2019 at 19:33, Hubert Feurstein <h.feurstein@xxxxxxxxx> wrote:
>
> With this patch the phc2sys synchronisation precision improved to +/-500ns.
>
> Signed-off-by: Hubert Feurstein <h.feurstein@xxxxxxxxx>
> ---
>
> This patch should only be the base for a discussion about improving precision of
> phc2sys (from the linuxptp package) in combination with a mv88e6xxx switch and
> imx6-fec.
>
> When I started my work on PTP at the beginning of this week, I was positively
> supprised about the sync precision of ptp4l. After adding support for the
> MV88E6220 I was able to synchronize two of our boards within +/- 10 nanoseconds.
> Remebering that the PTP system in the MV88E6220 is clocked with 100MHz, this is
> I think the best what can be expected. Big thanks to Richard and the other
> developers who made this possible.
>
> But then I tried to synchornize the PTP clock with the system clock by using
> phc2sys (phc2sys -rr -amq -l6) and I quickly was very disapointed about the
> precision.
>
> Min: -17829 ns
> Max: 21694 ns
> StdDev: 8520 ns
> Count: 127
>
> So I tried to find the reason for this. And as you probably already know, the
> reason for that is the MDIO latency, which is terrible (up to 800 usecs).
>
> As a next step, I tried to to implement the gettimex64 callback (see: "[PATCH]
> net: dsa: mv88e6xxx: extend PTP gettime function to read system clock"). With
> this in place (and a patched linuxptp-master version which really uses the
> PTP_SYS_OFFSET_EXTENDED-ioctl), I got the following results:
>
> Min: -12144 ns
> Max: 10891 ns
> StdDev: 4046,71 ns
> Count: 112
>
> So, things improved, but this is still unacceptable. It was still not possible
> to compensate the MDIO latency issue.
>
> According to my understanding, the timestamps (by using
> ptp_read_system_{pre|post}ts) have to be captured at a place where we have an
> constant offset related to the PHC in the switch. The only point where these
> timestamps can be captured is the mdio_write callback in the imx_fec. Because,
> reading the PHC timestamp will result in the follwing MDIO accesses:
>
> (several) reads of the AVB_CMD register (to poll for the busy-flag)
> write AVB_CMD (AVBOp=110b Read with post-incerement of PHC timestamp)
> read AVB_DATA (PTP Global Time [15:0])
> read AVB_DATA (PTP Global Time [31:16])
>
> With this sequence in mind, the Marvell switch has to snapshot the PHC
> timestamp at the write-AVB_CMD in order to be able to get sane values later by
> reading AVB_DATA. So the best place to capture the system timestamps is this
> one and only write operation directly in the imx_fec. By using the patch below
> (without the changes to the system clock resolution) I got the following
> results:
>
> Min: -464 ns
> Max: 525 ns
> StdDev: 210,31 ns
> Count: 401
>
> I would say that is a huge improvement.
>
> I realized, that the system clock (at least on the imx6) has a resolution of
> 333ns. So I tried to speed up this clock by using the PER-clock instead of
> OSC_PER. This gave me 15ns resolution. The results were:
>
> Min: -476 ns
> Max: 439 ns
> StdDev: 176,52 ns
> Count: 630
>
> So, things got improved again a little bit (at least the StdDev).
>
> According to my understanding, this is almost the best which is possible,
> because there is one more clock which influences the results. This is the MDIO
> bus clock, which is just 2.5MHz (or 400ns). So, I would say that +/- 400ns
> jitter is caused only by the MDIO bus clock, since the changes in imx_fec should
> not introduce any unpredictable latencies.
>
> My question to the experienced kernel developers is, how can this be implemented
> in a more generic way? Because this hack only works under these circumstances,
> and of course can never be part of the mainline kernel.
>
> I am not 100% sure that all my statements or assumptions are correct, so feel
> free to correct me.
>
> Hubert
>

You guessed correctly (since you copied me) that I'm battling much of
the same issues with the sja1105 and its spi-fsl-dspi controller
driver.
In fact I will refrain from saying anything about your
PTP_SYS_OFFSET_EXTENDED solution/hack combo, but will ask some
questions instead:

- You said you patched linuxptp master. Could you share the patch? Is
there anything else that's needed except compiling against the board's
real kernel headers (aka point KBUILD_OUTPUT to the extracted location
of /sys/kernel/kheaders.tar.xz)? I've done that and I do see phc2sys
probing and using the new ioctl, but I don't see a big improvement in
my case (that's probably because my SPI interface has real jitter
caused by peripheral clock instability, but I really need to put a
scope on it to be sure, so that's a discussion for another time).
- I really wonder what your jitter is caused by. Maybe it is just
contention on the mii_bus->mdio_lock? If that's the case, maybe you
don't need to go all the way down to the driver level, and taking the
ptp_sts at the subsystem (MDIO, SPI, I2C, etc) level is "good enough".
- Along the lines of the above, I wonder how badly would the
maintainers shout at the proposal of adding a struct
ptp_system_timestamp pointer and its associated lock in struct device.
That way at least you don't have to change the API, like you did with
mdiobus_write_nested_ptp. Relatively speaking, this is the least
amount of intrusion (although, again, far from beautiful).
- The software timestamps help you in the particular case of phc2sys,
but are they enough to cover all your needs? I haven't spent even 1
second looking at Marvell switch datasheets, but is its free-running
timer only used for PTP timestamping? At least the sja1105 does also
support time-based egress scheduling and ingress policing, so for that
scenario, the timecounter/cyclecounter implementation will no longer
cut it (you do need to synchronize the hardware clock). If your
hardware supports these PTP-based features as well, I can only assume
the reason why mv88e6xxx went for a timecounter is the same as the
reason why I did: the MDIO/SPI jitter while accessing the frequency
and offset correction registers is bad enough that the servo loop goes
haywire. And implementing gettimex64 will not solve that.

I also added Miroslav Lichvar, who originally created the
PTP_SYS_OFFSET_EXTENDED ioctl, maybe he can share some ideas on how it
is best served.

> drivers/clocksource/timer-imx-gpt.c | 9 ++++++++-
> drivers/net/dsa/mv88e6xxx/chip.h | 2 ++
> drivers/net/dsa/mv88e6xxx/ptp.c | 11 +++++++----
> drivers/net/dsa/mv88e6xxx/smi.c | 2 +-
> drivers/net/ethernet/freescale/fec_main.c | 24 ++++++++++++++++++-----
> drivers/net/phy/mdio_bus.c | 16 +++++++++++++++
> include/linux/mdio.h | 2 ++
> include/linux/phy.h | 1 +
> 8 files changed, 56 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/clocksource/timer-imx-gpt.c b/drivers/clocksource/timer-imx-gpt.c
> index 706c0d0ff56c..84695a2d8ff7 100644
> --- a/drivers/clocksource/timer-imx-gpt.c
> +++ b/drivers/clocksource/timer-imx-gpt.c
> @@ -471,8 +471,15 @@ static int __init mxc_timer_init_dt(struct device_node *np, enum imx_gpt_type t
>
> /* Try osc_per first, and fall back to per otherwise */
> imxtm->clk_per = of_clk_get_by_name(np, "osc_per");
> - if (IS_ERR(imxtm->clk_per))
> +
> + /* Force PER clock to be used, so we get 15ns instead of 333ns */
> + //if (IS_ERR(imxtm->clk_per)) {
> + if (1) {
> imxtm->clk_per = of_clk_get_by_name(np, "per");
> + pr_info("==> Using PER clock\n");
> + } else {
> + pr_info("==> Using OSC_PER clock\n");
> + }
>
> imxtm->type = type;
>
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
> index 01963ee94c50..9e14dc406415 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.h
> +++ b/drivers/net/dsa/mv88e6xxx/chip.h
> @@ -277,6 +277,8 @@ struct mv88e6xxx_chip {
> struct ptp_clock_info ptp_clock_info;
> struct delayed_work tai_event_work;
> struct ptp_pin_desc pin_config[MV88E6XXX_MAX_GPIO];
> + struct ptp_system_timestamp *ptp_sts;
> +
> u16 trig_config;
> u16 evcap_config;
> u16 enable_count;
> diff --git a/drivers/net/dsa/mv88e6xxx/ptp.c b/drivers/net/dsa/mv88e6xxx/ptp.c
> index 073cbd0bb91b..cf6e52ee9e0a 100644
> --- a/drivers/net/dsa/mv88e6xxx/ptp.c
> +++ b/drivers/net/dsa/mv88e6xxx/ptp.c
> @@ -235,14 +235,17 @@ static int mv88e6xxx_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
> return 0;
> }
>
> -static int mv88e6xxx_ptp_gettime(struct ptp_clock_info *ptp,
> - struct timespec64 *ts)
> +static int mv88e6xxx_ptp_gettimex(struct ptp_clock_info *ptp,
> + struct timespec64 *ts,
> + struct ptp_system_timestamp *sts)
> {
> struct mv88e6xxx_chip *chip = ptp_to_chip(ptp);
> u64 ns;
>
> mv88e6xxx_reg_lock(chip);
> + chip->ptp_sts = sts;
> ns = timecounter_read(&chip->tstamp_tc);
> + chip->ptp_sts = NULL;
> mv88e6xxx_reg_unlock(chip);
>
> *ts = ns_to_timespec64(ns);
> @@ -426,7 +429,7 @@ static void mv88e6xxx_ptp_overflow_check(struct work_struct *work)
> struct mv88e6xxx_chip *chip = dw_overflow_to_chip(dw);
> struct timespec64 ts;
>
> - mv88e6xxx_ptp_gettime(&chip->ptp_clock_info, &ts);
> + mv88e6xxx_ptp_gettimex(&chip->ptp_clock_info, &ts, NULL);
>
> schedule_delayed_work(&chip->overflow_work,
> MV88E6XXX_TAI_OVERFLOW_PERIOD);
> @@ -472,7 +475,7 @@ int mv88e6xxx_ptp_setup(struct mv88e6xxx_chip *chip)
> chip->ptp_clock_info.max_adj = MV88E6XXX_MAX_ADJ_PPB;
> chip->ptp_clock_info.adjfine = mv88e6xxx_ptp_adjfine;
> chip->ptp_clock_info.adjtime = mv88e6xxx_ptp_adjtime;
> - chip->ptp_clock_info.gettime64 = mv88e6xxx_ptp_gettime;
> + chip->ptp_clock_info.gettimex64 = mv88e6xxx_ptp_gettimex;
> chip->ptp_clock_info.settime64 = mv88e6xxx_ptp_settime;
> chip->ptp_clock_info.enable = ptp_ops->ptp_enable;
> chip->ptp_clock_info.verify = ptp_ops->ptp_verify;
> diff --git a/drivers/net/dsa/mv88e6xxx/smi.c b/drivers/net/dsa/mv88e6xxx/smi.c
> index 5fc78a063843..801fd4abba5a 100644
> --- a/drivers/net/dsa/mv88e6xxx/smi.c
> +++ b/drivers/net/dsa/mv88e6xxx/smi.c
> @@ -45,7 +45,7 @@ static int mv88e6xxx_smi_direct_write(struct mv88e6xxx_chip *chip,
> {
> int ret;
>
> - ret = mdiobus_write_nested(chip->bus, dev, reg, data);
> + ret = mdiobus_write_nested_ptp(chip->bus, dev, reg, data, chip->ptp_sts);
> if (ret < 0)
> return ret;
>
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index 2f6057e7335d..20f589dc5b8b 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -1814,11 +1814,25 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
>
> reinit_completion(&fep->mdio_done);
>
> - /* start a write op */
> - writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> - FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> - FEC_MMFR_TA | FEC_MMFR_DATA(value),
> - fep->hwp + FEC_MII_DATA);
> + if (bus->ptp_sts) {
> + unsigned long flags = 0;
> + local_irq_save(flags);
> + __iowmb();
> + /* >Take the timestamp *after* the memory barrier */
> + ptp_read_system_prets(bus->ptp_sts);
> + writel_relaxed(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> + FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> + FEC_MMFR_TA | FEC_MMFR_DATA(value),
> + fep->hwp + FEC_MII_DATA);
> + ptp_read_system_postts(bus->ptp_sts);
> + local_irq_restore(flags);
> + } else {
> + /* start a write op */
> + writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
> + FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> + FEC_MMFR_TA | FEC_MMFR_DATA(value),
> + fep->hwp + FEC_MII_DATA);
> + }
>
> /* wait for end of transfer */
> time_left = wait_for_completion_timeout(&fep->mdio_done,
> diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
> index bd04fe762056..f076487db11f 100644
> --- a/drivers/net/phy/mdio_bus.c
> +++ b/drivers/net/phy/mdio_bus.c
> @@ -672,6 +672,22 @@ int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val)
> }
> EXPORT_SYMBOL(mdiobus_write_nested);
>
> +int mdiobus_write_nested_ptp(struct mii_bus *bus, int addr, u32 regnum, u16 val, struct ptp_system_timestamp *ptp_sts)
> +{
> + int err;
> +
> + BUG_ON(in_interrupt());
> +
> + mutex_lock_nested(&bus->mdio_lock, MDIO_MUTEX_NESTED);
> + bus->ptp_sts = ptp_sts;
> + err = __mdiobus_write(bus, addr, regnum, val);
> + bus->ptp_sts = NULL;
> + mutex_unlock(&bus->mdio_lock);
> +
> + return err;
> +}
> +EXPORT_SYMBOL(mdiobus_write_nested_ptp);
> +
> /**
> * mdiobus_write - Convenience function for writing a given MII mgmt register
> * @bus: the mii_bus struct
> diff --git a/include/linux/mdio.h b/include/linux/mdio.h
> index e8242ad88c81..7f9767babdc3 100644
> --- a/include/linux/mdio.h
> +++ b/include/linux/mdio.h
> @@ -9,6 +9,7 @@
> #include <uapi/linux/mdio.h>
> #include <linux/mod_devicetable.h>
>
> +struct ptp_system_timestamp;
> struct gpio_desc;
> struct mii_bus;
>
> @@ -310,6 +311,7 @@ int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
> int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 regnum);
> int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
> int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val);
> +int mdiobus_write_nested_ptp(struct mii_bus *bus, int addr, u32 regnum, u16 val, struct ptp_system_timestamp *ptp_sts);
>
> int mdiobus_register_device(struct mdio_device *mdiodev);
> int mdiobus_unregister_device(struct mdio_device *mdiodev);
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 462b90b73f93..fd4e33654863 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -252,6 +252,7 @@ struct mii_bus {
> int reset_delay_us;
> /* RESET GPIO descriptor pointer */
> struct gpio_desc *reset_gpiod;
> + struct ptp_system_timestamp *ptp_sts;
> };
> #define to_mii_bus(d) container_of(d, struct mii_bus, dev)
>
> --
> 2.22.0
>

Regards,
-Vladimir