Re: [PATCH net-next v6 3/3] gve: implement PTP gettimex64
From: Jordan Rhee
Date: Fri May 08 2026 - 13:46:24 EST
On Thu, May 7, 2026 at 2:23 PM Jacob Keller <jacob.e.keller@xxxxxxxxx> wrote:
>
> On 5/7/2026 2:13 PM, Harshitha Ramamurthy wrote:
> > From: Jordan Rhee <jordanrhee@xxxxxxxxxx>
> >
> > Enable chrony and phc2sys to synchronize system clock to NIC clock.
> >
> > Two paths are implemented: a precise path using system counter values
> > sampled by the device, and a fallback path using system counter values
> > sampled in the driver using ptp_read_system_prets()/postts().
> >
> > To use the precise path, the current system clocksource must match the
> > units returned by the device, which on x86 is X86_TSC and on ARM64 is
> > ARM_ARCH_COUNTER. The clockid requested for the cross-timestamp must
> > be either CLOCK_REALTIME or CLOCK_MONOTONIC_RAW. These conditions hold
> > by default on GCP VMs using Chrony, so we expect the precise path to be
> > used the vast majority of the time. If the system clocksource is changed
> > to kvm-clock, it activates the fallback path. Ethtool counters have been
> > added to count how many times each path is used.
> >
> > The uncertainty window in the precise path is typically around 1-2us,
> > while in the fallback path is around 60-80us.
> >
> > Stub implementions of adjfine and adjtime are added to avoid NULL
> > dereference when phc2sys tries to adjust the clock.
> >
> > Cc: John Stultz <jstultz@xxxxxxxxxx>
> > Cc: Thomas Gleixner <tglx@xxxxxxxxxx>
> > Cc: Stephen Boyd <sboyd@xxxxxxxxxx>
> > Cc: David Woodhouse <dwmw2@xxxxxxxxxxxxx>
> > Reviewed-by: Willem de Bruijn <willemb@xxxxxxxxxx>
> > Reviewed-by: Kevin Yang <yyd@xxxxxxxxxx>
> > Reviewed-by: Naman Gulati <namangulati@xxxxxxxxxx>
> > Signed-off-by: Jordan Rhee <jordanrhee@xxxxxxxxxx>
> > Signed-off-by: Harshitha Ramamurthy <hramamurthy@xxxxxxxxxx>
> > ---
> > Changes in v6:
> > - Added a fallback to driver-sampled time sandwich that is used when
> > the following conditions are not met:
> > - The system clock source is X86_TSC or ARM_ARCH_COUNTER
> > - The requested clockid is CLOCK_REALTIME or CLOCK_MONOTONIC_RAW
> > - The architecture is x86 or ARM64
> > - Added ethtool statistics to count how many cross-timestamps used the
> > precise path versus fallback path.
> > - Fixed printf format specifier.
> > - Added stub implementions of adjtime and adjfine to prevent NULL
> > dereference when phc2sys tries to adjust clock.
>
> All excellent improvements.
Thank you for the review!
>
> > - Moved system time snapshot back to gve_ptp_gettimex64() so we can get the
> > current system clock source from it. It is OK for it to not be inside
> > the mutex or retry loop because lock contention and retries should be
> > extremely rare, and chrony filters out bad samples.
> >
>
> I'm a bit worried about this part, but if I understand, it would only
> affect the fallback variant anyways since the clock samples are taken in
> the host in the precise/fast path.
Good point, the snapshot *is* used in the precise path to interpolate
system timestamps that occur before the current timekeeping interval.
This is my understanding of how historical timestamp interpolation
works (adjust_historical_crosststamp()).
<-- total_history_cycles -->
<-- partial_history_cycles -->
-------|------------------+------------------------------------|----------------+---------->
cycles (input)
snapshot sample cur_interval now
-------|------------------+------------------------------------|----------------+---------->
ktime (output)
<-- corr_raw -->
At the start of each timekeeping interval, both the current cycle
count and ktime are recorded. Each interval boundary consists of a
(cycle count, ktime) pair.
It computes the ktime corresponding to sampled cycle count as corr_raw
* partial_history_cycles / total_history_cycles. In other words, it
scales the distance between the interval boundaries in ktime units by
the relative position of the sample in the interval computed using the
cycle counts. It adds or subtracts this value to whichever interval
boundary is closer to get the absolute ktime.
How would this be affected by a very old snapshot? The interpolation
would still work, but it may be less accurate if the clock is not
stable. To get a sense of the timescales,
- the timekeeping interval is typically 4ms (CONFIG_HZ=250).
- the AQ command typically takes 60-80us.
- there are 2 sources of contention on the mutex (aux task and gettimex64)
- worst case retry loop is 100ms (but this should not happen due to
how the firmware rate limiter has been tuned)
- typical clock stability for the relevant cloud server platforms is
about 10ppm
The vast majority of the time, the sample should fall in the current
or previous timekeeping interval, resulting in the most accurate
interpolation possible.
In the worst case for retries and contention, the snapshot could be up
to 200ms before the current timekeeping interval. Assuming clock
stability of 10ppm, this translates to 10us of drift per 1s, or 2us
per 200ms. This is still considerably better than the root dispersion
in the fallback path (81us), so I think it's OK.
>
> > Changes in v5:
> > - Reformulate retry loop in terms of total timeout (Jakub Kicinski)
> >
> > Changes in v3:
> > - Take system time snapshot inside the mutex
> > - Return -EOPNOTSUPP if cross-timestamp is requested on an arch other
> > than x86 or arm64
> >
> > Changes in v2:
> > - fix compilation warning on ARM by casting cycles_t to u64
> > ---
> > drivers/net/ethernet/google/gve/gve.h | 8 +
> > drivers/net/ethernet/google/gve/gve_adminq.h | 4 +-
> > drivers/net/ethernet/google/gve/gve_ethtool.c | 3 +
> > drivers/net/ethernet/google/gve/gve_ptp.c | 237 +++++++++++++++++-
> > 4 files changed, 243 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
> > index 7b69d0cfc0d5..4de3ce60060e 100644
> > --- a/drivers/net/ethernet/google/gve/gve.h
> > +++ b/drivers/net/ethernet/google/gve/gve.h
> > @@ -880,6 +880,14 @@ struct gve_priv {
> > u32 stats_report_trigger_cnt; /* count of device-requested stats-reports since last reset */
> > u32 suspend_cnt; /* count of times suspended */
> > u32 resume_cnt; /* count of times resumed */
> > + /* count of cross-timestamps attempted using system timestamps
> > + * from the AQ command
> > + */
> > + u32 ptp_precise_xtstamps;
> > + /* count of cross-timestamps attempted using system timestamps sampled
> > + * by the driver
> > + */
>
> Helpful for analyzing when its not working. Nice!
>
> > +static int gve_cycles_to_clock_fn(ktime_t *device_time,
> > + struct system_counterval_t *system_counterval,
> > + void *ctx)
> > +{
> > + struct gve_cycles_to_clock_callback_ctx *context = ctx;
> > +
> > + *device_time = 0;
> > +
> > + system_counterval->cycles = context->cycles;
> > + system_counterval->use_nsecs = false;
> > +
> > + if (IS_ENABLED(CONFIG_X86))
> > + system_counterval->cs_id = CSID_X86_TSC;
> > + else if (IS_ENABLED(CONFIG_ARM64))
> > + system_counterval->cs_id = CSID_ARM_ARCH_COUNTER;
> > + else
>
> No single kernel can be compiled for both CONFIG_X86 *and* CONFIG_ARM64
> simultaneously, so there is no need to check in sequence and an
> exclusive if/else path is fine. Ok.
Noted, thanks.
>
> > +
> > +static bool
> > +gve_can_use_system_ts_from_device(enum clocksource_ids system_clock_source,
> > + clockid_t clockid)
> > +{
> > + if (clockid != CLOCK_REALTIME && clockid != CLOCK_MONOTONIC_RAW)
> > + return false;
> > +
> > + /* If the system clock source matches the system clock
> > + * returned by the AdminQ command, we can use the system
> > + * timestamps returned by the device, otherwise we have to
> > + * fall back to sampling system time from the host which
> > + * is less accurate.
> > + */
> > + if (IS_ENABLED(CONFIG_X86))
> > + return system_clock_source == CSID_X86_TSC;
> > + else if (IS_ENABLED(CONFIG_ARM64))
> > + return system_clock_source == CSID_ARM_ARCH_COUNTER;
> > +
>
> This feels a bit duplicated to me, since you have the same check
> elsewhere. Would it make sense to pull this to its own function check?
> Its short enough I think that is only warranted if you have a true
> (non-nit) cause to make a v7.
There is a similar check in gve_cycles_to_clock_fn, but the logic
inside the if blocks is different so I'm not sure how they could share
code.
>
> Thus:
>
> Reviewed-by: Jacob Keller <jacob.e.keller@xxxxxxxxx>
Thank you very much, appreciate it.