Re: [PATCH net-next v2 3/3] gve: implement PTP gettimex64

From: Naman Gulati

Date: Tue Mar 31 2026 - 14:19:42 EST

On Sun, Mar 29, 2026 at 9:31 PM Jordan Rhee <jordanrhee@xxxxxxxxxx> wrote:
>
> On Sun, Mar 29, 2026 at 2:20 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> >
> > This is an AI-generated review of your patch. The human sending this
> > email has considered the AI review valid, or at least plausible.
> > ---
> > Subject: gve: implement PTP gettimex64
> >
> > This commit implements the PTP gettimex64 function to enable chrony and
> > phc2sys to synchronize the system clock to the NIC clock. It reads the
> > NIC timestamp along with pre and post system cycle counters to minimize
> > the uncertainty window.
> >
> > > diff --git a/drivers/net/ethernet/google/gve/gve_ptp.c b/drivers/net/ethernet/google/gve/gve_ptp.c
> > > index 140b8fbce4f4..bea794541f30 100644
> > > --- a/drivers/net/ethernet/google/gve/gve_ptp.c
> > > +++ b/drivers/net/ethernet/google/gve/gve_ptp.c
> > > @@ -10,28 +10,203 @@
> >
> > [ ... ]
> >
> > > +static int gve_adminq_read_timestamp(struct gve_priv *priv,
> > > + cycles_t *pre_cycles,
> > > + cycles_t *post_cycles)
> > > +{
> > > + unsigned long delay_us = 1000;
> > > + int retry_count = 0;
> > > + int err;
> > > +
> > > + lockdep_assert_held(&priv->nic_ts_read_lock);
> > > +
> > > + do {
> > > + *pre_cycles = get_cycles();
> > > + err = gve_adminq_report_nic_ts(priv, priv->nic_ts_report_bus);
> > > +
> > > + /* Ensure cycle counter is sampled after AdminQ cmd returns */
> > > + rmb();
> > > + *post_cycles = get_cycles();
> >
> > Are execution barriers needed around get_cycles() here? rmb() orders
> > memory accesses, but the CPU might still speculatively execute the cycle
> > counter reads out of order with respect to the AdminQ commands.
> >
> > Could this speculative reordering cause the strict bounds validation in
> > gve_ptp_gettimex64() to fail spuriously?
>
> I will add mb() barriers to be totally safe.

I'm not so sure that rmb() is insufficient here. In x86_64 rmb() is an
lfence. lfence is used to prevent speculative execution (per the x86
barrier_nospec implementation) and rdtsc_ordered, which is called by
the ktime* helpers, also prefaces rdtsc with an lfence. On ARM the
get_cycles implementation adds an isb instruction before reading
cntvct_el0.

>
> > [ ... ]
> >
> > > static int gve_ptp_gettimex64(struct ptp_clock_info *info,
> > > struct timespec64 *ts,
> > > struct ptp_system_timestamp *sts)
> > > {
> > > - return -EOPNOTSUPP;
> > > + struct gve_ptp *ptp = container_of(info, struct gve_ptp, info);
> > > + struct gve_sysclock_sample sysclock = {0};
> > > + struct gve_priv *priv = ptp->priv;
> > > + struct system_time_snapshot snap;
> > > + u64 nic_ts;
> > > + int err;
> > > +
> > > + /* Take system clock snapshot before sampling cycle counters */
> > > + if (sts)
> > > + ktime_get_snapshot(&snap);
> > > +
> > > + err = gve_clock_nic_ts_read(priv, &nic_ts, &sysclock);
> >
> > Could the system time snapshot become stale here?
> >
> > If priv->nic_ts_read_lock is contended, or if gve_adminq_read_timestamp()
> > retries and sleeps, the snapshot might be taken significantly before the
> > actual hardware read.
> >
> > Would it be more accurate to take the snapshot immediately before reading the
> > hardware, inside the mutex?
>
> Ack, will fix.
>
> >
> > > + if (err)
> > > + return err;
> > > +
> > > + if (sts) {
> > > + /* Reject samples with out of order system clock values */
> > > + if (!(sysclock.host_pre_cycles <= sysclock.nic_pre_cycles &&
> > > + sysclock.nic_pre_cycles <= sysclock.nic_post_cycles &&
> > > + sysclock.nic_post_cycles <= sysclock.host_post_cycles)) {
> > > + dev_err_ratelimited(&priv->pdev->dev,
> > > + "AdminQ system clock cycle counts out of order. Expecting %llu <= %llu <= %llu <= %llu\n",
> > > + (u64)sysclock.host_pre_cycles,
> > > + sysclock.nic_pre_cycles,
> > > + sysclock.nic_post_cycles,
> > > + (u64)sysclock.host_post_cycles);
> > > + return -EBADMSG;
> >
> > How does this behave on architectures where get_cycles() is not implemented
> > and returns 0, or with older firmware that leaves nic_pre_cycles and
> > nic_post_cycles as 0?
>
> Firmware is not allowed to return 0. 0 should trigger the check.
>
> > It looks like the bounds check would unconditionally fail and return -EBADMSG.
> > Since userspace tools like chrony usually expect -EOPNOTSUPP for unsupported
> > extended ioctls to fall back to basic PTP ioctls, will returning -EBADMSG
> > cause a hard failure for clock synchronization?
>
> Will fix. GVE is only supported on x86_64 and arm64, and will fail on
> other architectures with -EOPNOTSUPP due to the check in
> gve_cycles_to_clock_fn(). However, if an architecture returns
> get_cycles() as 0, it will fail here first with -EBADMSG. I will add
> an explicit check for x86 || arm64 earlier in this function.
>
> >
> > [ ... ]
> >
> > > @@ -132,7 +307,7 @@ int gve_init_clock(struct gve_priv *priv)
> > > goto release_ptp;
> > > }
> > > mutex_init(&priv->nic_ts_read_lock);
> > > - err = gve_clock_nic_ts_read(priv, &nic_raw);
> > > + err = gve_clock_nic_ts_read(priv, &nic_raw, NULL);
> > > if (err) {
> > > dev_err(&priv->pdev->dev, "failed to read NIC clock %d\n", err);
> > > goto release_nic_ts_report;
> >
> > Is there a race condition exposing the PTP device to userspace before its
> > resources are fully initialized?
> >
> > Looking at gve_init_clock(), gve_ptp_init() is called before allocating
> > nic_ts_report and initializing nic_ts_read_lock:
> >
> > err = gve_ptp_init(priv);
> > if (err)
> > return err;
> >
> > priv->nic_ts_report = dma_alloc_coherent(...);
> > ...
> > mutex_init(&priv->nic_ts_read_lock);
> >
> > If a concurrent userspace process immediately invokes the
> > PTP_SYS_OFFSET_EXTENDED ioctl after gve_ptp_init() registers the /dev/ptpX
> > device, could it call gve_ptp_gettimex64() and attempt to lock the
> > uninitialized mutex or dereference the NULL nic_ts_report pointer?
> >
> > Additionally, in the error path for gve_init_clock():
> >
> > release_nic_ts_report:
> > mutex_destroy(&priv->nic_ts_read_lock);
> > dma_free_coherent(...);
> > priv->nic_ts_report = NULL;
> > release_ptp:
> > gve_ptp_release(priv);
> >
> > Could destroying the mutex and freeing the memory before gve_ptp_release()
> > create a use-after-free window if an ioctl is currently running?
>
> Will be fixed in the previous patch in the series.