Re: [PATCH net-next v2 3/3] gve: implement PTP gettimex64

From: Naman Gulati

Date: Thu Apr 02 2026 - 11:54:21 EST

On Thu, Apr 2, 2026 at 8:38 AM Jordan Rhee <jordanrhee@xxxxxxxxxx> wrote:
>
> We know that gve_adminq_report_nic_ts() will cause a VM exit, which
> will act as a full barrier. Additional barriers are not strictly
> necessary from a functional perspective. Can we rely on this fact or
> do we need explicit barriers?

I don't think that's enough. Without a barrier get_cycles can be
speculatively executed before gve_adminq_report_nic_ts.

>
>
> On Tue, Mar 31, 2026 at 11:05 AM Naman Gulati <namangulati@xxxxxxxxxx> wrote:
> >
> > On Sun, Mar 29, 2026 at 9:31 PM Jordan Rhee <jordanrhee@xxxxxxxxxx> wrote:
> > >
> > > On Sun, Mar 29, 2026 at 2:20 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> > > >
> > > > This is an AI-generated review of your patch. The human sending this
> > > > email has considered the AI review valid, or at least plausible.
> > > > ---
> > > > Subject: gve: implement PTP gettimex64
> > > >
> > > > This commit implements the PTP gettimex64 function to enable chrony and
> > > > phc2sys to synchronize the system clock to the NIC clock. It reads the
> > > > NIC timestamp along with pre and post system cycle counters to minimize
> > > > the uncertainty window.
> > > >
> > > > > diff --git a/drivers/net/ethernet/google/gve/gve_ptp.c b/drivers/net/ethernet/google/gve/gve_ptp.c
> > > > > index 140b8fbce4f4..bea794541f30 100644
> > > > > --- a/drivers/net/ethernet/google/gve/gve_ptp.c
> > > > > +++ b/drivers/net/ethernet/google/gve/gve_ptp.c
> > > > > @@ -10,28 +10,203 @@
> > > >
> > > > [ ... ]
> > > >
> > > > > +static int gve_adminq_read_timestamp(struct gve_priv *priv,
> > > > > + cycles_t *pre_cycles,
> > > > > + cycles_t *post_cycles)
> > > > > +{
> > > > > + unsigned long delay_us = 1000;
> > > > > + int retry_count = 0;
> > > > > + int err;
> > > > > +
> > > > > + lockdep_assert_held(&priv->nic_ts_read_lock);
> > > > > +
> > > > > + do {
> > > > > + *pre_cycles = get_cycles();
> > > > > + err = gve_adminq_report_nic_ts(priv, priv->nic_ts_report_bus);
> > > > > +
> > > > > + /* Ensure cycle counter is sampled after AdminQ cmd returns */
> > > > > + rmb();
> > > > > + *post_cycles = get_cycles();
> > > >
> > > > Are execution barriers needed around get_cycles() here? rmb() orders
> > > > memory accesses, but the CPU might still speculatively execute the cycle
> > > > counter reads out of order with respect to the AdminQ commands.
> > > >
> > > > Could this speculative reordering cause the strict bounds validation in
> > > > gve_ptp_gettimex64() to fail spuriously?
> > >
> > > I will add mb() barriers to be totally safe.
> >
> > I'm not so sure that rmb() is insufficient here. In x86_64 rmb() is an
> > lfence. lfence is used to prevent speculative execution (per the x86
> > barrier_nospec implementation) and rdtsc_ordered, which is called by
> > the ktime* helpers, also prefaces rdtsc with an lfence. On ARM the
> > get_cycles implementation adds an isb instruction before reading
> > cntvct_el0.
> >
> >
> > >
> > > > [ ... ]
> > > >
> > > > > static int gve_ptp_gettimex64(struct ptp_clock_info *info,
> > > > > struct timespec64 *ts,
> > > > > struct ptp_system_timestamp *sts)
> > > > > {
> > > > > - return -EOPNOTSUPP;
> > > > > + struct gve_ptp *ptp = container_of(info, struct gve_ptp, info);
> > > > > + struct gve_sysclock_sample sysclock = {0};
> > > > > + struct gve_priv *priv = ptp->priv;
> > > > > + struct system_time_snapshot snap;
> > > > > + u64 nic_ts;
> > > > > + int err;
> > > > > +
> > > > > + /* Take system clock snapshot before sampling cycle counters */
> > > > > + if (sts)
> > > > > + ktime_get_snapshot(&snap);
> > > > > +
> > > > > + err = gve_clock_nic_ts_read(priv, &nic_ts, &sysclock);
> > > >
> > > > Could the system time snapshot become stale here?
> > > >
> > > > If priv->nic_ts_read_lock is contended, or if gve_adminq_read_timestamp()
> > > > retries and sleeps, the snapshot might be taken significantly before the
> > > > actual hardware read.
> > > >
> > > > Would it be more accurate to take the snapshot immediately before reading the
> > > > hardware, inside the mutex?
> > >
> > > Ack, will fix.
> > >
> > > >
> > > > > + if (err)
> > > > > + return err;
> > > > > +
> > > > > + if (sts) {
> > > > > + /* Reject samples with out of order system clock values */
> > > > > + if (!(sysclock.host_pre_cycles <= sysclock.nic_pre_cycles &&
> > > > > + sysclock.nic_pre_cycles <= sysclock.nic_post_cycles &&
> > > > > + sysclock.nic_post_cycles <= sysclock.host_post_cycles)) {
> > > > > + dev_err_ratelimited(&priv->pdev->dev,
> > > > > + "AdminQ system clock cycle counts out of order. Expecting %llu <= %llu <= %llu <= %llu\n",
> > > > > + (u64)sysclock.host_pre_cycles,
> > > > > + sysclock.nic_pre_cycles,
> > > > > + sysclock.nic_post_cycles,
> > > > > + (u64)sysclock.host_post_cycles);
> > > > > + return -EBADMSG;
> > > >
> > > > How does this behave on architectures where get_cycles() is not implemented
> > > > and returns 0, or with older firmware that leaves nic_pre_cycles and
> > > > nic_post_cycles as 0?
> > >
> > > Firmware is not allowed to return 0. 0 should trigger the check.
> > >
> > > > It looks like the bounds check would unconditionally fail and return -EBADMSG.
> > > > Since userspace tools like chrony usually expect -EOPNOTSUPP for unsupported
> > > > extended ioctls to fall back to basic PTP ioctls, will returning -EBADMSG
> > > > cause a hard failure for clock synchronization?
> > >
> > > Will fix. GVE is only supported on x86_64 and arm64, and will fail on
> > > other architectures with -EOPNOTSUPP due to the check in
> > > gve_cycles_to_clock_fn(). However, if an architecture returns
> > > get_cycles() as 0, it will fail here first with -EBADMSG. I will add
> > > an explicit check for x86 || arm64 earlier in this function.
> > >
> > > >
> > > > [ ... ]
> > > >
> > > > > @@ -132,7 +307,7 @@ int gve_init_clock(struct gve_priv *priv)
> > > > > goto release_ptp;
> > > > > }
> > > > > mutex_init(&priv->nic_ts_read_lock);
> > > > > - err = gve_clock_nic_ts_read(priv, &nic_raw);
> > > > > + err = gve_clock_nic_ts_read(priv, &nic_raw, NULL);
> > > > > if (err) {
> > > > > dev_err(&priv->pdev->dev, "failed to read NIC clock %d\n", err);
> > > > > goto release_nic_ts_report;
> > > >
> > > > Is there a race condition exposing the PTP device to userspace before its
> > > > resources are fully initialized?
> > > >
> > > > Looking at gve_init_clock(), gve_ptp_init() is called before allocating
> > > > nic_ts_report and initializing nic_ts_read_lock:
> > > >
> > > > err = gve_ptp_init(priv);
> > > > if (err)
> > > > return err;
> > > >
> > > > priv->nic_ts_report = dma_alloc_coherent(...);
> > > > ...
> > > > mutex_init(&priv->nic_ts_read_lock);
> > > >
> > > > If a concurrent userspace process immediately invokes the
> > > > PTP_SYS_OFFSET_EXTENDED ioctl after gve_ptp_init() registers the /dev/ptpX
> > > > device, could it call gve_ptp_gettimex64() and attempt to lock the
> > > > uninitialized mutex or dereference the NULL nic_ts_report pointer?
> > > >
> > > > Additionally, in the error path for gve_init_clock():
> > > >
> > > > release_nic_ts_report:
> > > > mutex_destroy(&priv->nic_ts_read_lock);
> > > > dma_free_coherent(...);
> > > > priv->nic_ts_report = NULL;
> > > > release_ptp:
> > > > gve_ptp_release(priv);
> > > >
> > > > Could destroying the mutex and freeing the memory before gve_ptp_release()
> > > > create a use-after-free window if an ioctl is currently running?
> > >
> > > Will be fixed in the previous patch in the series.