Re: [PATCH net-next v8 3/3] gve: implement PTP gettimex64
From: David Woodhouse
Date: Thu May 21 2026 - 20:13:36 EST
On Fri, 2026-05-22 at 00:43 +0200, Thomas Gleixner wrote:
> On Thu, May 21 2026 at 20:59, David Woodhouse wrote:
> > I didn't spent *that* long trying, but I couldn't see a way of doing it
> > centrally, which didn't end up making me sadder than the initial
> > duplication.
> >
> > Honestly, whether it was intent or omission, I suspect the approach
> > Harshitha took for GVE is the right one: Support TSC, and anyone who
> > uses KVM clock doesn't deserve accurate time anyway.
>
> I agree with the sentiment, but the implementation is just bonkers TBH.
>
> The driver should not even care about TSC/ARM_XX. Drivers have to be
> mostly agnostic of this and we went to a great length for that with the
> existing PTM core support. The core might need extensions for the pre/post
> muck, but that's not a justification for hacks like this GVE stuff.
I haven't looked hard at what GVE is trying to do; only glanced at the
consistency of CSID_X86_TSC vs. CSID_X86_KVM_CLK handling as I was
already sad about that one.
> > <... migration...>
> Anything else is just crystal ball magic, which is what KVM has today....
Yeah, we digress. There's a 30-odd patch series on the kvm list to
clean up guest tsc/kvmclock handling and migration, and document the
process, if you want to come play. But honestly, there aren't enough
drugs in the world for someone to want to get involved in that crap
*voluntarily*. I recommend you don't ;)
> > So surely a device can only support this if it *does* have a precise
> > hardware timestamp, e.g. from ART? And the corresponding sys_realtime
> > and sys_monoraw fields also have to be from the *same* value, surely?
>
> That is correct, but you still fail to explain how this new made up
> att.counter_id/att.counter_value is filled in coherently without the
> driver doing something abysmal?
>
> You can't explain it, because the only way to do that correctly is by
> extending the cross time stamp struct and get_device_system_crosststamp().
Yes. Isn't that what I asked Arthur to do?
> > Is that not how it works? If not.... WTF *is* it for?
>
> Yes, that's the way it works since get_device_system_crosststamp() got
> introduced, but that does not give a random driver access to the actual
> converted (TSC) counter value. The driver can only observe the PTM value
> which is ART on x86 and whatever on other architectures. So it does not
> know anything about it.
Sure, the driver can only return what it *knows*.
We're certainly missing some infrastructure here... maybe the pci_bus
should have a CSID which corresponds to its PTM domain, which on x86
would be CSID_X86_ART?
(I really do want to get my hands on one of those PTM-capable atomic
clock TimeCard things, and have a play with implementing vmclock on
them using ART as the reference)
> > We're allowed to set *fire* to anyone who just throws a random
> > get_cycles() call in there, aren't we? That's basically just self-
> > defence, m'lud.
>
> I agree, but the way how this attr.counter_id/value extension is defined
> either requires unholy hacks to get access to the ART/TSC conversion or
> resort to get_cycles().
Never get_cycles(). Let's perhaps make it *not* need unholy hacks to
interpret a PTM counter value. Otherwise what's the point in ART and
PTM at all?
> > > While this:
> ...
>
> > > + xtstamp->sys_counter = system_counterval;
>
> ...
>
> simple core code change:
>
> > > Provides the information coherently and without any extra hacks. No?
> >
> > Yeah, that looks eminently reasonable. It isn't actually that far off
> > what Arthur did in patch 1 of the series, is it? He was adding other
> > clock quality attributes at the same time, and ended up shoe-horning
> > the cs_id and cycle_count into those instead of using a separate
> > system_counterval_t as you have here, which *is* cleaner. So yes, let's
> > get him to do that, but that part is cosmetic?
>
> No. It's not cosmetic.
>
> This patch #1 just defines new attributes, which include the CS ID and
> the CS counter value with absolutely ZERO explanation where these values
> are supposed to come from and ZERO explanation how they are coherent.
Pfft. Explanations are *definitely* cosmetic. And so is the difference
between a system_counterval_t, and a separate pair of {csid, cycles}
fields that precisely correspond to it; at least if we ignore use_nsecs
(which I have thus far).
I'll stand by calling that part cosmetic.
Not that we shouldn't fix it, but we should try to make our request of
the author clear and concise, and not catastrophise over the tweaks we
need to be made.
> It suggests that the driver can fill them in along with the other truly
> driver related data, e.g. error_bound.
>
> But as I explained before the driver _cannot_ fill them in coherently
> neither for the new gettimexattrs64() nor for the getcrosststampattrs()
> callbacks.
And yet in patch 4/7, vmclock_populate_ptp_attributes() is *given* the
count value from which the time reading is generated, puts it into the
{csid, cycles} fields which we agree should be turned into a
system_counterval_t as part of the device timestamp, and I don't see
the problem.
> So you add an extension to the existing interface without core support,
> which means once that is merged the driver crowd will happily take any
> shortcut to support it.
Please be specific. Aside from the values being a system_counterval_t
in each timestamp (both pre/post and, for those devices which can, the
device time), what is missing?
> Are _you_ going to mop up the resulting mess?
>
> > > Extending get_device_system_crosststamp() for AUX clocks is on my todo
> > > list for a long time, but I did not come around to it yet. It's not
> > > really complicated to make that work.
> > >
> > > The actual ena_phc_gettimexattrs64() implementation does not provide the
> > > counter value at all despite the fact that it could trivially do so with
> > > a modified ktime_get_snapshot() except for AUX clock IDs, but that's a
> > > solvable problem. See uncompiled PoC below.
> >
> > Huh? Unless it has PTM and is converting from ART (or is something
> > virtual like vmclock or the older non-LM-safe KVM hacks), surely the
> > driver has no business pretending to support 'precise' mode, and no
> > business providing a counter value?
>
> Huch?
>
> According to your earlier argumentation about counter values, you want
> the counter value, REALTIME, RAW combo even for gettimexattrs64() for
> the pre/post timestamps in case the hardware does not support PTM, no?
For the pre/post timestamps, sure. Not the *device* timestamp when it
doesn't support that.
> > In this case, we'd want the pre_ts and post_ts to be generated by the
> > common code using ktime_get_snapshot_id(cs_id) so that we have the
> > corresponding system_counterval_t in the 'A' parts of the ABA tuple...
> > but the device part of it can't pull one out of thin air unless it is
> > *part* of the device reading, surely?
>
> Correct, but where does the proposed patch set implement any of this?
It doesn't yet; I only asked for it this evening. Should be in the next
round.
> In the ena driver it sets cs_id and cs_value to 0,
Good. I hadn't looked hard at the ENA driver yet. So I asked Arthur if
he'd tried to expose a cycle value for the ENA PHC and the answer was
essentially "of course not; it doesn't have PTM".
> which will be replaced by abysmal hacks within no time because
> someone figures out that adding these values will give "better"
> results ....
Those were the ones we were allowed to set fire to, right?
> Neither the gettimexattrs64() nor the getcrosststampattrs() have any
> coherent way to fill in the CS related data, but you are pushing for
> adding this infrastructure first before thinking about the
> consequences. You can't be serious about that.
I do not believe I agree. I'm adding the infrastructure to report this
information because we *do* have it, and userspace wants to see it.
Some drivers *do* have it. The vmclock gettimex64 method literally just
calls ktime_get_snapshot(), populates both its pre and post timestamps
with the *same* result, and the device timestamp is calculated from the
same counter value at that precise moment.
It can *absolutely* also return the actual counter value used, if the
infrastructure supports that. As can the ptp_kvm drivers. And, a device
with PTM could report that too (although we do need a little more work
on cleanly converting that to an understandable domain, as you say).
And even for drivers which don't have simultaneous cycle readings,
userspace still wants to get the counter values corresponding to the
pre/post readings, which again can absolutely be supported.
Attachment:
smime.p7s
Description: S/MIME cryptographic signature