Re: [RFC PATCH v2 0/8] timekeeping: Fix draft tracking precision and add feed-forward discipline via vmclock
From: Arthur Kiyanovski
Date: Mon May 25 2026 - 04:07:19 EST
On 2026-05-24 14:36:35+02:00, Thomas Gleixner wrote:
> On Fri, May 22 2026 at 17:23, David Woodhouse wrote:
>
> > On Fri, 2026-05-22 at 17:28 +0200, Thomas Gleixner wrote:
> >
> > Taking those in reverse order... yes, this means that with a new
> > variant of PTP_SYS_OFFSET_EXTENDED, userspace can see actual counter
> > values even for the system parts of those ABA timestamps, even for non-
> > PTM clocks, and discipline the TSC/archcounter against the external
> > clock.
>
> Correct.
>
> > Currently I have userspace which literally does rdtsc() either side of
> > calling the ioctl :)
>
> Why am I not surprised? :)
>
> > And PTM devices can be used with PTP_SYS_OFFSET_PRECISE, which goes
> > through get_device_system_crosststamp() as described, and all just
> > works? It's just that we now allow userspace to *see* the counter value
> > that the driver was already generating.
>
> A new variant of PRECISE
>
> > So to your questions: although there's new userspace ioctl support, the
> > *drivers* don't need any modification for that (as long as they use the
> > standard prets/postts helpers).
>
> Yes.
>
> > The remaining question is the device timestamp part (the 'B' in the ABA
> > sandwich) for PTP_SYS_OFFSET_EXTENDED with PTM-capable drivers. Should
> > that get a counterval?
>
> PTM-capable driver support cross timestamps, which will with a new
> version of PTP_SYS_OFFSET_PRECISE expose the system counterval. No ABA
> for that as it's hardware latched AB.
>
> > I don't have a strong opinion. On one hand we'd have to find a way to
> > convert it from PTM for devices where it actually *is* PTM, and that's
> > what PTP_SYS_OFFSET_PRECISE is *for*.
>
> Correct.
>
> > But on the other hand, can't the conversion be a whole lot simpler than
> > get_device_system_crosststamp() because it's not actually dealing with
> > any timekeepers; it's basically only invoking convert_base_to_cs()?
>
> But what for? If you have PTM, use PRECISE. There is _zero_ value of
> having pre/post timestamps when the hardware already does the correlated
> precise sampling, no?
>
> > And the ioctl should support it *all* but just have a clear way of
> > indicating that any of the optional fields including the attrs are
> > *not* populated (or use 0/max values maybe?).
>
> Yes.
>
> > So no, I don't think any driver *has* to add any attr support in order
> > to expose counter values to userspace. The only reason I asked Arthur
> > to mix those things up was for the *userspace* API, to avoid adding yet
> > another ioctl over and over again. And now I feel bad for doing so :)
>
> I think you can create _one_ data structure variant, which fits both
> EXTENDED_ATTR and PRECISE_ATTR:
>
> struct attrs {
> u32 valid;
> u32 error_bound;
> ....
> u32 reserved[N];
> };
>
> @valid tells user space, which of the attributes has been filled in by
> the driver. That avoids bounds based validity checks, which are a pain
> as you might end up with different bounds for every attribute. Having a
> valid flags field avoids that completely.
>
> struct devtime {
> ptp_clock_time device_time;
> struct attrs attrs;
> };
>
> struct systime {
> u64 sys_systime;
> u64 sys_rawtime;
> u64 sys_counter;
> u32 sys_counter_id;
> u32 reserved;
> };
>
> Exposing @sys_counter_id requires to expose CSID_* in the user space ABI
> reliably, as otherwise a kernel internal CSID enum change would blow up
> the user space guess work. Your ptp_counter_id approach is error prone.
>
> struct timestamp {
> union {
> struct systime systime;
> struct systime pre_systime;
> };
> struct devtime devtime;
> struct systime post_systime;
> };
>
> struct request {
> u32 valid;
> clockid_t clock_id;
> unsigned int num_samples;
> u32 reserved[N];
> };
>
> I rather have @valid here too. The 'zero the reserved' members approach
> is a pain as new kernels have to map 0 to default behavior instead of
> being free to make 0 mean what they intend. @valid allows you to use
> other sizes than u32 for future fields. All you have to take care of is
> to keep the existing fields at the same place as before.
>
> struct ioctl_data {
> struct request request;
> struct timestamp timestamps[];
> };
>
> So for both PTP_SYS_OFFSET_EXTENDED_ATTRS and
> PTP_SYS_OFFSET_PRECISE_ATTRS user space allocates enough space to
> accomodate data::request::num_samples.
>
> For PTP_SYS_OFFSET_PRECISE_ATTRS num_samples has to be 1 and
> data::timestamps[0].post_systime is zeroed by the kernel because it has
> no meaning.
>
> So now in the kernel you do:
>
> ptp_sys_offset_extended_attrs(struct ptp_clock *ptp, void __user *argptr)
> {
> struct ioctl_data __user *data = argptr;
> struct request;
>
> if (copy_from_user(&request, &data->request, sizeof(request)))
> return -EFAULT;
>
> if (!extattr_request_valid(request))
> return -EINVAL;
>
> for (unsigned int i; i < request.num_samples; i++) {
> struct ptp_system_timestamp sts = { .clock_id = request.clock_id, };
> struct timestamp uts = { };
> struct timespec64 devts;
>
> if (ptp->info->gettimex64_attr)
> ret = ptp->info->gettimex64_attr(ptp->info, &dev_ts, &sts, &uts.attr);
> else if (ptp->info->gettimex64)
> ret = ptp->info->gettimex64(ptp->info, &dev_ts, &sts);
> else
> return -ENOTSUPP;
>
> if (ret)
> return ret;
>
> uts.pre_systime = mangle(sts.pre_systime);
> uts.devtime.device_time = mangle(dev_ts);
> uts.post_systime = mangle(sts.post_systime);
> if (!copy_to_user(&data->timestamps[i], uts, sizeof(uts)))
> return -EFAULT;
> }
> return 0;
> }
>
> ptp_sys_offset_precise_attrs(struct ptp_clock *ptp, void __user *argptr)
> {
> struct ioctl_data __user *data = argptr;
> struct request;
>
> if (copy_from_user(&request, &data->request, sizeof(request)))
> return -EFAULT;
>
> if (!preciseattr_request_valid(request))
> return -EINVAL;
>
> struct system_device_crosststamp xtstamp = { .clock_id = request.clock_id, };
> struct timestamp uts = { };
>
> if (ptp->info->getcrosststamp_attr)
> ret = ptp->info->getcrosststamp_attr(ptp->info, &xtstamp, &uts.attr);
> else if (ptp->info->getcrosststamp)
> ret = ptp->info->getcrosststamp(ptp->info, &xtstamp);
> else
> return -ENOTSUPP;
>
> if (ret)
> return ret;
>
> uts.systime = mangle(xtstamp.systime);
> uts.devtime.device_time = mangle(xtstamp.device);
> if (!copy_to_user(&data->timestamps[0], uts, sizeof(uts)))
> return -EFAULT;
> return 0;
> }
>
> Or something like this, which immediately enables the functionality for
> all drivers which implement the getcrosststamp() or the gettimex64()
> callbacks with a unified user space data structure.
>
> The attributes.valid bits are all zero and and once drivers implement
> the _attr callback variants, those attributes supported by the driver
> will magically appear with the corresponding valid bits set.
>
> Hmm?
>
> Thanks,
>
> tglx
Hi Thomas, David,
Thanks for the layout proposal, Thomas. The unified structure with
explicit valid flags is a much cleaner approach than bounds-based
validation.
I'm the author of the PHC timestamp attributes series [1] that this
applies to. Before I spin v4 based on this design, I want to confirm
three implementation details:
1. Counter IDs: No stable UAPI clocksource numbering exists today
(enum clocksource_ids is kernel-internal). I'll define stable constants
in include/uapi/linux/ptp_clock.h (e.g., PTP_CSID_X86_TSC,
PTP_CSID_ARM_ARCH) and map internal IDs in the chardev layer.
2. Array sizing: The timestamps array will be fixed at PTP_MAX_SAMPLES (25)
in the ioctl struct, not a flexible array, to keep
copy_from_user/copy_to_user bounded.
3. Ioctl numbers: Two separate ioctls (PTP_SYS_OFFSET_EXTENDED_ATTRS and
PTP_SYS_OFFSET_PRECISE_ATTRS) sharing the same payload struct, matching
existing PTP convention.
Arthur
[1] https://lore.kernel.org/netdev/20260515164033.6403-1-akiyano@xxxxxxxxxx/