Re: [RFC PATCH v2 0/8] timekeeping: Fix drift tracking precision and add feed-forward discipline via vmclock

From: Miroslav Lichvar

Date: Tue May 26 2026 - 03:11:22 EST


On Mon, May 25, 2026 at 10:14:10AM +0100, David Woodhouse wrote:
> On Mon, 2026-05-25 at 10:08 +0200, Miroslav Lichvar wrote:
> > On Thu, May 21, 2026 at 10:54:41AM +0100, David Woodhouse wrote:
> > > On Thu, 2026-05-21 at 08:35 +0200, Miroslav Lichvar wrote:
> > > > Ok, but I don't see why the phase corrections of the guest need to be
> > > > in the kernel.
> > >
> > > I'm not sure I understand. 
> >
> > <..clarification...>
> >
> > /* Compute phase offset at cycle_last and set time_offset to slew */
> > ...
> > ntp_set_time_offset(tk->id, ref_err >> tk->tkr_mono.shift);
> >
>
> Ah, I see. Thanks.
>
> But that's just using ->time_offset which has *always* been in the
> kernel.

time_offset is an input to the kernel PLL. My concern is that the PLL
is fed directly by ptp_vmclock, ignoring everything else. There is no
setting of the PLL time constant or the flags, no configuration of the
step threshold, or any other options that a more advanced
implementation might have. To me it feels like a bad shortcut. I think
this part of the loop should be in userspace, properly using the
adjtimex() API. The feed-forward part (copying frequency settings of
the host) is still possible.
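
A minimal sketch of the requests such a userspace loop could pass to
adjtimex(), assuming a daemon that has already measured the phase
offset and read the host frequency from the vmclock device. The helper
names (fill_pll_request, fill_freq_request) are hypothetical and not
part of the patch set; only struct timex and the ADJ_*/STA_* constants
come from <sys/timex.h>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/timex.h>

/*
 * Hypothetical helper: phase correction through the kernel PLL.
 * The caller chooses the time constant and status flags explicitly
 * instead of inheriting whatever the PLL happens to have.
 */
static void fill_pll_request(struct timex *tx, long offset_ns, long time_const)
{
	assert(tx != NULL);
	memset(tx, 0, sizeof(*tx));
	tx->modes = ADJ_OFFSET | ADJ_NANO | ADJ_TIMECONST | ADJ_STATUS;
	tx->status = STA_PLL;		/* enable PLL updates */
	tx->offset = offset_ns;		/* nanoseconds, because of ADJ_NANO */
	tx->constant = time_const;	/* PLL time constant */
}

/*
 * Hypothetical helper: feed-forward part, copying the host frequency.
 * timex.freq is in ppm with a 16-bit fractional part.
 */
static void fill_freq_request(struct timex *tx, long host_freq_ppb)
{
	assert(tx != NULL);
	memset(tx, 0, sizeof(*tx));
	tx->modes = ADJ_FREQUENCY;
	tx->freq = (long)((long long)host_freq_ppb * 65536 / 1000);
}

/* Submit a prepared request; needs CAP_SYS_TIME when modes != 0. */
static int apply_request(struct timex *tx)
{
	return adjtimex(tx);
}
```

A daemon would call fill_freq_request() plus apply_request() when the
host announces a new frequency, and fill_pll_request() plus
apply_request() for the residual phase error, with whatever step
threshold policy it prefers applied before that.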

> There's nothing fundamental in the actual *timekeeping* here that
> hasn't already been in the guest kernel for decades; I'm just fixing a
> few arithmetic errors in the core code, and then *driving* it more
> precisely using its existing parameters (tick_length, time_offset).

Fixing arithmetic errors is great. The driving part is what I'm
concerned about: where it lives and what it is driving.

> > > Right. This *is* the software fallback, because the hardware scaling
> > > and offset aren't sufficient even if we only care about x86 where the
> > > former is supported.
> >
> > IMHO it's a solution done at a wrong layer.
>
> Understood. What do you believe is the better solution?

I think a better solution is scaling of the clocksource, i.e. a layer
below the realtime clock: an additional multiplier applied in HW or
SW. That would address the problem for all system clocks, not just the
realtime clock. adjtimex() changes are applied on top of that, so the
two are not in conflict.
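
To illustrate the SW variant, here is a rough sketch of an extra
fixed-point multiplier applied to the raw counter before it reaches
the usual timekeeping conversion. The names (scale_cycles,
ppb_to_mult, SCALE_SHIFT) and the choice of a 31-bit shift are
assumptions made for the example, not existing kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point format: the scale factor is mult / 2^31. */
#define SCALE_SHIFT 31

/* Convert a frequency correction in ppb into a multiplier. */
static uint32_t ppb_to_mult(int32_t ppb)
{
	int64_t unity = INT64_C(1) << SCALE_SHIFT;

	/* Keep the result inside 32 bits. */
	assert(ppb > -1000000000 && ppb < 1000000000);
	return (uint32_t)(unity + (int64_t)ppb * unity / 1000000000);
}

/*
 * Scale a raw cycle count: (cycles * mult) >> SCALE_SHIFT, computed
 * in two 32-bit halves so the intermediate product does not overflow.
 * The final sum can still wrap if the scaled value exceeds 64 bits.
 */
static uint64_t scale_cycles(uint64_t cycles, uint32_t mult)
{
	uint64_t lo = (cycles & UINT32_MAX) * mult;
	uint64_t hi = (cycles >> 32) * mult;

	return (lo >> SCALE_SHIFT) + (hi << (32 - SCALE_SHIFT));
}
```

Everything above this layer, including the mult/shift pair that
adjtimex() steers, would see the already-scaled counter, so all system
clocks pick up the correction.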

> Aside from the case of actually using NTP or a PHC to discipline the
> kernel's CLOCK_REALTIME, the use cases I'm trying to enable are:
>
> • (Micro)VM guest is *given* the TSC→realtime relationship in a virt
> enlightenment, gets an interrupt whenever it changes. Can react to
> that interrupt and steer the kernel's timekeeping as quickly as any
> userspace dæmon could do anything.
>
> • Dedicated virtual hosting environment needs to discipline the *TSC*
> directly against external references (PHC, 1PPS) in order to provide
> said virt enlightenment directly to guests and allow for accurate
> migration. This environment does not care about the host's actual
> CLOCK_REALTIME; that's basically cosmetic for logging purposes.
>
> • Multi-purpose environment has a standard ntpd/chrony setup, wants
> QEMU to be able to provide the same virt enlightenment based on
> the kernel's own timekeeping.

Which of those couldn't be done with the clocksource scaling and/or
adjtimex() if all the necessary information were available to userspace?

--
Miroslav Lichvar