Re: [PATCH v2 1/3] KVM: x86: implement KVM_{GET|SET}_TSC_STATE

From: Thomas Gleixner
Date: Tue Dec 08 2020 - 15:21:20 EST


On Tue, Dec 08 2020 at 09:43, Andy Lutomirski wrote:
> On Tue, Dec 8, 2020 at 6:23 AM Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> It looks like it tries to accomplish the right goal, but in a rather
> roundabout way. The host knows how to convert from TSC to
> CLOCK_REALTIME, and ptp_kvm.c exposes this data to the guest. But,
> rather than just making the guest use the same CLOCK_REALTIME data as
> the host, ptp_kvm.c seems to expose information to usermode that a
> user daemon could use to attempt (with some degree of error?) to make
> the guest kernel track CLOCK_REALTIME. This seems inefficient
> and dubiously accurate.
>
> My feature request is for this to be fully automatic and completely
> coherent. I would like for a host user program and a guest user
> program to be able to share memory, run concurrently, and use the
> shared memory to exchange CLOCK_REALTIME values without ever observing
> the clock going backwards. This ought to be doable. Ideally the
> result should even be usable for Spanner-style synchronization
> assuming the host clock is good enough. Also, this whole thing should
> work without needing to periodically wake the guest to remain
> synchronized. If the guest sleeps for two minutes (full nohz-idle, no
> guest activity at all), the host makes a small REALTIME frequency
> adjustment, and then the guest runs user code that reads
> CLOCK_REALTIME, the guest clock should still be fully synchronized
> with the host. I don't think that ptp_kvm.c-style synchronization can
> do this.

One issue here is that guests might want to run their own NTP/PTP. One
reason to do that is that some people prefer the leap second smearing
NTP servers.

> tglx etc, I think that doing this really really nicely might involve
> promoting something like the current vDSO data structures to ABI -- a
> straightforward-ish implementation would be for the KVM host to export
> its vvar clock data to the guest and for the guest to use it, possibly
> with an offset applied. The offset could work a lot like timens works
> today.

Works nicely if the guest TSC is not scaled. But that means that on
migration the raw TSC usage in the guest is borked because the new host
might have a different TSC frequency.

If you use TSC scaling then the conversion needs to take TSC scaling
into account which needs some thought. And the guest would need to read
the host conversion from 'vdso data' and the scaling from the next page
(per guest) and then still has to support timens. Doable but adds extra
overhead on every time read operation.

If you want to avoid that you are back to the point where you need to
chase all guest data when the host NTP/PTP adjusts the host side.
Chasing and updating all this stuff in the tick was the reason why I was
fighting the idea of clock realtime in namespaces.

Thanks,

tglx