Re: [PATCH v2] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK

From: David Woodhouse
Date: Mon Apr 29 2024 - 19:32:56 EST


On Mon, 2024-04-29 at 13:45 -0700, Sean Christopherson wrote:
> On Tue, 06 Feb 2024 16:19:50 +0100, Vitaly Kuznetsov wrote:
> > xen_shinfo_test is observed to be flaky failing sporadically with
> > "VM time too old". With min_ts/max_ts debug print added:
> >
> > Wall clock (v 3269818) 1704906491.986255664
> > Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> > Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> > min_ts: 1704906491.986312153
> > max_ts: 1704906506.001006963
> > ==== Test Assertion Failure ====
> >    x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0
> >    pid=32724 tid=32724 errno=4 - Interrupted system call
> >       1        0x00000000004030ad: main at xen_shinfo_test.c:1003
> >       2        0x00007fca6b23feaf: ?? ??:0
> >       3        0x00007fca6b23ff5f: ?? ??:0
> >       4        0x0000000000405e04: _start at ??:?
> >    VM time too old
> >
> > [...]
>
> Applied to kvm-x86 selftests, thanks!
>
> [1/1] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK
>       https://github.com/kvm-x86/linux/commit/201142d16010
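
To make the quoted failure concrete, here is a minimal Python sketch of the comparison behind the failing assertion, fed with the values from the debug output above. `cmp_timespec()` here is a hypothetical re-implementation for illustration, not the selftest's own code. It also assumes that the printed "Wall clock" value is the `vm_ts` being checked.

```python
from typing import NamedTuple


class Timespec(NamedTuple):
    """(seconds, nanoseconds) pair, like struct timespec."""
    sec: int
    nsec: int


def cmp_timespec(a: Timespec, b: Timespec) -> int:
    """Return <0, 0 or >0 as a is before, equal to, or after b.

    Hypothetical re-implementation: NamedTuple compares field by field,
    seconds first and then nanoseconds.
    """
    return (a > b) - (a < b)


# Values copied from the quoted debug output.
vm_ts = Timespec(1704906491, 986255664)   # assumed: the "Wall clock" line
min_ts = Timespec(1704906491, 986312153)
max_ts = Timespec(1704906506, 1006963)

too_old = cmp_timespec(min_ts, vm_ts) > 0   # the assertion that fired
too_new = cmp_timespec(max_ts, vm_ts) < 0
lag_ns = (min_ts.sec - vm_ts.sec) * 1_000_000_000 + (min_ts.nsec - vm_ts.nsec)
print(too_old, too_new, lag_ns)  # prints: True False 56489
```

Under that assumption, the guest-derived time falls 56,489 ns before the host timestamp taken before it was read, which is exactly what "VM time too old" reports.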

Of course, this just highlights the fact that the very *definition* of
the wallclock time as exposed in the Xen shinfo and MSR_KVM_WALL_CLOCK
is entirely broken now.

When the KVM clock was based on CLOCK_MONOTONIC, the delta between that
and wallclock time was constant (well, apart from leap seconds, but KVM
has *always* been utterly hosed for those, so that's just par for the
course). So the definition made sense.

But when we switched the KVM clock to CLOCK_MONOTONIC_RAW, trying to
express wallclock time in terms of the KVM clock became silly. The two
run at different rates (CLOCK_REALTIME is slewed by NTP and
CLOCK_MONOTONIC_RAW is not), so the value returned by
kvm_get_wall_clock_epoch() is constantly changing.
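
A back-of-the-envelope model of why the epoch keeps moving, as a minimal Python sketch. It assumes the reported epoch is simply CLOCK_REALTIME minus the KVM clock (a simplification of what kvm_get_wall_clock_epoch() computes), that NTP slews CLOCK_REALTIME and CLOCK_MONOTONIC by a hypothetical +10 ppm, and that CLOCK_MONOTONIC_RAW is never slewed. Clock steps and leap seconds are ignored.

```python
NSEC_PER_SEC = 1_000_000_000
BOOT_REALTIME_NS = 1_700_000_000 * NSEC_PER_SEC  # arbitrary illustrative value


def epoch_ns(elapsed_raw_ns: int, ntp_ppm: int, kvmclock_is_raw: bool) -> int:
    """Epoch (CLOCK_REALTIME - KVM clock) after elapsed_raw_ns of raw time.

    Hypothetical model: NTP slews CLOCK_REALTIME and CLOCK_MONOTONIC by
    ntp_ppm parts per million; CLOCK_MONOTONIC_RAW runs unadjusted.
    """
    slewed_ns = elapsed_raw_ns + elapsed_raw_ns * ntp_ppm // 1_000_000
    realtime_ns = BOOT_REALTIME_NS + slewed_ns
    # KVM clock based on MONOTONIC_RAW vs. (the old) MONOTONIC.
    kvmclock_ns = elapsed_raw_ns if kvmclock_is_raw else slewed_ns
    return realtime_ns - kvmclock_ns


hour_ns = 3600 * NSEC_PER_SEC
drift_mono = epoch_ns(hour_ns, 10, False) - epoch_ns(0, 10, False)
drift_raw = epoch_ns(hour_ns, 10, True) - epoch_ns(0, 10, True)
print(drift_mono, drift_raw)  # prints: 0 36000000
```

With a CLOCK_MONOTONIC-based KVM clock, both sides are slewed identically and the epoch stays put; with a CLOCK_MONOTONIC_RAW-based one, it walks by 36 ms per hour at the assumed 10 ppm.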

As I work through cleaning up the KVM clock mess, it occurs to me
that we should maybe *refresh* the wallclock time we report to the
guest. But I think it has been hosed for so long that no guest could
ever trust it for anything beyond knowing roughly what year it is when
first booting, and it isn't worth fixing.

What we *should* do is expose something new that describes the
NTP-calibrated relationship between the arch counter (or TSC) and real
time, being explicit about TAI and about live migration (a guest needs
to know when it has been migrated, so that it can throw away any NTP
refinement it has done for *itself*).
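
One possible shape for such an interface, purely as a sketch: a hypothetical host-published record (not an existing KVM or Xen ABI; every name here is invented for illustration) giving the counter-to-TAI mapping as a reference point plus a fixed-point rate. It also carries a migration generation counter that the guest compares against to decide when to discard its own NTP refinement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HypotheticalClockInfo:
    """Hypothetical host-published counter -> TAI mapping (illustrative only)."""
    counter_ref: int    # counter (TSC / arch counter) value at the reference point
    tai_ref_ns: int     # TAI at counter_ref, in nanoseconds
    mult: int           # fixed-point rate: ns = (ticks * mult) >> shift
    shift: int
    migration_gen: int  # bumped by the host on every live migration


def counter_to_tai_ns(info: HypotheticalClockInfo, counter: int) -> int:
    """Convert a counter reading to TAI nanoseconds via the published mapping."""
    return info.tai_ref_ns + (((counter - info.counter_ref) * info.mult) >> info.shift)


class GuestClock:
    """Guest-side state: keeps its own refinement only until a migration."""

    def __init__(self, info: HypotheticalClockInfo):
        self.seen_gen = info.migration_gen
        self.local_correction_ns = 0  # guest's own NTP refinement (hypothetical)

    def read_tai_ns(self, info: HypotheticalClockInfo, counter: int) -> int:
        if info.migration_gen != self.seen_gen:
            # Migrated: the refinement was made against the old host's
            # calibration, so throw it away and trust the new mapping.
            self.seen_gen = info.migration_gen
            self.local_correction_ns = 0
        return counter_to_tai_ns(info, counter) + self.local_correction_ns
```

The explicit generation counter is the point of the design: the guest never has to infer a migration from a sudden jump in its clock.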

I know we have the PTP paired-reading mechanism, but that's *still* not
TAI: it makes guests do the work for themselves, and it doesn't give a
clean signal when live migration disrupts them.
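
To illustrate what "doing the work for themselves" means: given paired (counter, host CLOCK_REALTIME) readings, the guest has to estimate the rate itself and then add the TAI-UTC offset on its own. Below is a minimal Python sketch with made-up sample values. It treats CLOCK_REALTIME as UTC (which it only approximates around leap seconds) and hardcodes the 37 s TAI-UTC offset in effect since 1 January 2017; a real guest would have to track that offset separately.

```python
from fractions import Fraction

NSEC_PER_SEC = 1_000_000_000
TAI_MINUS_UTC_S = 37  # offset in effect since 2017-01-01; the guest must track it itself


def estimate_rate(sample_a, sample_b) -> Fraction:
    """Host realtime ns per counter tick from two (counter, realtime_ns) pairs."""
    (c0, t0), (c1, t1) = sample_a, sample_b
    return Fraction(t1 - t0, c1 - c0)


def counter_to_tai_ns(sample, rate: Fraction, counter: int) -> int:
    """Extrapolate from one paired sample, then convert UTC to TAI."""
    c0, t0 = sample
    utc_ns = t0 + round((counter - c0) * rate)
    return utc_ns + TAI_MINUS_UTC_S * NSEC_PER_SEC


# Made-up readings from a 2 GHz counter, one second apart.
a = (1_000_000_000, 1_700_000_000 * NSEC_PER_SEC)
b = (3_000_000_000, 1_700_000_001 * NSEC_PER_SEC)
rate = estimate_rate(a, b)
print(rate, counter_to_tai_ns(a, rate, 5_000_000_000))
# prints: 1/2 1700000039000000000
```

Nothing in the samples tells the guest if a migration happened between them; if one did, the estimated rate silently mixes two hosts' clocks.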
