Re: [PATCH v5] KVM: x86/tsc: Don't sync TSC on the first write in state restoration

From: David Woodhouse
Date: Wed Sep 13 2023 - 05:51:55 EST




On 13 September 2023 11:43:56 CEST, Like Xu <like.xu.linux@xxxxxxxxx> wrote:

>> Why? Can't we treat an explicit zero write just the same as when the kernel does it?
>
>Not sure if it meets your simplified expectations:

I think that looks good, thanks. One minor nit...


>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index 6c9c81e82e65..0f05cf90d636 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -2735,20 +2735,35 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
> * kvm_clock stable after CPU hotplug
> */
> synchronizing = true;
>- } else {
>+ } else if (!data || kvm->arch.user_set_tsc) {

If data is zero here, won't the first if() case have been taken, and set synchronizing=true?

So this is equivalent to "else if (kvm->arch.user_set_tsc)". (Which is fine, and what I intended.)
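
Spelled out, a condensed paraphrase of kvm_synchronize_tsc() with your patch applied (not the literal code, and the virtual_tsc_khz check is omitted):

	if (data == 0) {
		/*
		 * A zero write -- the kernel's own sync on vCPU creation
		 * and CPU hotplug -- is always treated as a sync attempt,
		 * so this branch already swallows data == 0 ...
		 */
		synchronizing = true;
	} else if (kvm->arch.user_set_tsc) {
		/*
		 * ... which means data can never be zero here, and the
		 * "!data ||" part of your condition can never fire.
		 */
		synchronizing = data < tsc_exp + tsc_hz &&
				data + tsc_hz > tsc_exp;
	}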

> u64 tsc_exp = kvm->arch.last_tsc_write +
> nsec_to_cycles(vcpu, elapsed);
> u64 tsc_hz = vcpu->arch.virtual_tsc_khz * 1000LL;
> /*
>- * Special case: TSC write with a small delta (1 second)
>- * of virtual cycle time against real time is
>- * interpreted as an attempt to synchronize the CPU.
>+ * Here lies UAPI baggage: when a user-initiated TSC write has
>+ * a small delta (1 second) of virtual cycle time against the
>+ * previously set vCPU, we assume that they were intended to be
>+ * in sync and the delta was only due to the racy nature of the
>+ * legacy API.
>+ *
>+ * This trick falls down when restoring a guest which genuinely
>+ * has been running for less time than the 1 second of imprecision
>+ * which we allow for in the legacy API. In this case, the first
>+ * value written by userspace (on any vCPU) should not be subject
>+ * to this 'correction' to make it sync up with values that only
>+ * come from the kernel's default vCPU creation. Make the 1-second
>+ * slop hack only trigger if the flag is already set.
>+ *
>+ * The correct answer is for the VMM not to use the legacy API.
> */
> synchronizing = data < tsc_exp + tsc_hz &&
> data + tsc_hz > tsc_exp;
> }
> }
>
>+ if (data)
>+ kvm->arch.user_set_tsc = true;
>+
> /*
> * For a reliable TSC, we can match TSC offsets, and for an unstable
> * TSC, we add elapsed time in this computation. We could let the
>@@ -5536,6 +5551,7 @@ static int kvm_arch_tsc_set_attr(struct kvm_vcpu *vcpu,
> tsc = kvm_scale_tsc(rdtsc(), vcpu->arch.l1_tsc_scaling_ratio) + offset;
> ns = get_kvmclock_base_ns();
>
>+ kvm->arch.user_set_tsc = true;
> __kvm_synchronize_tsc(vcpu, offset, tsc, ns, matched);
> raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
>
>
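
On the "correct answer is for the VMM not to use the legacy API" note: the non-legacy path is the KVM_VCPU_TSC_OFFSET vCPU attribute, which is what kvm_arch_tsc_set_attr() above services, and it never goes through the 1-second slop heuristic at all. Roughly, the VMM side looks like this (untested sketch; the helper name is made up, vcpu_fd is the vCPU file descriptor, and the offset calculation and error handling are elided):

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/*
	 * Restore a guest TSC by writing the hardware offset directly,
	 * instead of racing a KVM_SET_MSRS write of IA32_TSC against
	 * elapsed host time.
	 */
	static int vcpu_set_tsc_offset(int vcpu_fd, uint64_t offset)
	{
		struct kvm_device_attr attr = {
			.group = KVM_VCPU_TSC_CTRL,
			.attr  = KVM_VCPU_TSC_OFFSET,
			.addr  = (uint64_t)(uintptr_t)&offset,
		};

		/* Needs KVM_CAP_VCPU_ATTRIBUTES; probe before relying on it. */
		if (ioctl(vcpu_fd, KVM_HAS_DEVICE_ATTR, &attr))
			return -1;

		return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
	}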