Re: [PATCH v2] KVM x86/xen: add an override for PVCLOCK_TSC_STABLE_BIT

From: David Woodhouse
Date: Tue Oct 31 2023 - 19:07:29 EST


On Tue, 2023-10-31 at 22:58 +0000, Sean Christopherson wrote:
> On Tue, Oct 31, 2023, David Woodhouse wrote:
> > On Tue, 2023-10-31 at 15:39 -0700, Sean Christopherson wrote:
> > > On Tue, Oct 31, 2023, Paul Durrant wrote:
> > > Any reason not to make this a generic "capability" instead of a Xen-specific flag?
> > > E.g. I assume these problematic guests would mishandle PVCLOCK_TSC_STABLE_BIT if
> > > it showed up in kvmclock, but they don't use kvmclock so it's not a problem in
> > > practice.
> >
> > No, those guests are just fine with kvmclock. It's the *Xen* page they
> > forgot to map to userspace for the vDSO to use. And it's Xen (true Xen)
> > which made you jump through hoops to use the TSC that way, such that it
> > would actually expose the PVCLOCK_TSC_STABLE_BIT. We don't expect, and
> > have never seen, such issues with native KVM guests.
>
> Hmm, and I suppose theoretically the guest kernel could choose to ignore the Xen
> interface for whatever reason.  Mostly out of curiosity, is this flag something
> that'd be set anytime Xen is advertised to the guest?

Probably not in QEMU; I'll make it optional there.

Hosting providers who are migrating millions of Xen guests to KVM and
want to do so with as little customer pain as possible, and who have
already had customer failures due to this guest kernel bug... are more
likely to turn it on for all "Xen" guests.

> > > I doubt there's a real need or use case, but it'd require less churn and IMO is
> > > simpler overall, e.g.
> > >
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index e3eb608b6692..731b201bfd5a 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -3225,7 +3225,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> > >  
> > >         /* If the host uses TSC clocksource, then it is stable */
> > >         pvclock_flags = 0;
> > > -       if (use_master_clock)
> > > +       if (use_master_clock && !v->kvm->arch.force_tsc_unstable)
> > >                 pvclock_flags |= PVCLOCK_TSC_STABLE_BIT;
> > >  
> > >         vcpu->hv_clock.flags = pvclock_flags;
> > >
> > > I also assume this is a "set and forget" thing?  I.e. KVM can require the flag
> > > to be set before any vCPUs are created.
> >
> > Hrm, not sure we have previously required that the KVM_XEN_HVM_CONFIG
> > setup be done before any vCPUs were created.
>
> Oh, I was asking in the context of adding a generic capability.

Yeah, it's saner for it to be set-and-forget. We *could* contrive some
kind of detection for the affected guest kernels and turn it off just
for them... but no, I just don't want to.

> > I tend to prefer *not* to push ordering requirements onto userspace.
>
> For per-VM flags that are consumed by vCPUs, it makes reasoning about correctness
> and what is/isn't allowed much, much easier.
>
> > Does it need to be a per-vcpu thing?
>
> Huh?  No, I was only asking (again, for a generic capability) if we could do
>
>                 mutex_lock(&kvm->lock);
>                 if (!kvm->created_vcpus) {
>                         kvm->arch.force_tsc_unstable = true;
>                         r = 0;
>                 }
>                 mutex_unlock(&kvm->lock);
>
> So that it would be blatantly obvious that there's no race with checking a per-VM
> flag without any lock/RCU protections.

Makes sense. Although TBH if the VMM wants to flip this bit on and off
at runtime while the guest clocks are being updated, it deserves what
it gets. It's not a problem for KVM.
