RE: [PATCH] KVM: x86/xen: Update Xen CPUID Leaf 4 (tsc info) sub-leaves, if present

From: Durrant, Paul
Date: Wed Jun 22 2022 - 11:07:07 EST


> -----Original Message-----
> From: Sean Christopherson <seanjc@xxxxxxxxxx>
> Sent: 22 June 2022 15:44
> To: Durrant, Paul <pdurrant@xxxxxxxxxxxx>
> Cc: x86@xxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Paolo Bonzini
> <pbonzini@xxxxxxxxxx>; Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>; Wanpeng Li <wanpengli@xxxxxxxxxxx>; Jim
> Mattson <jmattson@xxxxxxxxxx>; Joerg Roedel <joro@xxxxxxxxxx>; Thomas Gleixner <tglx@xxxxxxxxxxxxx>;
> Ingo Molnar <mingo@xxxxxxxxxx>; Borislav Petkov <bp@xxxxxxxxx>; Dave Hansen
> <dave.hansen@xxxxxxxxxxxxxxx>; H. Peter Anvin <hpa@xxxxxxxxx>
> Subject: RE: [EXTERNAL][PATCH] KVM: x86/xen: Update Xen CPUID Leaf 4 (tsc info) sub-leaves, if present
>
> CAUTION: This email originated from outside of the organization. Do not click links or open
> attachments unless you can confirm the sender and know the content is safe.
>
>
>
> On Wed, Jun 22, 2022, Paul Durrant wrote:
> > The scaling information in sub-leaf 1 should match the values in the
> > 'vcpu_info' sub-structure 'time_info' (a.k.a. pvclock_vcpu_time_info) which
> > is shared with the guest. The offset values are not set since a TSC offset
> > is already applied.
> > The host TSC frequency should also be set in sub-leaf 2.
>
> Explain why this is KVM's problem, i.e. why userspace is unable to set the correct
> values.

Ok, I'll explain that there is no interface for the VMM to acquire the time_info.

>
> > This patch adds a new kvm_xen_set_cpuid() function that scans for the
>
> Please avoid "This patch".
>
> > relevant CPUID leaf when the CPUID information is updated by the VMM and
> > stashes pointers to the sub-leaves in the kvm_vcpu_xen structure.
> > The values are then updated by a call to the, also new,
> > kvm_xen_setup_tsc_info() function made at the end of
> > kvm_guest_time_update() just before entering the guest.
>
> This is not a helpful paragraph, it provides zero information that isn't obvious
> from the code.
>
> The changelog should read something like:
>
> Update Xen CPUID leaves that expose TSC frequency and scaling information
> to the guest <blah blah blah>. Cache the leaves <blah blah blah>.
>

Ok, sure.

Paul

> > Signed-off-by: Paul Durrant <pdurrant@xxxxxxxxxx>
> > ---
> > arch/x86/include/asm/kvm_host.h | 2 ++
> > arch/x86/kvm/cpuid.c | 2 ++
> > arch/x86/kvm/x86.c | 1 +
> > arch/x86/kvm/xen.c | 41 +++++++++++++++++++++++++++++++++
> > arch/x86/kvm/xen.h | 10 ++++++++
> > 5 files changed, 56 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 1038ccb7056a..f77a4940542f 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -638,6 +638,8 @@ struct kvm_vcpu_xen {
> > struct hrtimer timer;
> > int poll_evtchn;
> > struct timer_list poll_timer;
> > + struct kvm_cpuid_entry2 *tsc_info_1;
> > + struct kvm_cpuid_entry2 *tsc_info_2;
> > };
> >
> > struct kvm_vcpu_arch {
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index d47222ab8e6e..eb6cd88c974a 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -25,6 +25,7 @@
> > #include "mmu.h"
> > #include "trace.h"
> > #include "pmu.h"
> > +#include "xen.h"
> >
> > /*
> > * Unlike "struct cpuinfo_x86.x86_capability", kvm_cpu_caps doesn't need to be
> > @@ -310,6 +311,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > __cr4_reserved_bits(guest_cpuid_has, vcpu);
> >
> > kvm_hv_set_cpuid(vcpu);
> > + kvm_xen_set_cpuid(vcpu);
> >
> > /* Invoke the vendor callback only after the above state is updated. */
> > static_call(kvm_x86_vcpu_after_set_cpuid)(vcpu);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 00e23dc518e0..8b45f9975e45 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3123,6 +3123,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> > if (vcpu->xen.vcpu_time_info_cache.active)
> > kvm_setup_guest_pvclock(v, &vcpu->xen.vcpu_time_info_cache, 0);
> > kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
> > + kvm_xen_setup_tsc_info(v);
>
> This can be called inside this if statement, no?
>
> if (unlikely(vcpu->hw_tsc_khz != tgt_tsc_khz)) {
>
> }
>
> > return 0;
> > }
> >
> > diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> > index 610beba35907..a016ff85264d 100644
> > --- a/arch/x86/kvm/xen.c
> > +++ b/arch/x86/kvm/xen.c
> > @@ -10,6 +10,9 @@
> > #include "xen.h"
> > #include "hyperv.h"
> > #include "lapic.h"
> > +#include "cpuid.h"
> > +
> > +#include <asm/xen/cpuid.h>
> >
> > #include <linux/eventfd.h>
> > #include <linux/kvm_host.h>
> > @@ -1855,3 +1858,41 @@ void kvm_xen_destroy_vm(struct kvm *kvm)
> > if (kvm->arch.xen_hvm_config.msr)
> > static_branch_slow_dec_deferred(&kvm_xen_enabled);
> > }
> > +
> > +void kvm_xen_set_cpuid(struct kvm_vcpu *vcpu)
>
> This is a very, very misleading name. It does not "set" anything. Given that
> this patch adds "set" and "setup", I expected the "set" to you know, set the CPUID
> leaves and the "setup" to prepar for that, not the other way around.
>
> If the leaves really do need to be cached, kvm_xen_after_set_cpuid() is probably
> the least awful name.
>
> > +{
> > + u32 base = 0;
> > + u32 function;
> > +
> > + for_each_possible_hypervisor_cpuid_base(function) {
> > + struct kvm_cpuid_entry2 *entry = kvm_find_cpuid_entry(vcpu, function, 0);
> > +
> > + if (entry &&
> > + entry->ebx == XEN_CPUID_SIGNATURE_EBX &&
> > + entry->ecx == XEN_CPUID_SIGNATURE_ECX &&
> > + entry->edx == XEN_CPUID_SIGNATURE_EDX) {
> > + base = function;
> > + break;
> > + }
> > + }
> > + if (!base)
> > + return;
> > +
> > + function = base | XEN_CPUID_LEAF(3);
> > + vcpu->arch.xen.tsc_info_1 = kvm_find_cpuid_entry(vcpu, function, 1);
> > + vcpu->arch.xen.tsc_info_2 = kvm_find_cpuid_entry(vcpu, function, 2);
>
> Is it really necessary to cache the leave? Guest CPUID isn't optimized, but it's
> not _that_ slow, and unless I'm missing something updating the TSC frequency and
> scaling info should be uncommon, i.e. not performance critical.