RE: [PATCH 2/3] KVM: x86: Don't update KVM PV feature CPUID during vCPU running

From: Kechen Lu
Date: Thu Apr 06 2023 - 14:30:55 EST


Hi Sean,

> -----Original Message-----
> From: Sean Christopherson <seanjc@xxxxxxxxxx>
> Sent: Wednesday, April 5, 2023 8:29 PM
> To: Hou Wenlong <houwenlong.hwl@xxxxxxxxxxxx>
> Cc: kvm@xxxxxxxxxxxxxxx; Paolo Bonzini <pbonzini@xxxxxxxxxx>; Thomas
> Gleixner <tglx@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>; Borislav
> Petkov <bp@xxxxxxxxx>; Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>;
> x86@xxxxxxxxxx; H. Peter Anvin <hpa@xxxxxxxxx>;
> linux-kernel@xxxxxxxxxxxxxxx; Kechen Lu <kechenl@xxxxxxxxxx>
> Subject: Re: [PATCH 2/3] KVM: x86: Don't update KVM PV feature CPUID
> during vCPU running
>
> +Kechen
>
> On Thu, Mar 30, 2023, Hou Wenlong wrote:
> > __kvm_update_cpuid_runtime() may be called while the vCPU is running,
> > and it updates the KVM PV feature CPUID as well, but the cached KVM PV
> > feature bitmap is not updated. In fact, the KVM PV feature CPUID
> > shouldn't be updated at runtime at all, otherwise KVM PV features would
> > break in the guest. Currently only KVM_FEATURE_PV_UNHALT is updated, and
> > that becomes impossible once disabling HLT exits is disallowed. Still,
> > the KVM PV feature CPUID should be updated only in the KVM_SET_CPUID{,2}
> > ioctl.
> >
> > Signed-off-by: Hou Wenlong <houwenlong.hwl@xxxxxxxxxxxx>
> > ---
> > arch/x86/kvm/cpuid.c | 17 ++++++++++++-----
> > 1 file changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 6972e0be60fa..af92d3422c79 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -222,6 +222,17 @@ static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcp
> > vcpu->arch.cpuid_nent);
> > }
> >
> > +static void kvm_update_pv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
> > + int nent)
> > +{
> > + struct kvm_cpuid_entry2 *best;
> > +
> > + best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
> > + if (kvm_hlt_in_guest(vcpu->kvm) && best &&
> > + (best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
> > + best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
> > +}
> > +
> > void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
> > {
> > struct kvm_cpuid_entry2 *best = kvm_find_kvm_cpuid_features(vcpu);
> > @@ -280,11 +291,6 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
> > cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
> > best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
> >
> > - best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
> > - if (kvm_hlt_in_guest(vcpu->kvm) && best &&
> > - (best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
> > - best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
> > -
> > if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
> > best = cpuid_entry2_find(entries, nent, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
> > if (best)
> > @@ -402,6 +408,7 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
> > int r;
> >
> > __kvm_update_cpuid_runtime(vcpu, e2, nent);
> > + kvm_update_pv_cpuid(vcpu, e2, nent);
>
> Hrm, this will silently conflict with the proposed per-vCPU controls[*].
> Though arguably that patch is buggy and "needs" to toggle PV_UNHALT
> when userspace messes with HLT passthrough. But that doesn't really make
> sense either because no guest will react kindly to
> KVM_FEATURE_PV_UNHALT disappearing.

Yes, agreed. Toggling PV_UNHALT from the per-vCPU control doesn't make
sense to me either. And since PV features are configured on a per-VM basis,
having the per-vCPU control toggle PV feature bits would probably make a
real mess.
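To make the guest-side problem concrete, here is a minimal C model. All names are hypothetical and this is not KVM code. It shows a guest that snapshots CPUID leaf 0x40000001 once at boot, while the host later clears KVM_FEATURE_PV_UNHALT (bit 7 of EAX) at runtime, as a per-vCPU HLT-passthrough toggle would:

```c
#include <stdint.h>

#define KVM_FEATURE_PV_UNHALT 7 /* bit in EAX of CPUID leaf 0x40000001 */

/* Hypothetical model: the PV feature bits the host currently exposes,
 * plus the copy the guest read when it enumerated CPUID at boot. */
struct pv_cpuid_model {
	uint32_t host_eax;  /* host-side leaf 0x40000001 EAX */
	uint32_t guest_eax; /* guest's boot-time snapshot */
};

/* The guest enumerates PV features once and caches the result. */
static void guest_boot(struct pv_cpuid_model *m)
{
	m->guest_eax = m->host_eax;
}

/* Hypothetical runtime toggle, e.g. a per-vCPU HLT-passthrough switch
 * that drops PV_UNHALT after the guest is already running. */
static void host_clear_pv_unhalt(struct pv_cpuid_model *m)
{
	m->host_eax &= ~(1u << KVM_FEATURE_PV_UNHALT);
}

/* Returns nonzero if host and guest agree on the PV feature bits. */
static int views_agree(const struct pv_cpuid_model *m)
{
	return m->host_eax == m->guest_eax;
}
```

After guest_boot() followed by host_clear_pv_unhalt(), views_agree() returns 0: the guest keeps acting on a feature bit the host no longer advertises, which is why the bit is better settled once, when userspace sets CPUID.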

>
> I really wish this code didn't exist, i.e. that KVM let/forced userspace deal
> with correctly defining guest CPUID.
>
> Kechen, is it feasible for your userspace to clear PV_UNHALT when it (might)
> use the per-vCPU control? I.e. can KVM do as this series proposes and
> update guest CPUID only on KVM_SET_CPUID{2}? Dropping the behavior for
> the per-VM control is probably not an option as I gotta assume that'd break
> userspace, but I would really like to avoid carrying that over to the per-vCPU
> control, which would get quite messy and probably can't work anyways.

Yes, in our use cases it's feasible to clear PV_UNHALT when using the
per-vCPU control. I think it makes sense to make userspace responsible for
clearing the PV_UNHALT bit when it uses the per-vCPU control for HLT
passthrough. We could add a note/requirement after this line in
Documentation/virt/kvm/api.rst:
"Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits."
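A minimal userspace sketch of that requirement follows. It assumes the VMM builds its CPUID array before calling KVM_SET_CPUID2 and already knows whether it will disable HLT exits for the vCPU. The struct is a local mirror of the uapi struct kvm_cpuid_entry2 so the sketch builds without kernel headers, and the helper name is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the x86 KVM uapi header <asm/kvm_para.h>. */
#define KVM_CPUID_FEATURES    0x40000001u
#define KVM_FEATURE_PV_UNHALT 7

/* Local mirror of struct kvm_cpuid_entry2 from <linux/kvm.h>; real VMM
 * code should use the uapi definition directly. */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

/*
 * Hypothetical helper, called before KVM_SET_CPUID2: if HLT exits will
 * be disabled for this vCPU, clear PV_UNHALT in the KVM features leaf.
 * Assumes the KVM PV leaves sit at the default base 0x40000000; the base
 * can be shifted (e.g. when Hyper-V leaves are also exposed), which this
 * sketch does not handle. Returns true if the bit was cleared.
 */
static bool clear_pv_unhalt_if_hlt_passthrough(struct cpuid_entry *entries,
					       size_t nent, bool hlt_passthrough)
{
	if (!hlt_passthrough)
		return false;

	for (size_t i = 0; i < nent; i++) {
		struct cpuid_entry *e = &entries[i];

		if (e->function != KVM_CPUID_FEATURES)
			continue;
		if (!(e->eax & (1u << KVM_FEATURE_PV_UNHALT)))
			return false; /* already clear, nothing to do */
		e->eax &= ~(1u << KVM_FEATURE_PV_UNHALT);
		return true;
	}
	return false; /* no KVM features leaf at the assumed base */
}
```

Doing this once, before the vCPU runs, keeps guest CPUID stable for the guest's lifetime, which is what updating PV CPUID only on KVM_SET_CPUID{,2} relies on.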

Best Regards,
Kechen

>
> [*] https://lkml.kernel.org/r/20230121020738.2973-6-kechenl%40nvidia.com