Re: [PATCH] KVM: x86: Unify cross-vCPU IBPB

From: Yosry Ahmed
Date: Wed Mar 26 2025 - 18:17:01 EST


On Wed, Mar 26, 2025 at 01:46:04PM -0700, Sean Christopherson wrote:
> On Thu, Mar 20, 2025, Yosry Ahmed wrote:
> > arch/x86/kvm/svm/svm.c | 24 ------------------------
> > arch/x86/kvm/svm/svm.h | 2 --
> > arch/x86/kvm/vmx/nested.c | 6 +++---
> > arch/x86/kvm/vmx/vmx.c | 15 ++-------------
> > arch/x86/kvm/vmx/vmx.h | 3 +--
> > arch/x86/kvm/x86.c | 19 ++++++++++++++++++-
> > 6 files changed, 24 insertions(+), 45 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 8abeab91d329d..89bda9494183e 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1484,25 +1484,10 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
> >  	return err;
> >  }
> >
> > -static void svm_clear_current_vmcb(struct vmcb *vmcb)
> > -{
> > -	int i;
> > -
> > -	for_each_online_cpu(i)
> > -		cmpxchg(per_cpu_ptr(&svm_data.current_vmcb, i), vmcb, NULL);
>
> Ha! I was going to say that processing only online CPUs is likely wrong, but
> you made that change on the fly. I'll probably split that to a separate commit
> since it's technically a bug fix.

Good call. To be completely honest, I didn't even realize I had fixed
this. I just used for_each_possible_cpu() in kvm_arch_vcpu_destroy()
because I thought it was the right thing to do, and I didn't notice that
the SVM code was using for_each_online_cpu() :)
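
For anyone following along, here is a rough userspace model of why an
online-only walk can leave a stale pointer behind on a CPU that is
currently offline. It is only a sketch: NR_CPUS_MODEL, slot[],
cpu_online_model[] and the clear_*() helpers are made-up names, and C11
atomics stand in for the kernel's cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4	/* hypothetical number of possible CPUs */

/* Hypothetical stand-in for a per-CPU "last seen object" pointer. */
static _Atomic(void *) slot[NR_CPUS_MODEL];
/* Hypothetical online mask; every index below NR_CPUS_MODEL is "possible". */
static bool cpu_online_model[NR_CPUS_MODEL];

/* Clear one CPU's slot, but only if it still points at @p. */
static void clear_slot(int cpu, void *p)
{
	void *expected = p;

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	atomic_compare_exchange_strong(&slot[cpu], &expected, NULL);
}

/* Models the old behavior: offline CPUs are skipped. */
static void clear_online(void *p)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (cpu_online_model[cpu])
			clear_slot(cpu, p);
}

/* Models the new behavior: every possible CPU is visited. */
static void clear_possible(void *p)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		clear_slot(cpu, p);
}

/* Count the slots that still point at @p. */
static int count_stale(void *p)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (atomic_load(&slot[cpu]) == p)
			n++;
	return n;
}
```

If a slot on an offline CPU still points at the object, clear_online()
leaves it in place while clear_possible() removes it. In the real code
a leftover pointer could later compare equal to a newly allocated
object at the same address, which is the case the full walk avoids.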

>
> A few other nits, but I'll take care of them when applying.

Thanks!

>
> Overall, nice cleanup!
>
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 69c20a68a3f01..4034190309a61 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -4961,6 +4961,8 @@ static bool need_emulate_wbinvd(struct kvm_vcpu *vcpu)
> >  	return kvm_arch_has_noncoherent_dma(vcpu->kvm);
> >  }
> >
> > +static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu);
> > +
> >  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >  {
> >  	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > @@ -4983,6 +4985,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >
> >  	kvm_x86_call(vcpu_load)(vcpu, cpu);
> >
> > +	if (vcpu != per_cpu(last_vcpu, cpu)) {
>
> I have a slight preference for using this_cpu_read() (and write) so that it's more
> obvious this is operating on the current CPU.

Hmm, I think it's confusing that a CPU is passed into
kvm_arch_vcpu_load(), yet we use the current CPU here. In practice it
seems to me that they will always be the same, but if we want to make
this clear I'd rather we do it at the scope of the entire function.

We can probably stop passing in a CPU, use the current CPU throughout
the function, and add an assertion that preemption is disabled.
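
To make that concrete, here is a small userspace model of the
compare-and-update in the load path when it only ever looks at the
current CPU. This is a sketch under assumptions, not a proposed patch:
struct vcpu_model, last_vcpu_model[], current_cpu_model,
preempt_disabled_model and barriers_issued are all hypothetical
stand-ins, and a counter takes the place of the real IBPB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_MODEL 2	/* hypothetical number of CPUs */

/* Hypothetical stand-in for struct kvm_vcpu. */
struct vcpu_model {
	int id;
};

/* Hypothetical stand-in for the per-CPU last_vcpu pointer. */
static struct vcpu_model *last_vcpu_model[NR_CPUS_MODEL];
/* Hypothetical stand-ins for "the current CPU" and the preemption state. */
static int current_cpu_model;
static bool preempt_disabled_model;
/* Counts how often the modeled barrier would have been issued. */
static unsigned int barriers_issued;

/* No cpu argument: the function only operates on the current CPU. */
static void vcpu_load_model(struct vcpu_model *vcpu)
{
	/* Stand-in for asserting that preemption is disabled. */
	assert(preempt_disabled_model);
	assert(current_cpu_model >= 0 && current_cpu_model < NR_CPUS_MODEL);

	/* Barrier only when a different vCPU last ran on this CPU. */
	if (vcpu != last_vcpu_model[current_cpu_model]) {
		barriers_issued++;
		last_vcpu_model[current_cpu_model] = vcpu;
	}
}
```

Loading the same vCPU twice in a row on one CPU bumps the counter once,
while loading a different vCPU, or the same vCPU on another CPU, bumps
it again.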

>
> > +		/*
> > +		 * Flush the branch predictor when switching vCPUs on the same physical
> > +		 * CPU, as each vCPU should have its own branch prediction domain. No
> > +		 * IBPB is needed when switching between L1 and L2 on the same vCPU
> > +		 * unless IBRS is advertised to the vCPU. This is handled on the nested
> > +		 * VM-Exit path.
> > +		 */
> > +		indirect_branch_prediction_barrier();
> > +		per_cpu(last_vcpu, cpu) = vcpu;
> > +	}
> > +
> >  	/* Save host pkru register if supported */
> >  	vcpu->arch.host_pkru = read_pkru();
> >
> > @@ -12367,10 +12381,13 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
> >
> >  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> >  {
> > -	int idx;
> > +	int idx, cpu;
> >
> >  	kvmclock_reset(vcpu);
> >
> > +	for_each_possible_cpu(cpu)
> > +		cmpxchg(per_cpu_ptr(&last_vcpu, cpu), vcpu, NULL);
>
> It's definitely worth keeping a version of SVM's comment explaining the cross-CPU
> nullification.

Good idea. Should I send a new version or will you take care of this as
well while applying?

>
> > +
> >  	kvm_x86_call(vcpu_free)(vcpu);
> >
> >  	kmem_cache_free(x86_emulator_cache, vcpu->arch.emulate_ctxt);
> > --
> > 2.49.0.395.g12beb8f557-goog
> >