Re: [RFC PATCH 12/13] KVM: nSVM: Service local TLB flushes before nested transitions
From: Maxim Levitsky
Date: Tue Mar 04 2025 - 22:04:02 EST
On Mon, 2025-03-03 at 22:18 +0000, Yosry Ahmed wrote:
> On Fri, Feb 28, 2025 at 09:20:18PM -0500, Maxim Levitsky wrote:
> > On Wed, 2025-02-05 at 18:24 +0000, Yosry Ahmed wrote:
> > > KVM does not track TLB flush requests for L1 vs. L2. Hence, service
> > > local flushes that target the current context before switching to a
> > > new one. Since ASIDs are tracked per-VMCB, service the flushes before
> > > every VMCB switch.
> > >
> > > This is conceptually similar to how nVMX calls
> > > kvm_service_local_tlb_flush_requests() in
> > > nested_vmx_enter_non_root_mode() and nested_vmx_vmexit(), with the
> > > following differences:
> > >
> > > 1. nVMX tracks the current VPID based on is_guest_mode(), so local TLB
> > > flushes are serviced before enter_guest_mode() and
> > > leave_guest_mode(). On the other hand, nSVM tracks the current ASID
> > > based on the current VMCB, so the TLB flushes are serviced before a
> > > VMCB switch.
> > >
> > > 2. nVMX only enters and leaves guest mode in
> > > nested_vmx_enter_non_root_mode() and nested_vmx_vmexit(). Other paths
> > > like vmx_set_nested_state() and vmx_leave_nested() call into these
> > > two functions. On the other hand, nSVM open codes the switch in
> > > functions like svm_set_nested_state() and svm_leave_nested(), so
> > > servicing the flush in svm_switch_vmcb() is probably the most
> > > reliable approach.
> > >
> > > Signed-off-by: Yosry Ahmed <yosry.ahmed@xxxxxxxxx>
> > > ---
> > > arch/x86/kvm/svm/svm.c | 6 ++++++
> > > 1 file changed, 6 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index 5e7b1c9bfa605..6daa7efa9262b 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -1421,6 +1421,12 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > >
> > > void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb)
> > > {
> > > + /*
> > > + * ASIDs are tracked per-VMCB. Perform any pending TLB flushes for the
> > > + * current VMCB before switching to a new one.
> > > + */
> > > + kvm_service_local_tlb_flush_requests(&svm->vcpu);
> > > +
> > > svm->current_vmcb = target_vmcb;
> > > svm->vmcb = target_vmcb->ptr;
> > > }
> >
> > Note that another difference between SVM and VMX is that this code only
> > sets tlb_ctl in the current VMCB; the actual flush can happen much later,
> > when we do a VM entry with this VMCB. E.g., if we are now in L2, the
> > flush will happen when we enter L2 again.
>
> Right, but I think the internal implementation of the TLB flushes is not
> relevant in this specific instance. Do you think it would be useful to
> mention that here?
I am not sure, to be honest. I mentioned it because in theory there can be
a difference: we might think that we have flushed the TLB while in fact we
haven't yet.
I am trying my best to think about what hidden problems might lurk around
and surface later.
Not directly related to the above, but I am thinking:
I really like the way the SVM flush works because it ensures that redundant
flushes don't cost anything.
I wonder if we can make the VMX code emulate this by having an emulated
'tlb_control' field and then doing the flush (INVEPT) on VM entry.
Best regards,
Maxim Levitsky
>
> If we were to document the difference in TLB flush handling between VMX
> and SVM I think a better place would be at kvm_vcpu_flush_tlb_*(), or
> maybe in kvm_host.h where the vendor callbacks are defined? Not sure.
>
> > I think that this is correct but I might be mistaken.
> >
> > Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
>
> Thanks!
>