Re: [RFC PATCH 12/13] KVM: nSVM: Service local TLB flushes before nested transitions
From: Yosry Ahmed
Date: Wed Mar 05 2025 - 01:37:42 EST
On Tue, Mar 04, 2025 at 10:03:51PM -0500, Maxim Levitsky wrote:
> On Mon, 2025-03-03 at 22:18 +0000, Yosry Ahmed wrote:
> > On Fri, Feb 28, 2025 at 09:20:18PM -0500, Maxim Levitsky wrote:
> > > On Wed, 2025-02-05 at 18:24 +0000, Yosry Ahmed wrote:
> > > > KVM does not track TLB flush requests for L1 vs. L2. Hence, service
> > > > local flushes that target the current context before switching to a
> > > > new one. Since ASIDs are tracked per-VMCB, service the flushes before
> > > > every VMCB switch.
> > > >
> > > > This is conceptually similar to how nVMX calls
> > > > kvm_service_local_tlb_flush_requests() in
> > > > nested_vmx_enter_non_root_mode() and nested_vmx_vmexit(), with the
> > > > following differences:
> > > >
> > > > 1. nVMX tracks the current VPID based on is_guest_mode(), so local TLB
> > > > flushes are serviced before enter_guest_mode() and
> > > > leave_guest_mode(). On the other hand, nSVM tracks the current ASID
> > > > based on the current VMCB, so the TLB flushes are serviced before a
> > > > VMCB switch.
> > > >
> > > > 2. nVMX only enters and leaves guest mode in
> > > > nested_vmx_enter_non_root_mode() and nested_vmx_vmexit(). Other paths
> > > > like vmx_set_nested_state() and vmx_leave_nested() call into these
> > > > two functions. On the other hand, nSVM open codes the switch in
> > > > functions like svm_set_nested_state() and svm_leave_nested(), so
> > > > servicing the flush in svm_switch_vmcb() is probably the most reliable.
> > > >
> > > > Signed-off-by: Yosry Ahmed <yosry.ahmed@xxxxxxxxx>
> > > > ---
> > > > arch/x86/kvm/svm/svm.c | 6 ++++++
> > > > 1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > > index 5e7b1c9bfa605..6daa7efa9262b 100644
> > > > --- a/arch/x86/kvm/svm/svm.c
> > > > +++ b/arch/x86/kvm/svm/svm.c
> > > > @@ -1421,6 +1421,12 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > > >
> > > > void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb)
> > > > {
> > > > + /*
> > > > + * ASIDs are tracked per-VMCB. Perform any pending TLB flushes for the
> > > > + * current VMCB before switching to a new one.
> > > > + */
> > > > + kvm_service_local_tlb_flush_requests(&svm->vcpu);
> > > > +
> > > > svm->current_vmcb = target_vmcb;
> > > > svm->vmcb = target_vmcb->ptr;
> > > > }
> > >
> > > Note that another difference between SVM and VMX is that this code will
> > > only set tlb_ctl in the current vmcb; the actual flush can happen much
> > > later, when we do VM entry with this vmcb. E.g. if we are now in L2, the
> > > flush will happen when we enter L2 again.
> >
> > Right, but I think the internal implementation of the TLB flushes is not
> > relevant in this specific instance. Do you think it would be useful to
> > mention that here?
>
> I am not sure, to be honest. I just mentioned this because in theory there
> can be a difference: we might think we have flushed the TLB when in fact we
> haven't yet.
>
> I am trying my best to think about what hidden problems might lurk around and surface later.
>
> Not directly related to the above, but I am thinking:
> I really like the way SVM flush works because it ensures that redundant flushes don't cost anything.
>
> I wonder if we can make the VMX code emulate this by having an emulated
> 'tlb_control' field and then doing the flush (INVEPT) on VM entry.

I briefly thought about this while working on this series. One way to do
it is to make TLB flush requests (e.g. KVM_REQ_TLB_FLUSH) when emulating
INVVPID and INVEPT instead of doing the flush right away, then service
the flushes once before running the guest. Redundant flushes are
coalesced.
But I did not look too closely at whether the existing TLB flush requests
satisfy all the use cases or whether we would need new ones. I am not
sure the complexity would be worth it, but it is something to think about.