Re: [RFC PATCH 09/13] KVM: nSVM: Handle nested TLB flush requests through TLB_CONTROL

From: Maxim Levitsky
Date: Tue Mar 04 2025 - 21:57:40 EST


On Mon, 2025-03-03 at 22:10 +0000, Yosry Ahmed wrote:
> On Fri, Feb 28, 2025 at 09:06:18PM -0500, Maxim Levitsky wrote:
> > On Wed, 2025-02-05 at 18:23 +0000, Yosry Ahmed wrote:
> > > Handle L1's requests to flush L2's TLB through the TLB_CONTROL field of
> > > VMCB12. This is currently redundant because a full flush is executed on
> > > every nested transition, but is a step towards removing that.
> > >
> > > TLB_CONTROL_FLUSH_ALL_ASID flushes all ASIDs from L1's perspective,
> > > including its own, so do a guest TLB flush on both transitions. Never
> > > propagate TLB_CONTROL_FLUSH_ALL_ASID from the guest to the actual VMCB,
> > > as this gives the guest the power to flush the entire physical TLB
> > > (including translations for the host and other VMs).
> > >
> > > For other cases, the TLB flush is only done when entering L2. The nested
> > > NPT MMU is also sync'd because TLB_CONTROL also flushes NPT
> > > guest-physical mappings.
> > >
> > > All TLB_CONTROL values can be handled by KVM regardless of FLUSHBYASID
> > > support on the underlying CPU, so keep advertising FLUSHBYASID to the
> > > guest unconditionally.
> > >
> > > Signed-off-by: Yosry Ahmed <yosry.ahmed@xxxxxxxxx>
> > > ---
> > > arch/x86/kvm/svm/nested.c | 42 ++++++++++++++++++++++++++++++++-------
> > > arch/x86/kvm/svm/svm.c    |  6 +++---
> > > 2 files changed, 38 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > > index 0735177b95a1d..e2c59eb2907e8 100644
> > > --- a/arch/x86/kvm/svm/nested.c
> > > +++ b/arch/x86/kvm/svm/nested.c
> > > @@ -484,19 +484,36 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
> > >
> > > static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
> > > {
> > > + struct vcpu_svm *svm = to_svm(vcpu);
> > > +
> > > /* Handle pending Hyper-V TLB flush requests */
> > > kvm_hv_nested_transtion_tlb_flush(vcpu, npt_enabled);
> > >
> > > + /*
> > > + * If L1 requested a TLB flush for L2, flush L2's TLB on nested entry
> > > + * and sync the nested NPT MMU, as TLB_CONTROL also flushes NPT
> > > + * guest-physical mappings. We technically only need to flush guest_mode
> > > + * page tables.
> > > + *
> > > + * KVM_REQ_TLB_FLUSH_GUEST will flush L2's ASID even if the underlying
> > > + * CPU does not support FLUSHBYASID (by assigning a new ASID), so we
> > > + * can handle all TLB_CONTROL values from L1 regardless.
> > > + *
> > > + * Note that TLB_CONTROL_FLUSH_ASID_LOCAL is handled exactly like
> > > + * TLB_CONTROL_FLUSH_ASID. We can technically flush fewer TLB entries,
> > > + * but this would require significantly more complexity.
> > > + */
> >
> > I think it might make sense to note that, in essence, we support only one
> > non-zero ASID in L1: the one it picks for the nested guest.
> >
> > Thus, when asked for TLB_CONTROL_FLUSH_ALL_ASID, we only need to flush
> > L2's real ASID and L1's ASID.
>
> This is described by the comment in nested_svm_exit_tlb_flush(). Do you
> mean that we should also mention that here?
>
> I guess one way to make things clearer is to describe the behavior for
> all values of TLB_CONTROL here, and in nested_svm_exit_tlb_flush() just
> say /* see nested_svm_entry_tlb_flush() */. Would that improve things?

I guess that would be a bit better.
This was just a tiny nitpick though, something I wondered about while
reviewing.


Best regards,
Maxim Levitsky



>
> >
> > > + if (svm->nested.ctl.tlb_ctl != TLB_CONTROL_DO_NOTHING) {
> > > + if (nested_npt_enabled(svm))
> > > + kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> > > + kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> > > + }
> > > +
> > > /*
> > > * TODO: optimize unconditional TLB flush/MMU sync. A partial list of
> > > * things to fix before this can be conditional:
> > > *
> > > - * - Honor L1's request to flush an ASID on nested VMRUN
> > > - * - Sync nested NPT MMU on VMRUN that flushes L2's ASID[*]
> > > * - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
> > > - *
> > > - * [*] Unlike nested EPT, SVM's ASID management can invalidate nested
> > > - * NPT guest-physical mappings on VMRUN.
> > > */
> > > kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> > > kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> > > @@ -504,9 +521,18 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
> > >
> > > static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
> > > {
> > > + struct vcpu_svm *svm = to_svm(vcpu);
> > > +
> > > /* Handle pending Hyper-V TLB flush requests */
> > > kvm_hv_nested_transtion_tlb_flush(vcpu, npt_enabled);
> > >
> > > + /*
> > > + * If L1 had requested a full TLB flush when entering L2, also flush its
> > > + * TLB entries when exiting back to L1.
> > > + */
> > > + if (svm->nested.ctl.tlb_ctl == TLB_CONTROL_FLUSH_ALL_ASID)
> > > + kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> >
> > Makes sense.
> >
> > > +
> > > /* See nested_svm_entry_tlb_flush() */
> > > kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> > > kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> > > @@ -825,7 +851,8 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
> > > nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
> > >
> > > svm_switch_vmcb(svm, &svm->nested.vmcb02);
> > > - nested_vmcb02_prepare_control(svm, vmcb12->save.rip, vmcb12->save.cs.base);
> > > + nested_vmcb02_prepare_control(svm, vmcb12->save.rip,
> > > + vmcb12->save.cs.base);
> > > nested_vmcb02_prepare_save(svm, vmcb12);
> > >
> > > ret = nested_svm_load_cr3(&svm->vcpu, svm->nested.save.cr3,
> > > @@ -1764,7 +1791,8 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
> > > nested_copy_vmcb_control_to_cache(svm, ctl);
> > >
> > > svm_switch_vmcb(svm, &svm->nested.vmcb02);
> > > - nested_vmcb02_prepare_control(svm, svm->vmcb->save.rip, svm->vmcb->save.cs.base);
> > > + nested_vmcb02_prepare_control(svm, svm->vmcb->save.rip,
> > > + svm->vmcb->save.cs.base);
> > >
> > > /*
> > > * While the nested guest CR3 is already checked and set by
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index 8342c7eadbba8..5e7b1c9bfa605 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -5242,9 +5242,9 @@ static __init void svm_set_cpu_caps(void)
> > > kvm_cpu_cap_set(X86_FEATURE_VMCBCLEAN);
> > >
> > > /*
> > > - * KVM currently flushes TLBs on *every* nested SVM transition,
> > > - * and so for all intents and purposes KVM supports flushing by
> > > - * ASID, i.e. KVM is guaranteed to honor every L1 ASID flush.
> > > + * KVM currently handles all TLB_CONTROL values set by L1, even
> > > + * if the underlying CPU does not. See
> > > + * nested_svm_transition_tlb_flush().
> > > */
> > > kvm_cpu_cap_set(X86_FEATURE_FLUSHBYASID);
> > >
> >
> > Patch looks OK, but I can't be 100% sure about this.
> >
> > Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> >
> > Best regards,
> > Maxim Levitsky
> >
> >