Re: [PATCH v10 02/39] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag

From: Sean Christopherson
Date: Thu Sep 22 2022 - 11:23:55 EST

Next message: Jerry Ray: "[internal][PATCH] dt-bindings: dsa: lan9303: Add lan9303 yaml"
Previous message: Tze-nan Wu: "[PATCH] watchdog: Fix a typo in comment"
In reply to: Vitaly Kuznetsov: "Re: [PATCH v10 02/39] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag"
Next in thread: Vitaly Kuznetsov: "Re: [PATCH v10 02/39] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, Sep 22, 2022, Vitaly Kuznetsov wrote:
> Now let's get to VMX and the point of my confusion (and thanks in
> advance for educating me!):
> AFAIU, when EPT is in use:
> KVM_REQ_TLB_FLUSH_CURRENT == invept
> KVM_REQ_TLB_FLUSH_GUEST = invvpid
>
> For "normal" mappings (which are mapped on both stages) this is the same
> thing as they're 'tagged' with both VPID and 'EPT root'. The question is
> what's left. Given your comment, do I understand correctly that in case
> of an invalid mapping in the guest (GVA doesn't resolve to a GPA), this
> will only be tagged with VPID but not with 'EPT root' (as the CPU never
> reached to the second translation stage)? We certainly can't ignore
> these. Another (probably pure theoretical question) is what are the
> mappings which are tagged with 'EPT root' but don't have a VPID tag?

Intel puts mappings into three categories, which for non-root mode equates to:

linear == GVA => GPA
guest-physical == GPA => HPA
combined == GVA => HPA

and essentially the categories that consume the GVA are tagged with the VPID
(linear and combined), and categories that consume the GPA are tagged with the
EPTP address (guest-physical and combined).

> Are these the mapping which happen when e.g. vCPU has paging disabled?

No, these mappings can be created at all times. Even with CR0.PG=1, the guest
can generate GPAs without going through a GVA=>GPA translation, e.g. the page tables
themselves, RTIT (Intel PT) addresses, etc... And even for combined/full
translations, the CPU can insert TLB entries for just the GPA=>HPA part.

E.g. when a page is allocated by/for userspace, the kernel will zero the page using
the kernel's direct map, but userspace will access the page via a different GVA.
I.e. the guest effectively aliases GPA(x) with GVA(k) and GVA(u). By inserting
the GPA(x) => HPA(y) into the TLB, when guest userspace access GVA(u), the CPU
encounters a TLB miss on GVA(u) => GPA(x), but gets a TLB hit on GPA(x) => HPA(y).

Separating EPT flushes from VPID (and PCID) flushes allows the CPU to retain
the partial TLB entries, e.g. a host change in the EPT tables will result in the
guest-physical and combined mappings being invalidated, but linear mappings can
be kept.

I'm 99% certain AMD also caches partial entries, e.g. see the blurb on INVLPGA
not affecting NPT translations, AMD just doesn't provide a way for the host to
flush _only_ NPT translations. Maybe the performance benefits weren't significant
enough to justify the extra complexity?

> These are probably unrelated to Hyper-V TLB flushing.
>
> To preserve the 'small' optimization, we can probably move
> kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
>
> to nested_svm_transition_tlb_flush() or, in case this sounds too
> hackish

Move it to svm_flush_tlb_current(), because the justification is that on SVM,
flushing "current" TLB entries also flushes "guest" TLB entries due to the more
coarse-grained ASID-based TLB flush. E.g.

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index dd599afc85f5..a86b41503723 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3737,6 +3737,13 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);

+ /*
+ * Unlike VMX, SVM doesn't provide a way to flush only NPT TLB entries.
+ * A TLB flush for the current ASID flushes both "host" and "guest" TLB
+ * entries, and thus is a superset of Hyper-V's fine grained flushing.
+ */
+ kvm_hv_vcpu_purge_flush_tlb(vcpu);
+
/*
* Flush only the current ASID even if the TLB flush was invoked via
* kvm_flush_remote_tlbs(). Although flushing remote TLBs requires all

> we can drop it for now and add it to the (already overfull)
> bucket of the "optimize nested_svm_transition_tlb_flush()".

I think even long term, purging Hyper-V's FIFO in svm_flush_tlb_current() is the
correct/desired behavior. This doesn't really have anything to do with nSVM,
it's all about SVM not providing a way to flush only NPT entries.

Next message: Jerry Ray: "[internal][PATCH] dt-bindings: dsa: lan9303: Add lan9303 yaml"
Previous message: Tze-nan Wu: "[PATCH] watchdog: Fix a typo in comment"
In reply to: Vitaly Kuznetsov: "Re: [PATCH v10 02/39] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag"
Next in thread: Vitaly Kuznetsov: "Re: [PATCH v10 02/39] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]