Re: [PATCH 1/2] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
From: Liran Alon
Date: Thu Sep 26 2019 - 20:10:13 EST
> On 27 Sep 2019, at 0:43, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>
> Write the desired L2 CR3 into vmcs02.GUEST_CR3 during nested VM-Enter
> isntead of deferring the VMWRITE until vmx_set_cr3(). If the VMWRITE
> is deferred, then KVM can consume a stale vmcs02.GUEST_CR3 when it
> refreshes vmcs12->guest_cr3 during nested_vmx_vmexit() if the emulated
> VM-Exit occurs without actually entering L2, e.g. if the nested run
> is squashed because L2 is being put into HLT.
I would rephrase to âIf an emulated VMEntry is squashed because L1 sets vmcs12->guest_activity_state to HLTâ.
I think itâs a bit more explicit.
>
> In an ideal world where EPT *requires* unrestricted guest (and vice
> versa), VMX could handle CR3 similar to how it handles RSP and RIP,
> e.g. mark CR3 dirty and conditionally load it at vmx_vcpu_run(). But
> the unrestricted guest silliness complicates the dirty tracking logic
> to the point that explicitly handling vmcs02.GUEST_CR3 during nested
> VM-Enter is a simpler overall implementation.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Reported-by: Reto Buerki <reet@xxxxxxxxxxx>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> ---
> arch/x86/kvm/vmx/nested.c | 8 ++++++++
> arch/x86/kvm/vmx/vmx.c | 9 ++++++---
> 2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 41abc62c9a8a..971a24134081 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2418,6 +2418,14 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> entry_failure_code))
> return -EINVAL;
>
> + /*
> + * Immediately write vmcs02.GUEST_CR3. It will be propagated to vmcs12
> + * on nested VM-Exit, which can occur without actually running L2, e.g.
> + * if L2 is entering HLT state, and thus without hitting vmx_set_cr3().
> + */
If I understand correctly, itâs not exactly if L2 is entering HLT state in general.
(E.g. issue doesnât occur if L2 runs HLT directly which is not configured to be intercepted by vmcs12).
Itâs specifically when L1 enters L2 with a HLT guest-activity-state. I suggest rephrasing comment.
> + if (enable_ept)
> + vmcs_writel(GUEST_CR3, vmcs12->guest_cr3);
> +
> /* Late preparation of GUEST_PDPTRs now that EFER and CRs are set. */
> if (load_guest_pdptrs_vmcs12 && nested_cpu_has_ept(vmcs12) &&
> is_pae_paging(vcpu)) {
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index d4575ffb3cec..b530950a9c2b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2985,6 +2985,7 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> {
> struct kvm *kvm = vcpu->kvm;
> unsigned long guest_cr3;
> + bool skip_cr3 = false;
> u64 eptp;
>
> guest_cr3 = cr3;
> @@ -3000,15 +3001,17 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
> }
>
> - if (enable_unrestricted_guest || is_paging(vcpu) ||
> - is_guest_mode(vcpu))
> + if (is_guest_mode(vcpu))
> + skip_cr3 = true;
> + else if (enable_unrestricted_guest || is_paging(vcpu))
> guest_cr3 = kvm_read_cr3(vcpu);
> else
> guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
> ept_load_pdptrs(vcpu);
> }
>
> - vmcs_writel(GUEST_CR3, guest_cr3);
> + if (!skip_cr3)
Nit: Itâs a matter of taste, but I prefer positive conditions. i.e. âbool write_guest_cr3â.
Anyway, code seems valid to me. Nice catch.
Reviewed-by: Liran Alon <liran.alon@xxxxxxxxxx>
-Liran
> + vmcs_writel(GUEST_CR3, guest_cr3);
> }
>
> int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> --
> 2.22.0
>