Re: [PATCH v2 2/2] KVM: x86/mmu: include efer.lma in extended mmu role
From: Sean Christopherson
Date: Mon Nov 15 2021 - 19:08:02 EST
On Mon, Nov 15, 2021, Maxim Levitsky wrote:
> When the host is running with normal TDP mmu (EPT/NPT),
> and it is running a nested 32 bit guest, then after a migration,
> the host mmu (aka root_mmu) is first initialized with
> nested guest's IA32_EFER, due to the way userspace restores
> the nested state.
Please try to avoid unnecessary newlines, I find it quite difficult to read as
my eyeballs need to jump around more. E.g. wrapping at 75 chars yields
  When the host is running with normal TDP mmu (EPT/NPT), and it is running
  a nested 32 bit guest, then after a migration, the host mmu (aka root_mmu)
  is first initialized with nested guest's IA32_EFER, due to the way
  userspace restores the nested state.

  When later, this is corrected on first nested VM exit to the host, when
  host EFER is loaded from vmcs12, the root_mmu is not reset, because the
  role.base.level in this case, reflects the level of the TDP mmu which is
  always 4 (or 5) on EPT, and usually 4 or even 5 on AMD (when we have
  64-bit host).

  Since most of the paging state is already captured in the extended mmu
  role, just add the EFER.LMA there to force that reset.
> When later, this is corrected on first nested VM exit to the host,
> when host EFER is loaded from vmcs12,
> the root_mmu is not reset, because the role.base.level
> in this case, reflects the level of the TDP mmu which is
> always 4 (or 5) on EPT, and usually 4 or even 5 on AMD
> (when we have 64 bit host).
>
> Since most of the paging state is already captured in
> the extended mmu role, just add the EFER.LMA there to
> force that reset.
Similar to patch 1, I'd like to word the changelog to make it very clear that this
fix is _necessary_, not just a hack to fudge around QEMU behavior. I've spent far
too much time deciphering historical KVM changelogs along the lines of "QEMU does
XYZ, change KVM to handle that", and in more than one case the "fix" has been wrong
and/or incomplete.
  Incorporate EFER.LMA into kvm_mmu_extended_role, as it is used to compute
  the guest root level and is not reflected in kvm_mmu_page_role.level when
  TDP is in use. When simply running the guest, it is impossible for
  EFER.LMA and kvm_mmu.root_level to get out of sync, as the guest cannot
  transition from PAE paging to 64-bit paging without toggling CR0.PG, i.e.
  without first bouncing through a different MMU context. And stuffing
  guest state via KVM_SET_SREGS{2} also ensures a full MMU context reset.

  However, if KVM_SET_SREGS{2} is followed by KVM_SET_NESTED_STATE, e.g. to
  set guest state when migrating the VM while L2 is active, the vCPU state
  will reflect L2, not L1. If L1 is using TDP for L2, then root_mmu will
  have been configured using L2's state, despite not being used for L2. If
  L2.EFER.LMA != L1.EFER.LMA, and L2 is using PAE paging, then root_mmu
  will be configured for guest PAE paging, but will match the mmu_role for
  64-bit paging and cause KVM to not reconfigure root_mmu on the next
  nested VM-Exit.
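For reference, the reconfiguration is gated purely on the role comparison,
so if EFER.LMA isn't captured anywhere in the role, the stale root_mmu is
kept. Very roughly (a simplified sketch of the init_kvm_tdp_mmu() flow;
names approximate, not the exact mmu.c code):

	struct kvm_mmu *context = &vcpu->arch.root_mmu;
	union kvm_mmu_role new_role = kvm_calc_tdp_mmu_root_page_role(vcpu, regs);

	if (new_role.as_u64 == context->mmu_role.as_u64)
		return;		/* role unchanged => stale root_mmu is kept */

	context->mmu_role.as_u64 = new_role.as_u64;
	/* ... recompute root_level, paging callbacks, etc. ... */

Without efer_lma in the extended role, the L1-is-64-bit vs. L2-is-PAE
difference may not flip any bit that is actually compared; it only changes
root_level, so the early return is taken and root_mmu stays configured for
PAE paging.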
And after typing that up, it's probably also worth adding a blurb to call out (and
argue against) the alternative.
  Alternatively, the root_mmu's role could be invalidated after a successful
  KVM_SET_NESTED_STATE that yields vcpu->arch.mmu != vcpu->arch.root_mmu,
  i.e. that switches the active mmu to guest_mmu, but doing so would force
  KVM to reconfigure the root_mmu in the common case where L1 and L2 have
  the same EFER, e.g. are both 64-bit guests.
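Something along these lines (hypothetical sketch, reusing the ext.valid
invalidation trick that kvm_mmu_after_set_cpuid() already uses), which is
exactly what would penalize that common case:

	/*
	 * Hypothetical alternative: after KVM_SET_NESTED_STATE leaves the
	 * vCPU running on guest_mmu, force root_mmu to be rebuilt on the
	 * next nested VM-Exit, even if L1 and L2 use identical paging modes.
	 */
	if (vcpu->arch.mmu != &vcpu->arch.root_mmu)
		vcpu->arch.root_mmu.mmu_role.ext.valid = 0;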
> Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/mmu/mmu.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 88fce6ab4bbd7..a44b9eb7d4d6d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -364,6 +364,7 @@ union kvm_mmu_extended_role {
> unsigned int cr4_smap:1;
> unsigned int cr4_smep:1;
> unsigned int cr4_la57:1;
> + unsigned int efer_lma:1;
> };
> };
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 354d2ca92df4d..5c4a41697a717 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4682,6 +4682,7 @@ static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu,
> /* PKEY and LA57 are active iff long mode is active. */
> ext.cr4_pke = ____is_efer_lma(regs) && ____is_cr4_pke(regs);
> ext.cr4_la57 = ____is_efer_lma(regs) && ____is_cr4_la57(regs);
> + ext.efer_lma = ____is_efer_lma(regs);
> }
>
> ext.valid = 1;
> --
> 2.26.3
>