Re: [PATCH v2 08/25] KVM: x86/mmu: split cpu_mode from mmu_role

From: Sean Christopherson
Date: Tue Mar 08 2022 - 12:36:38 EST


On Mon, Feb 21, 2022, Paolo Bonzini wrote:
> Snapshot the state of the processor registers that govern page walk into
> a new field of struct kvm_mmu. This is a more natural representation
> than having it *mostly* in mmu_role but not exclusively; the delta
> right now is represented in other fields, such as root_level.
>
> The nested MMU now has only the CPU mode; and in fact the new function
> kvm_calc_cpu_mode is analogous to the previous kvm_calc_nested_mmu_role,
> except that it has role.base.direct equal to !CR0.PG. It is not clear
> what the code meant by "setting role.base.direct to true to detect bogus
> usage of the nested MMU".

The idea was to trigger fireworks due to an incoherent state (e.g. direct mmu_role with
non-direct hooks) if the nested_mmu was ever used as a "real" MMU (handling faults,
installing SPs/SPTEs, etc...). For a walk-only MMU, "direct" has no meaning, and so
rather than arbitrarily leave it '0', I arbitrarily set it to '1'.
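
Something like the below (purely illustrative, not a check that exists
verbatim in KVM) is the kind of thing that would complain loudly:

	/*
	 * Hypothetical sanity check in an indirect page fault path: a
	 * direct (or walk-only) MMU must never get here.
	 */
	if (WARN_ON_ONCE(vcpu->arch.mmu->mmu_role.base.direct))
		return RET_PF_INVALID;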

Maybe this?

The nested MMU now has only the CPU mode; and in fact the new function
kvm_calc_cpu_mode is analogous to the previous kvm_calc_nested_mmu_role,
except that it has role.base.direct equal to !CR0.PG. Having "direct"
track CR0.PG has the serendipitous side effect of being an even better
sentinel than arbitrarily setting direct to true for the nested MMU, as
KVM will run afoul of sanity checks for both direct and indirect MMUs if
KVM attempts to use the nested MMU as a "real" MMU, e.g. for page faults.

> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
> arch/x86/include/asm/kvm_host.h |   1 +
> arch/x86/kvm/mmu/mmu.c          | 107 ++++++++++++++++++++------------
> arch/x86/kvm/mmu/paging_tmpl.h  |   2 +-
> 3 files changed, 68 insertions(+), 42 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 92855d3984a7..cc268116eb3f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -433,6 +433,7 @@ struct kvm_mmu {
> 			 struct kvm_mmu_page *sp);
> 	void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa);
> 	struct kvm_mmu_root_info root;
> +	union kvm_mmu_role cpu_mode;
> 	union kvm_mmu_role mmu_role;
> 	u8 root_level;
> 	u8 shadow_root_level;
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 7c835253a330..1af898f0cf87 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -221,7 +221,7 @@ BUILD_MMU_ROLE_REGS_ACCESSOR(efer, lma, EFER_LMA);
> #define BUILD_MMU_ROLE_ACCESSOR(base_or_ext, reg, name)		\
> static inline bool __maybe_unused is_##reg##_##name(struct kvm_mmu *mmu) \
> {									\
> -	return !!(mmu->mmu_role. base_or_ext . reg##_##name);		\
> +	return !!(mmu->cpu_mode. base_or_ext . reg##_##name);		\
> }
> BUILD_MMU_ROLE_ACCESSOR(ext, cr0, pg);
> BUILD_MMU_ROLE_ACCESSOR(base, cr0, wp);
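
(For reference, the helper generated by e.g. BUILD_MMU_ROLE_ACCESSOR(ext, cr0, pg)
now expands to roughly:

	static inline bool __maybe_unused is_cr0_pg(struct kvm_mmu *mmu)
	{
		return !!(mmu->cpu_mode.ext.cr0_pg);
	}

i.e. all of the is_*() accessors read the guest's paging state from the
new cpu_mode snapshot instead of from mmu_role.)
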
> @@ -4680,6 +4680,39 @@ static void paging32_init_context(struct kvm_mmu *context)
> 	context->direct_map = false;
> }
>
> +static union kvm_mmu_role
> +kvm_calc_cpu_mode(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)

I strongly prefer we avoid putting the return type on a different line unless
absolutely "necessary".

static union kvm_mmu_role kvm_calc_cpu_mode(struct kvm_vcpu *vcpu,
					    const struct kvm_mmu_role_regs *regs)

> +{
> +	union kvm_mmu_role role = {0};
> +
> +	role.base.access = ACC_ALL;
> +	role.base.smm = is_smm(vcpu);
> +	role.base.guest_mode = is_guest_mode(vcpu);
> +	role.base.direct = !____is_cr0_pg(regs);
> +	if (!role.base.direct) {

Can we check ____is_cr0_pg() instead of "direct"? IMO that's more intuitive for
understanding why the bits below are left zero. I was scratching my head trying
to figure out whether or not this was safe/correct for direct MMUs...

And this indentation is quite nasty, and will only get worse. An early return or
a goto would solve that nicely. I think I have a slight preference for an early
return?

	role.ext.valid = 1;

	if (!____is_cr0_pg(regs))
		return role;

	role.base.efer_nx = ____is_efer_nx(regs);
	role.base.cr0_wp = ____is_cr0_wp(regs);
	role.base.smep_andnot_wp = ____is_cr4_smep(regs) && !____is_cr0_wp(regs);
	role.base.smap_andnot_wp = ____is_cr4_smap(regs) && !____is_cr0_wp(regs);
	role.base.has_4_byte_gpte = !____is_cr4_pae(regs);
	role.base.level = role_regs_to_root_level(regs);

	role.ext.cr0_pg = 1;
	role.ext.cr4_pae = ____is_cr4_pae(regs);
	role.ext.cr4_smep = ____is_cr4_smep(regs);
	role.ext.cr4_smap = ____is_cr4_smap(regs);
	role.ext.cr4_pse = ____is_cr4_pse(regs);

	/* PKEY and LA57 are active iff long mode is active. */
	role.ext.cr4_pke = ____is_efer_lma(regs) && ____is_cr4_pke(regs);
	role.ext.cr4_la57 = ____is_efer_lma(regs) && ____is_cr4_la57(regs);
	role.ext.efer_lma = ____is_efer_lma(regs);

	return role;