Re: [PATCH v2 15/15] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU

From: Paolo Bonzini
Date: Fri Jun 07 2024 - 05:32:03 EST


> -int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
> - int *root_level)
> +static int __kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
> + enum kvm_tdp_mmu_root_types root_type)
> {
> - struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa);
> + struct kvm_mmu_page *root = tdp_mmu_get_root(vcpu, root_type);

I think this function should take the struct kvm_mmu_page * directly.

> +{
> + *root_level = vcpu->arch.mmu->root_role.level;
> +
> + return __kvm_tdp_mmu_get_walk(vcpu, addr, sptes, KVM_DIRECT_ROOTS);

Here you pass root_to_sp(vcpu->arch.mmu->root.hpa);

> +int kvm_tdp_mmu_get_walk_mirror_pfn(struct kvm_vcpu *vcpu, u64 gpa,
> + kvm_pfn_t *pfn)
> +{
> + u64 sptes[PT64_ROOT_MAX_LEVEL + 1], spte;
> + int leaf;
> +
> + lockdep_assert_held(&vcpu->kvm->mmu_lock);
> +
> + rcu_read_lock();
> + leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, KVM_MIRROR_ROOTS);

and likewise here.

You might also consider using a kvm_mmu_root_info for the mirror root,
even though the pgd field is not used.

Then __kvm_tdp_mmu_get_walk() can take a struct kvm_mmu_root_info * instead.

kvm_tdp_mmu_get_walk_mirror_pfn() doesn't belong in this series, but
introducing __kvm_tdp_mmu_get_walk() can stay here.

Looking at the later patch, which uses
kvm_tdp_mmu_get_walk_mirror_pfn(), I think this function is a bit
overkill. I'll do a pre-review of the init_mem_region function,
especially the usage of kvm_gmem_populate:

+ slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+ if (!kvm_slot_can_be_private(slot) || !kvm_mem_is_private(kvm, gfn)) {
+ ret = -EFAULT;
+ goto out_put_page;
+ }

The slots_lock is taken, so checking kvm_slot_can_be_private is unnecessary.

Checking kvm_mem_is_private perhaps should also be done in
kvm_gmem_populate() itself. I'll send a patch.

+ read_lock(&kvm->mmu_lock);
+
+ ret = kvm_tdp_mmu_get_walk_mirror_pfn(vcpu, gpa, &mmu_pfn);
+ if (ret < 0)
+ goto out;
+ if (ret > PG_LEVEL_4K) {
+ ret = -EINVAL;
+ goto out;
+ }
+ if (mmu_pfn != pfn) {
+ ret = -EAGAIN;
+ goto out;
+ }

If you require pre-faulting, you don't need to return mmu_pfn and
things would be seriously wrong if the two didn't match, wouldn't
they? You are executing with the filemap_invalidate_lock() taken, and
therefore cannot race with kvm_gmem_punch_hole(). (Soapbox mode on:
this is the advantage of putting the locking/looping logic in common
code, kvm_gmem_populate() in this case).

Therefore, I think kvm_tdp_mmu_get_walk_mirror_pfn() can be replaced just with

int kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa)
{
struct kvm *kvm = vcpu->kvm
bool is_direct = !kvm_has_mirrored_tdp(...) || (gpa & kvm->arch.direct_mask);
hpa_t root = is_direct ? ... : ...;

lockdep_assert_held(&vcpu->kvm->mmu_lock);
rcu_read_lock();
leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, root_to_sp(root));
rcu_read_unlock();
if (leaf < 0)
return false;

spte = sptes[leaf];
return is_shadow_present_pte(spte) && is_last_spte(spte, leaf);
}
EXPORT_SYMBOL_GPL(kvm_tdp_mmu_gpa_is_mapped);

+ while (region.nr_pages) {
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ break;
+ }

Checking signal_pending() should be done in kvm_gmem_populate() -
again, I'll take care of that. The rest of the loop is not necessary -
just call kvm_gmem_populate() with the whole range and enjoy. You get
a nice API that is consistent with the intended KVM_PREFAULT_MEMORY
ioctl, because kvm_gmem_populate() returns the number of pages it has
processed and you can use that to massage and copy back the struct
kvm_tdx_init_mem_region.

+ arg = (struct tdx_gmem_post_populate_arg) {
+ .vcpu = vcpu,
+ .flags = cmd->flags,
+ };
+ gmem_ret = kvm_gmem_populate(kvm, gpa_to_gfn(region.gpa),
+ u64_to_user_ptr(region.source_addr),
+ 1, tdx_gmem_post_populate, &arg);

Paolo