Re: [PATCH v6 05/43] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes

From: Ackerley Tng

Date: Wed May 20 2026 - 17:44:35 EST

Fuad Tabba <tabba@xxxxxxxxxx> writes:

>
> [...snip...]
>
>> +unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
>> +{
>> + struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
>> + struct inode *inode;
>> +
>> + /*
>> + * If this gfn has no associated memslot, there's no chance of the gfn
>> + * being backed by private memory, since guest_memfd must be used for
>> + * private memory, and guest_memfd must be associated with some memslot.
>> + */
>> + if (!slot)
>> + return 0;
>> +
>> + CLASS(gmem_get_file, file)(slot);
>> + if (!file)
>> + return 0;
>> +
>> + inode = file_inode(file);
>> +
>> + /*
>> + * Rely on the maple tree's internal RCU lock to ensure a
>> + * stable result. This result can become stale as soon as the
>> + * lock is dropped, so the caller _must_ still protect
>> + * consumption of private vs. shared by checking
>> + * mmu_invalidate_retry_gfn() under mmu_lock to serialize
>> + * against ongoing attribute updates.
>> + */
>> + return kvm_gmem_get_attributes(inode, kvm_gmem_get_index(slot, gfn));
>> +}
>
> Doesn't this imply that all consumers of kvm_mem_is_private() should
> validate the result using mmu_lock and the invalidation sequence?

Let me know how I can improve the comment.

I think the "consumption" of private vs shared here actually means
something like "don't commit a page being faulted into page tables based
on the result of kvm_gmem_get_memory_attributes() without checking
kvm->mmu_invalidate_in_progress.", since a racing conversion may
complete before you commit.

kvm_mem_is_private() is used from these places:

1. Fault handling in KVM, like page_fault_can_be_fast(),
kvm_mmu_faultin_pfn(), kvm_mmu_page_fault(): this already handles the
entire mmu_lock and invalidation dance. No fault will be committed if
a racing conversion happened after kvm_mem_is_private() but before
the commit.

2. kvm_mmu_max_mapping_level() from recovering huge pages after
disabling dirty logging: Other than that it can't be used with
guest_memfd now since dirty logging can't be used with guest_memfd
and guest_memfd memslots are not updatable, this holds mmu_lock
throughout until the huge page recovery is done. invalidate_begin
also involves zapping the pages in the range, so if the order of
events is

| Thread A | Thread B |
|------------------------------|-------------------|
| invalidate_begin + zap | |
| update attributes maple_tree | recover huge page |
| invalidate_end | |

Then recovering will never see the zapped pages, nothing to
recover, no kvm_mem_is_private() lookup.

3. kvm_arch_vcpu_pre_fault_memory()

This eventually calls kvm_tdp_mmu_page_fault(), which checks
is_page_fault_stale(), so it does check before committing.

Were there any other calls I missed?

> sev_handle_rmp_fault() calls kvm_mem_is_private() without holding
> mmu_lock and without any retry mechanism. Is that a problem?
>

Sean already replied on your actual question separately :)

> Cheers,
> /fuad
>
>
>>
>> [...snip...]
>>