Re: [RFC PATCH v2 32/51] KVM: guest_memfd: Support guestmem_hugetlb as custom allocator

From: Sean Christopherson
Date: Fri Oct 03 2025 - 10:52:23 EST


Trimmed the Cc to KVM/guest_memfd folks.

On Wed, May 14, 2025, Ackerley Tng wrote:
> @@ -22,6 +25,10 @@ struct kvm_gmem_inode_private {
> #ifdef CONFIG_KVM_GMEM_SHARED_MEM
> struct maple_tree shareability;
> #endif
> +#ifdef CONFIG_KVM_GMEM_HUGETLB
> + const struct guestmem_allocator_operations *allocator_ops;
> + void *allocator_private;
> +#endif

This is beyond silly. There is _one_ "custom" allocator, and no evidence that
the "generic" logic written for custom allocators would actually be correct for
anything other than a hugetlb allocator.

And to my point about guestmem_hugetlb.c not actually needing access to "private"
mm/ state, I would much rather add e.g. virt/kvm/guest_memfd_hugetlb.{c.h}, and
in the header define:

struct gmem_hugetlb_inode {
struct hstate *h;
struct hugepage_subpool *spool;
struct hugetlb_cgroup *h_cg_rsvd;
};

and then in guest_memfd.c have

struct gmem_inode {
struct shared_policy policy;
struct inode vfs_inode;

u64 flags;
struct maple_tree attributes;

#ifdef CONFIG_KVM_GUEST_MEMFD_HUGETLB
struct gmem_hugetlb_inode hugetlb;
#endif
};

or maybe even better, avoid even that "jump" and define "struct gmem_inode" in a
new virt/kvm/guest_memfd.h:

struct gmem_inode {
struct shared_policy policy;
struct inode vfs_inode;

u64 flags;
struct maple_tree attributes;

#ifdef CONFIG_KVM_GUEST_MEMFD_HUGETLB
struct hstate *h;
struct hugepage_subpool *spool;
struct hugetlb_cgroup *h_cg_rsvd;
#endif
};


The setup code can them be:

#ifdef CONFIG_KVM_GUEST_MEMFD_HUGETLB
if (flags & GUEST_MEMFD_FLAG_HUGETLB) {
err = kvm_gmem_init_hugetlb(inode, size, huge_page_size_log2)
if (err)
goto out;
}
#endif


Actually, if we're at all clever, the #ifdefs can go away completely so long as
kvm_gmem_init_hugetlb() is uncondtionally _declared_, because we rely on dead
code elimination to drop the call before linking.

> };
> +/**
> + * kvm_gmem_truncate_indices() - Truncates all folios beginning @index for
> + * @nr_pages.
> + *
> + * @mapping: filemap to truncate pages from.
> + * @index: the index in the filemap to begin truncation.
> + * @nr_pages: number of PAGE_SIZE pages to truncate.
> + *
> + * Return: the number of PAGE_SIZE pages that were actually truncated.
> + */

Do not add kerneldoc comments for internal helpers. They inevitably become stale
and are a source of friction for developers. The _only_ non-obvious thing here
is the return value.

> +static long kvm_gmem_truncate_indices(struct address_space *mapping,
> + pgoff_t index, size_t nr_pages)
> +{
> + struct folio_batch fbatch;
> + long truncated;
> + pgoff_t last;
> +
> + last = index + nr_pages - 1;
> +
> + truncated = 0;
> + folio_batch_init(&fbatch);
> + while (filemap_get_folios(mapping, &index, last, &fbatch)) {
> + unsigned int i;
> +
> + for (i = 0; i < folio_batch_count(&fbatch); ++i) {
> + struct folio *f = fbatch.folios[i];
> +
> + truncated += folio_nr_pages(f);
> + folio_lock(f);
> + truncate_inode_folio(f->mapping, f);
> + folio_unlock(f);
> + }
> +
> + folio_batch_release(&fbatch);
> + cond_resched();
> + }
> +
> + return truncated;
> +}