Re: [PATCH v10 09/15] KVM: guest_memfd: Add flag to remove from direct map

From: Nikita Kalyazin

Date: Fri Mar 06 2026 - 09:59:57 EST




On 06/03/2026 14:22, David Hildenbrand (Arm) wrote:
[...]

+	/*
+	 * Direct map restoration cannot fail, as the only error condition
+	 * for direct map manipulation is failure to allocate page tables
+	 * when splitting huge pages, but this split would have already
+	 * happened in folio_zap_direct_map() in kvm_gmem_folio_zap_direct_map().
+	 * Note that the split always occurs, because guest_memfd currently
+	 * supports only base pages.
+	 * Thus folio_restore_direct_map() here only updates prot bits.
+	 */
+ WARN_ON_ONCE(folio_restore_direct_map(folio));

Which raised the question: why should this function then even return an
error?

Dave pointed out earlier that failures are possible [1]. Do you think
we can document this better?

I'm fine with checking that somewhere (to catch any future problems).

Why not do the WARN_ON_ONCE() in folio_restore_direct_map()?

Then, carefully document (in the new kerneldoc for
folio_restore_direct_map() etc) that folio_restore_direct_map() is only
allowed after a prior successful folio_zap_direct_map(), and add a
helpful comment above the WARN_ON_ONCE() in folio_restore_direct_map()
that we don't expect errors etc.

My only concern is that the assumptions we make in KVM may not apply in
the general case, so the WARN_ON_ONCE() may become too restrictive
compared to proper error handling in some (rare) cases. For example, is
it possible for the folio to migrate in between?


[...]

-	if (!is_prepared)
+	if (!is_prepared) {
 		r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
+		if (r)
+			goto out_unlock;
+	}
+
+	if (kvm_gmem_no_direct_map(folio_inode(folio))) {
+		r = kvm_gmem_folio_zap_direct_map(folio);
+		if (r)
+			goto out_unlock;
+	}


It's a bit nasty that we have two different places where we have to call
this. Smells error-prone.

We will actually have two more (for the write() syscall and UFFDIO_COPY),
and zero once we have [2].

[2] https://lore.kernel.org/linux-mm/20260225-page_alloc-unmapped-v1-0-e8808a03cd66@xxxxxxxxxx/


I was wondering why kvm_gmem_get_folio() cannot handle that?

Most of the call sites follow the pattern alloc -> write -> zap, so
they'll need the direct map for some time after the allocation.


Okay. Nasty. :)

--
Cheers,

David