Re: [PATCH 1/6] KVM: guest_memfd: Add DEFAULT_SHARED flag, reject user page faults if not set

From: Vishal Annapurve
Date: Wed Oct 01 2025 - 12:31:38 EST


On Wed, Oct 1, 2025 at 9:15 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Wed, Oct 01, 2025, Vishal Annapurve wrote:
> > On Mon, Sep 29, 2025 at 5:15 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > Oh! This got me looking at kvm_arch_supports_gmem_mmap() and thus
> > > KVM_CAP_GUEST_MEMFD_MMAP. Two things:
> > >
> > > 1. We should change KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS so
> > > that we don't need to add a capability every time a new flag comes along,
> > > and so that userspace can gather all flags in a single ioctl. If gmem ever
> > > supports more than 32 flags, we'll need KVM_CAP_GUEST_MEMFD_FLAGS2, but
> > > that's a non-issue relatively speaking.
> > >
> >
> > Guest_memfd capabilities don't necessarily translate into flags, so ideally:
> > 1) There should be two caps, KVM_CAP_GUEST_MEMFD_FLAGS and
> > KVM_CAP_GUEST_MEMFD_CAPS.
>
> I'm not saying we can't have another GUEST_MEMFD capability or three, all I'm
> saying is that for enumerating what flags can be passed to KVM_CREATE_GUEST_MEMFD,
> KVM_CAP_GUEST_MEMFD_FLAGS is a better fit than a one-off KVM_CAP_GUEST_MEMFD_MMAP.

Ah, ok. Do you then envision guest_memfd capabilities remaining separate
KVM caps, one per guest_memfd feature?

>
> > 2) IMO they should both support namespace of 64 values at least from the get go.
>
> It's a limitation of KVM_CHECK_EXTENSION, and all of KVM's plumbing for ioctls.
> Because KVM still supports 32-bit architectures, direct returns from ioctls are
> forced to fit in 32-bit values to avoid unintentionally creating different ABI
> for 32-bit vs. 64-bit kernels.
>
> We could add KVM_CHECK_EXTENSION2 or KVM_CHECK_EXTENSION64 or something, but I
> honestly don't see the point. The odds of guest_memfd supporting >32 flags is
> small, and the odds of that happening in the next ~5 years is basically zero.
> All so that userspace can make one syscall instead of two for a path that isn't
> remotely performance critical.
>
> So while I agree that being able to enumerate 64 flags from the get-go would be
> nice to have, it's simply not worth the effort (unless someone has a clever idea).

Ack.
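
For the record, the two-call fallback would look roughly like the sketch
below on the userspace side. This is only an illustration under stated
assumptions: check_extension() is a stand-in for
ioctl(fd, KVM_CHECK_EXTENSION, cap), the cap numbers and the values it
returns are made up, and KVM_CAP_GUEST_MEMFD_FLAGS2 is hypothetical (it
is only floated in this thread).

```c
#include <stdint.h>

/* Hypothetical capability numbers, for illustration only. */
enum {
	CAP_GUEST_MEMFD_FLAGS  = 1,	/* would report flag bits 0..31 */
	CAP_GUEST_MEMFD_FLAGS2 = 2,	/* hypothetical: flag bits 32..63 */
};

/*
 * Stand-in for ioctl(fd, KVM_CHECK_EXTENSION, cap). The returned masks
 * are invented so the sketch can run without a KVM fd.
 */
static uint32_t check_extension(int cap)
{
	switch (cap) {
	case CAP_GUEST_MEMFD_FLAGS:
		return 0x3;	/* pretend two low flags are supported */
	case CAP_GUEST_MEMFD_FLAGS2:
		return 0x1;	/* pretend flag bit 32 is supported */
	default:
		return 0;	/* unknown capability */
	}
}

/* Combine two 32-bit capability results into one 64-bit flag mask. */
static uint64_t gmem_supported_flags(void)
{
	uint64_t lo = check_extension(CAP_GUEST_MEMFD_FLAGS);
	uint64_t hi = check_extension(CAP_GUEST_MEMFD_FLAGS2);

	return (hi << 32) | lo;
}
```

With the made-up values above, gmem_supported_flags() yields
0x0000000100000003, i.e. one extra call on a path that is not
performance critical.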

>
> > 3) The reservation scheme for upstream should ideally be LSBs first
> > for the new caps/flags.
>
> We're getting way ahead of ourselves. Nothing needs KVM_CAP_GUEST_MEMFD_CAPS at
> this time, so there's nothing to discuss.
>
> > guest_memfd will achieve multiple features in future, both upstream
> > and in out-of-tree versions to deploy features before they make their
>
> When it comes to upstream uAPI and uABI, out-of-tree kernel code is irrelevant.
>
> > way upstream. Generally the scheme followed by out-of-tree versions is
> > to define a custom UAPI that won't conflict with upstream UAPIs in
> > near future. Having a namespace of 32 values gives little space to
> > avoid the conflict, e.g. features like hugetlb support will have to
> > eat up at least 5 bits from the flags [1].
>
> Why on earth would out-of-tree code use KVM_CAP_GUEST_MEMFD_FLAGS? Providing

I can imagine a scenario where KVM_CAP_GUEST_MEMFD_FLAGS is upstreamed
and more flags are then reported as supported through it over time.
Out-of-tree code may pick up KVM_CAP_GUEST_MEMFD_FLAGS somewhere in
between.

> infrastructure to support an infinite (quite literally) number of out-of-tree
> capabilities and sub-ioctls, with practically zero chance of conflict, is not
> difficult. See internal b/378111418.
>
> But as above, this is not upstream's problem to solve.
>
> > [1] https://elixir.bootlin.com/linux/v6.17/source/include/uapi/asm-generic/hugetlb_encode.h#L20
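
To make the bit budget concrete: the header in [1] encodes log2 of the
huge page size in a 6-bit field starting at bit 26. The sketch below
uses local copies of those two values (the ENCODE_* names and the helper
functions are mine, not the kernel's) and assumes, purely for
illustration, that guest_memfd flags would reuse the same encoding in a
32-bit flags word.

```c
#include <stdint.h>

/* Local copies of the shift and mask defined in hugetlb_encode.h [1]. */
#define ENCODE_SHIFT	26
#define ENCODE_MASK	0x3fU

/* Place log2(huge page size) into the encoded field. */
static uint32_t hugetlb_encode(unsigned int page_shift)
{
	return (page_shift & ENCODE_MASK) << ENCODE_SHIFT;
}

/* Recover log2(huge page size) from a flags word. */
static unsigned int hugetlb_decode(uint32_t flags)
{
	return (flags >> ENCODE_SHIFT) & ENCODE_MASK;
}

/* Count the bits left in a 32-bit flags word after the size field. */
static unsigned int bits_left_for_flags(void)
{
	unsigned int used = 0;
	uint32_t m;

	for (m = ENCODE_MASK; m; m >>= 1)
		used++;

	return 32 - used;
}
```

Under that assumption a 2MB page (shift 21) encodes as 0x54000000, and
26 of the 32 bits remain for everything else.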