Re: [PATCH 17/21] filemap: add FGP_CREAT_ONLY

From: Paolo Bonzini
Date: Wed Feb 28 2024 - 15:17:48 EST


On Wed, Feb 28, 2024 at 8:24 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Wed, Feb 28, 2024 at 02:28:45PM +0100, Paolo Bonzini wrote:
> > Since you're here: KVM would like to add a ioctl to encrypt and
> > install a page into guest_memfd, in preparation for launching an
> > encrypted guest. For this API we want to rule out the possibility of
> > overwriting a page that is already in the guest_memfd's filemap,
> > therefore this API would pass FGP_CREAT_ONLY|FGP_CREAT
> > into__filemap_get_folio. Do you think this is bogus...
>
> Would it work to start out by either asserting the memfd is empty of
> pages, or by evicting any existing pages? Both those seem nicer than
> starting, realising you've got some unencrypted memory and aborting.

Unfortunately it would be quite ugly to force userspace to do all the
initialization in one go. For example, there are different kinds of
pages that probably would be initialized at different points (e.g.
before vs. after vCPUs are created, because the initial vCPU state is
also encrypted).

The thing that I want to protect against is userspace trying to
initialize the same encrypted page twice.

> > > This looks bogus to me, and if it's not bogus, it's incomplete.
> >
> > ... or if not, what incompleteness can you spot?
>
> The part where we race another caller passing FGP_CREAT_ONLY and one gets
> an EEXIST back from filemap_add_folio(). Maybe that's not something
> that can happen in your use case, but it's at least semantics that
> need documenting.

>From the point of view of filemap_add_folio(), one of the racers wins
and one fails. It doesn't matter to filemap.c if the missing
synchronization is in the kernel or in userspace. In the case of KVM,
the ioctl will return the number of pages before it found an existing
page, or -EEXIST if that number is zero (similar to what nonblocking
read does with EAGAIN).

I'll improve the documentation and changelog and make sure to Cc you
on the next version.

Thanks again!

Paolo