Re: [PATCH v7 2/8] KVM: Introduce __kvm_follow_pfn function

From: Sean Christopherson
Date: Fri Aug 04 2023 - 18:04:04 EST


On Thu, Jul 06, 2023, Yu Zhang wrote:
> On Thu, Jul 06, 2023 at 02:29:24PM +0900, David Stevens wrote:
> > On Wed, Jul 5, 2023 at 7:53 PM Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On Wed, Jul 05, 2023 at 06:22:59PM +0900, David Stevens wrote:
> > > > On Wed, Jul 5, 2023 at 12:10 PM Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > > @@ -2514,35 +2512,26 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
> > > > > > * The slow path to get the pfn of the specified host virtual address,
> > > > > > * 1 indicates success, -errno is returned if error is detected.
> > > > > > */
> > > > > > -static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
> > > > > > - bool interruptible, bool *writable, kvm_pfn_t *pfn)
> > > > > > +static int hva_to_pfn_slow(struct kvm_follow_pfn *foll, kvm_pfn_t *pfn)
> > > > > > {
> > > > > > - unsigned int flags = FOLL_HWPOISON;
> > > > > > + unsigned int flags = FOLL_HWPOISON | FOLL_GET | foll->flags;
> > > > > > struct page *page;
> > > > > > int npages;
> > > > > >
> > > > > > might_sleep();
> > > > > >
> > > > > > - if (writable)
> > > > > > - *writable = write_fault;
> > > > > > -
> > > > > > - if (write_fault)
> > > > > > - flags |= FOLL_WRITE;
> > > > > > - if (async)
> > > > > > - flags |= FOLL_NOWAIT;
> > > > > > - if (interruptible)
> > > > > > - flags |= FOLL_INTERRUPTIBLE;
> > > > > > -
> > > > > > - npages = get_user_pages_unlocked(addr, 1, &page, flags);
> > > > > > + npages = get_user_pages_unlocked(foll->hva, 1, &page, flags);
> > > > > > if (npages != 1)
> > > > > > return npages;
> > > > > >
> > > > > > + foll->writable = (foll->flags & FOLL_WRITE) && foll->allow_write_mapping;
> > > > > > +
> > > > > > /* map read fault as writable if possible */
> > > > > > - if (unlikely(!write_fault) && writable) {
> > > > > > + if (unlikely(!foll->writable) && foll->allow_write_mapping) {
> > > > >
> > > > > I guess !foll->writable should be !(foll->flags & FOLL_WRITE) here.
> > > >
> > > > The two statements are logically equivalent, although I guess using
> > > > !(foll->flags & FOLL_WRITE) may be a little clearer, if a little more
> > > > verbose.
> > >
> > > Well, as the comment says, we wanna try to map the read fault as writable
> > > whenever possible. And __gfn_to_pfn_memslot() will only set the FOLL_WRITE
> > > for write faults. So I guess using !foll->writable will not allow this.
> > > Did I miss anything?
> >
> > We just set the foll->writable out parameter to be equal to
> > ((foll->flags & FOLL_WRITE) && foll->allow_write_mapping). Taking a =
> > foll->flags & FOLL_WRITE and b = foll->allow_write_mapping, we have
> > !(a && b) && b -> (!a || !b) && b -> (!a && b) || (!b && b) -> !a &&
> > b.
>
> Ouch, my bad again... I typed "!foll->writable", but missed the "!" in
> my head while calculating... Thanks! :)
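
David's simplification can be checked exhaustively, because only four combinations of the two booleans exist. The following is a minimal standalone C sketch. It is plain userspace code, not kernel code. In it, `a` stands for `foll->flags & FOLL_WRITE` and `b` stands for `foll->allow_write_mapping`. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: exhaustively verify that
 *     !(a && b) && b  ==  !a && b
 * for every combination of a (FOLL_WRITE set) and b (allow_write_mapping).
 * Returns the number of combinations checked.
 */
static int check_writable_equivalence(void)
{
	int checked = 0;

	for (int i = 0; i < 2; i++) {
		for (int j = 0; j < 2; j++) {
			bool a = i, b = j;
			bool writable = a && b;		/* v7: foll->writable */
			bool lhs = !writable && b;	/* condition as written */
			bool rhs = !a && b;		/* condition as suggested */

			assert(lhs == rhs);
			checked++;
		}
	}
	return checked;
}
```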

The code is funky and confusing though. Specifically, FOLL_WRITE without
allow_write_mapping is nonsensical, and yields the even more nonsensical output
of a successful FOLL_WRITE with foll->writable==%false.

It "works" because callers only consume foll->writable when foll->allow_write_mapping
is true, but relying on that is ugly and completely unnecessary. Similarly, the
"allow" terminology is misleading. FOLL_WRITE *always* allows writable mappings.
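
The following userspace model makes the "successful FOLL_WRITE with foll->writable==%false" case concrete by reproducing the v7 assignment. It relies on three assumptions:

- get_user_pages_unlocked() succeeded.
- The body of the "map read fault as writable" branch sets foll->writable to true on a successful upgrade, as the pre-patch code did with *writable.
- `upgrade_ok` only stands in for whether the get_user_page_fast_only(..., FOLL_WRITE, ...) retry would succeed.

The names v7_writable() and v7_inconsistent_cases() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the v7 hva_to_pfn_slow() writable computation, assuming
 * the GUP call succeeded.
 *   foll_write: foll->flags & FOLL_WRITE
 *   allow:      foll->allow_write_mapping
 *   upgrade_ok: assumed outcome of the opportunistic FOLL_WRITE retry
 */
static bool v7_writable(bool foll_write, bool allow, bool upgrade_ok)
{
	bool writable = foll_write && allow;

	/* "map read fault as writable if possible" */
	if (!writable && allow && upgrade_ok)
		writable = true;

	return writable;
}

/*
 * Count the combinations in which a successful FOLL_WRITE lookup
 * nevertheless reports writable == false.
 */
static int v7_inconsistent_cases(void)
{
	int n = 0;

	for (int mask = 0; mask < 8; mask++) {
		bool foll_write = mask & 1;
		bool allow = mask & 2;
		bool upgrade_ok = mask & 4;

		if (foll_write && !v7_writable(foll_write, allow, upgrade_ok)) {
			/* Only reachable when allow_write_mapping is false. */
			assert(!allow);
			n++;
		}
	}
	return n;
}
```

When foll_write is set, two of the eight combinations report the page as non-writable: allow_write_mapping is false in both, and the upgrade outcome differs between them. Callers must know to ignore exactly this state.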

This wasn't as much of a problem in the previous code, because the lower levels
took the "bool *writable" pointer directly, i.e. avoided the "allow" terminology entirely.

So we should either keep that behavior, i.e. replace "bool allow_write_mapping"
with "bool *writable", or rename allow_write_mapping to something like
opportunistically_map_writable, and then unconditionally set foll->writable
whenever KVM obtains a writable mapping, i.e. regardless of whether the original
fault was a read or a write.

My vote is for the latter. If opportunistically_map_writable is too verbose,
try_map_writable would be another option. Hmm, I'll make "try_map_writable" my
official vote.

Ah, and I also vote for an if / else-if chain instead of unconditionally setting
foll->writable. That makes the relationship between FOLL_WRITE and try_map_writable
a bit more obvious IMO. E.g.

static bool hva_to_pfn_fast(struct kvm_follow_pfn *foll, kvm_pfn_t *pfn)
{
	struct page *page[1];

	/*
	 * Fast pin a writable pfn only if it is a write fault request
	 * or the caller allows a writable pfn to be mapped for a read
	 * fault request.
	 */
	if (!((foll->flags & FOLL_WRITE) || foll->try_map_writable))
		return false;

	if (get_user_page_fast_only(foll->hva, FOLL_WRITE, page)) {
		*pfn = page_to_pfn(page[0]);
		foll->writable = true;
		return true;
	}

	return false;
}

/*
 * The slow path to get the pfn of the specified host virtual address,
 * 1 indicates success, -errno is returned if error is detected.
 */
static int hva_to_pfn_slow(struct kvm_follow_pfn *foll, kvm_pfn_t *pfn)
{
	unsigned int flags = FOLL_HWPOISON | FOLL_GET | foll->flags;
	struct page *page;
	int npages;

	might_sleep();

	npages = get_user_pages_unlocked(foll->hva, 1, &page, flags);
	if (npages != 1)
		return npages;

	if (foll->flags & FOLL_WRITE) {
		foll->writable = true;
	} else if (foll->try_map_writable) {
		struct page *wpage;

		/* map read fault as writable if possible */
		if (get_user_page_fast_only(foll->hva, FOLL_WRITE, &wpage)) {
			foll->writable = true;
			put_page(page);
			page = wpage;
		}
	}
	*pfn = page_to_pfn(page);
	return npages;
}
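
For comparison, the same kind of userspace model can be applied to the if / else-if version of hva_to_pfn_slow(). This is again a sketch, not kernel code, and it relies on three assumptions:

- proposed_writable() and proposed_check_invariant() are hypothetical names.
- `upgrade_ok` stands in for the outcome of the get_user_page_fast_only(..., FOLL_WRITE, ...) retry.
- foll->writable starts out false, because the suggested code never clears it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the suggested hva_to_pfn_slow() writable logic, assuming
 * get_user_pages_unlocked() succeeded and foll->writable starts false.
 */
static bool proposed_writable(bool foll_write, bool try_map_writable,
			      bool upgrade_ok)
{
	bool writable = false;

	if (foll_write)
		writable = true;		/* write fault: always writable */
	else if (try_map_writable && upgrade_ok)
		writable = true;		/* read fault mapped writable */

	return writable;
}

/*
 * Check the intended invariants across all eight combinations:
 * a successful FOLL_WRITE always reports writable, and a read fault
 * without try_map_writable never does. Returns combinations checked.
 */
static int proposed_check_invariant(void)
{
	int checked = 0;

	for (int mask = 0; mask < 8; mask++) {
		bool foll_write = mask & 1;
		bool try_map = mask & 2;
		bool upgrade_ok = mask & 4;
		bool w = proposed_writable(foll_write, try_map, upgrade_ok);

		if (foll_write)
			assert(w);
		if (!foll_write && !try_map)
			assert(!w);
		checked++;
	}
	return checked;
}
```

In this version, the output no longer depends on the caller remembering when foll->writable is meaningful. FOLL_WRITE always reports writable, and try_map_writable only matters for read faults.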