Re: [PATCH v2] vfio/type1: avoid redundant PageReserved checking

From: Alex Williamson
Date: Fri Sep 13 2019 - 14:05:30 EST


On Mon, 2 Sep 2019 15:32:42 +0800
Ben Luo <luoben@xxxxxxxxxxxxxxxxx> wrote:

> å 2019/8/30 äå1:06, Alex Williamson åé:
> > On Fri, 30 Aug 2019 00:58:22 +0800
> > Ben Luo <luoben@xxxxxxxxxxxxxxxxx> wrote:
> >
> >> å 2019/8/28 äå11:55, Alex Williamson åé:
> >>> On Wed, 28 Aug 2019 12:28:04 +0800
> >>> Ben Luo <luoben@xxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>>> currently, if the page is not a tail of compound page, it will be
> >>>> checked twice for the same thing.
> >>>>
> >>>> Signed-off-by: Ben Luo <luoben@xxxxxxxxxxxxxxxxx>
> >>>> ---
> >>>> drivers/vfio/vfio_iommu_type1.c | 3 +--
> >>>> 1 file changed, 1 insertion(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >>>> index 054391f..d0f7346 100644
> >>>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>>> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> >>>> static bool is_invalid_reserved_pfn(unsigned long pfn)
> >>>> {
> >>>> if (pfn_valid(pfn)) {
> >>>> - bool reserved;
> >>>> struct page *tail = pfn_to_page(pfn);
> >>>> struct page *head = compound_head(tail);
> >>>> - reserved = !!(PageReserved(head));
> >>>> if (head != tail) {
> >>>> + bool reserved = PageReserved(head);
> >>>> /*
> >>>> * "head" is not a dangling pointer
> >>>> * (compound_head takes care of that)
> >>> Thinking more about this, the code here was originally just a copy of
> >>> kvm_is_mmio_pfn() which was simplified in v3.12 with the commit below.
> >>> Should we instead do the same thing here? Thanks,
> >>>
> >>> Alex
> >> ok, and kvm_is_mmio_pfn() has also been updated since then, I will take
> >> a look at that and compose a new patch
> > I'm not sure if the further updates are quite as relevant for vfio, but
> > appreciate your review of them. Thanks,
> >
> > Alex
>
> After studying the related code, my personal understandings are:
>
> kvm_is_mmio_pfn() is used to find out whether a memory range is MMIO
> mapped so that to set
> the proper MTRR TYPE to spte.
>
> is_invalid_reserved_pfn() is used in two scenarios:
> Â Â 1.Âto tell whether a page should be counted against user's mlock
> limits, as the function's name
> implies, all 'invalid' PFNs who are not backed by struct page and those
> reserved pages (including
> zero page and those from NVDIMM DAX) should be excluded.
> 2. to check if we have got a valid and pinned pfn for the vma
> withÂVM_PFNMAP flag.
>
> So, for the zero page and 'RAM' backed PFNs without 'struct page',
> kvm_is_mmio_pfn() should
> return false because they are not MMIO and are cacheable, but
> is_invalid_reserved_pfn() should
> return true since they are truely reserved or invalid and should not be
> counted against user's
> mlock limits.
>
> For fsdax-page, current get_user_pages() returns -EOPNOTSUPP, and VFIO
> also returns this
> error code to user, seems not support fsdax for now, so there is no
> chance to call into
> is_invalid_reserved_pfn() currently, if fsdax is to be supported, not
> only this function needs to be
> updated, vaddr_get_pfn() also needs some changes.
>
> Now, with the assumption that PFNs of compound pages with reserved bit
> set in head will not be
> passed to is_invalid_reserved_pfn(), we can simplify this function to:
>
> static bool is_invalid_reserved_pfn(unsigned long pfn)
> {
> ÂÂÂÂÂÂÂ if (pfn_valid(pfn))
> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ return PageReserved(pfn_to_page(pfn));
>
> ÂÂÂÂÂÂÂ return true;
> }
>
> But, I am not sure if the assumption above is true, if not, we still
> need to check the reserved bit of
> head for a tail page as this PATCH v2 does.

I believe what you've described is correct. Andrea, have we missed
anything? Thanks,

Alex


> >
> >>> commit 11feeb498086a3a5907b8148bdf1786a9b18fc55
> >>> Author: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> >>> Date: Thu Jul 25 03:04:38 2013 +0200
> >>>
> >>> kvm: optimize away THP checks in kvm_is_mmio_pfn()
> >>>
> >>> The checks on PG_reserved in the page structure on head and tail pages
> >>> aren't necessary because split_huge_page wouldn't transfer the
> >>> PG_reserved bit from head to tail anyway.
> >>>
> >>> This was a forward-thinking check done in the case PageReserved was
> >>> set by a driver-owned page mapped in userland with something like
> >>> remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not
> >>> possible right now). It was meant to be very safe, but it's overkill
> >>> as it's unlikely split_huge_page could ever run without the driver
> >>> noticing and tearing down the hugepage itself.
> >>>
> >>> And if a driver in the future will really want to map a reserved
> >>> hugepage in userland using an huge pmd it should simply take care of
> >>> marking all subpages reserved too to keep KVM safe. This of course
> >>> would require such a hypothetical driver to tear down the huge pmd
> >>> itself and splitting the hugepage itself, instead of relaying on
> >>> split_huge_page, but that sounds very reasonable, especially
> >>> considering split_huge_page wouldn't currently transfer the reserved
> >>> bit anyway.
> >>>
> >>> Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> >>> Signed-off-by: Gleb Natapov <gleb@xxxxxxxxxx>
> >>>
> >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >>> index d2836788561e..0fc25aed79a8 100644
> >>> --- a/virt/kvm/kvm_main.c
> >>> +++ b/virt/kvm/kvm_main.c
> >>> @@ -102,28 +102,8 @@ static bool largepages_enabled = true;
> >>>
> >>> bool kvm_is_mmio_pfn(pfn_t pfn)
> >>> {
> >>> - if (pfn_valid(pfn)) {
> >>> - int reserved;
> >>> - struct page *tail = pfn_to_page(pfn);
> >>> - struct page *head = compound_trans_head(tail);
> >>> - reserved = PageReserved(head);
> >>> - if (head != tail) {
> >>> - /*
> >>> - * "head" is not a dangling pointer
> >>> - * (compound_trans_head takes care of that)
> >>> - * but the hugepage may have been splitted
> >>> - * from under us (and we may not hold a
> >>> - * reference count on the head page so it can
> >>> - * be reused before we run PageReferenced), so
> >>> - * we've to check PageTail before returning
> >>> - * what we just read.
> >>> - */
> >>> - smp_rmb();
> >>> - if (PageTail(tail))
> >>> - return reserved;
> >>> - }
> >>> - return PageReserved(tail);
> >>> - }
> >>> + if (pfn_valid(pfn))
> >>> + return PageReserved(pfn_to_page(pfn));
> >>>
> >>> return true;
> >>> }