Re: [PATCH] vfio/type1: Unpin zero pages

From: Alex Williamson
Date: Tue Aug 30 2022 - 11:11:28 EST


On Tue, 30 Aug 2022 09:59:33 +0200
David Hildenbrand <david@xxxxxxxxxx> wrote:

> On 30.08.22 05:05, Alex Williamson wrote:
> > There's currently a reference count leak on the zero page. We increment
> > the reference via pin_user_pages_remote(), but the page is later handled
> > as an invalid/reserved page, therefore it's not accounted against the
> > user and not unpinned by our put_pfn().
> >
> > Introducing special zero page handling in put_pfn() would resolve the
> > leak, but without accounting of the zero page, a single user could
> > still create enough mappings to generate a reference count overflow.
> >
> > The zero page is always resident, so for our purposes there's no reason
> > to keep it pinned. Therefore, add a loop to walk pages returned from
> > pin_user_pages_remote() and unpin any zero pages.
> >
> > Cc: David Hildenbrand <david@xxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx
> > Reported-by: Luboslav Pivarc <lpivarc@xxxxxxxxxx>
> > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > ---
> > drivers/vfio/vfio_iommu_type1.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index db516c90a977..8706482665d1 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -558,6 +558,18 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
> >  	ret = pin_user_pages_remote(mm, vaddr, npages, flags | FOLL_LONGTERM,
> >  				    pages, NULL, NULL);
> >  	if (ret > 0) {
> > +		int i;
> > +
> > +		/*
> > +		 * The zero page is always resident, we don't need to pin it
> > +		 * and it falls into our invalid/reserved test so we don't
> > +		 * unpin in put_pfn(). Unpin all zero pages in the batch here.
> > +		 */
> > +		for (i = 0 ; i < ret; i++) {
> > +			if (unlikely(is_zero_pfn(page_to_pfn(pages[i]))))
> > +				unpin_user_page(pages[i]);
> > +		}
> > +
> >  		*pfn = page_to_pfn(pages[0]);
> >  		goto done;
> >  	}
> >
> >
>
> As discussed offline, for the shared zeropage (that's not even
> refcounted when mapped into a process), this makes perfect sense to me.
>
> Good question raised by Sean if ZONE_DEVICE pages might similarly be
> problematic. But for them, we cannot simply always unpin here.

What sort of VM mapping would give me ZONE_DEVICE pages? Thanks,

Alex
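
For readers following the thread, the leak described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: struct toy_page and the toy_*() helpers are hypothetical stand-ins for struct page, pin_user_pages_remote(), put_pfn() and the loop the patch adds, and the 16-entry batch size is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BATCH_MAX 16	/* arbitrary batch size for this model */

/* Hypothetical stand-in for struct page: only what the model needs. */
struct toy_page {
	int refcount;	/* models the page reference count */
	bool reserved;	/* models the invalid/reserved test */
	bool zero;	/* models is_zero_pfn() returning true */
};

/* Models pin_user_pages_remote(): one reference per returned page. */
static void toy_pin(struct toy_page **pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pages[i]->refcount++;
}

/* Models put_pfn(): reserved pages are skipped, so never unpinned. */
static void toy_put(struct toy_page *page)
{
	if (!page->reserved)
		page->refcount--;
}

/* Models the added loop: drop the pin on zero pages right after pinning. */
static void toy_unpin_zero_pages(struct toy_page **pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (pages[i]->zero) {
			assert(pages[i]->refcount > 0);
			pages[i]->refcount--;
		}
	}
}

/*
 * Pin and release a batch whose entries all point at the zero page,
 * then return the references still held on it.
 */
static int toy_leaked_refs(size_t n, bool with_fix)
{
	struct toy_page zero_page = {
		.refcount = 0, .reserved = true, .zero = true,
	};
	struct toy_page *batch[TOY_BATCH_MAX];

	if (n > TOY_BATCH_MAX)
		n = TOY_BATCH_MAX;
	for (size_t i = 0; i < n; i++)
		batch[i] = &zero_page;

	toy_pin(batch, n);
	if (with_fix)
		toy_unpin_zero_pages(batch, n);
	for (size_t i = 0; i < n; i++)
		toy_put(batch[i]);

	return zero_page.refcount;
}
```

In this model, toy_leaked_refs(4, false) leaves four references behind because the release path skips the reserved page, while toy_leaked_refs(4, true) leaves none. The model says nothing about ZONE_DEVICE pages, which the thread notes cannot simply be unpinned the same way.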