Re: [RFC] vfio/type1: handle case where IOMMU does not support PAGE_SIZE size
From: Alex Williamson
Date: Wed Oct 28 2015 - 13:28:54 EST
On Wed, 2015-10-28 at 17:14 +0000, Will Deacon wrote:
> On Wed, Oct 28, 2015 at 10:27:28AM -0600, Alex Williamson wrote:
> > On Wed, 2015-10-28 at 13:12 +0000, Eric Auger wrote:
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > > index 57d8c37..13fb974 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -403,7 +403,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> > >  static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
> > >  {
> > >  	struct vfio_domain *domain;
> > > -	unsigned long bitmap = PAGE_MASK;
> > > +	unsigned long bitmap = ULONG_MAX;
> >
> > Isn't this and removing the WARN_ON()s the only real change in this
> > patch? The rest looks like conversion to use IS_ALIGNED and the
> > following test, that I don't really understand...
> >
> > >
> > >  	mutex_lock(&iommu->lock);
> > >  	list_for_each_entry(domain, &iommu->domain_list, next)
> > > @@ -416,20 +416,18 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
> > >  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> > >  			     struct vfio_iommu_type1_dma_unmap *unmap)
> > >  {
> > > -	uint64_t mask;
> > >  	struct vfio_dma *dma;
> > >  	size_t unmapped = 0;
> > >  	int ret = 0;
> > > +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
> > > +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
> > > +						PAGE_SIZE : min_pagesz;
> >
> > This one. If we're going to support sub-PAGE_SIZE mappings, why do we
> > care to cap alignment at PAGE_SIZE?
>
> Eric can clarify, but I think the intention here is to have VFIO continue
> doing things in PAGE_SIZE chunks precisely so that we don't have to rework
> all of the pinning code etc. The IOMMU API can then deal with the smaller
> page size.
Gak, I read this wrong. So really we're just artificially adding
PAGE_SIZE as a supported IOMMU size so long as the IOMMU supports
something smaller than PAGE_SIZE, where PAGE_SIZE is obviously a
multiple of that smaller size. Ok, but should we just do this once in
vfio_pgsize_bitmap()? This is exactly why VT-d just reports ~(4k - 1)
for the iommu bitmap.
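
To make "do this once" concrete, here is a minimal userspace sketch of
folding the adjustment into the bitmap helper. It is not the vfio code:
sketch_pgsize_bitmap(), sketch_min_pgsize() and the explicit page_size
parameter are hypothetical stand-ins for vfio_pgsize_bitmap() and
PAGE_SIZE, and it assumes every size involved is a power of two.

```c
#include <assert.h>

/*
 * Hypothetical model: if the IOMMU advertises any size smaller than the
 * host page size, hide those sizes and advertise the host page size
 * instead. This is safe because the host page size is a multiple of any
 * smaller power-of-two size.
 */
static unsigned long sketch_pgsize_bitmap(unsigned long iommu_bitmap,
					  unsigned long page_size)
{
	unsigned long sub_page;

	/* page_size must be a non-zero power of two */
	assert(page_size && (page_size & (page_size - 1)) == 0);

	sub_page = iommu_bitmap & (page_size - 1);
	if (sub_page) {
		iommu_bitmap &= ~(page_size - 1);	/* drop sub-page sizes */
		iommu_bitmap |= page_size;		/* advertise page_size */
	}
	return iommu_bitmap;
}

/* Smallest advertised size: isolate the lowest set bit of the bitmap. */
static unsigned long sketch_min_pgsize(unsigned long bitmap)
{
	return bitmap & (~bitmap + 1);
}
```

With the adjustment done in one place like this, the map and unmap paths
could keep deriving their alignment mask from the lowest set bit and
would not each need their own comparison against PAGE_SIZE.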
> > > -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
> > > -
> > > -	if (unmap->iova & mask)
> > > +	if (!IS_ALIGNED(unmap->iova, requested_alignment))
> > >  		return -EINVAL;
> > > -	if (!unmap->size || unmap->size & mask)
> > > +	if (!unmap->size || !IS_ALIGNED(unmap->size, requested_alignment))
> > >  		return -EINVAL;
> > >
> > > -	WARN_ON(mask & PAGE_MASK);
> > > -
> > >  	mutex_lock(&iommu->lock);
> > >
> > >  	/*
> > > @@ -553,25 +551,24 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> > >  	size_t size = map->size;
> > >  	long npage;
> > >  	int ret = 0, prot = 0;
> > > -	uint64_t mask;
> > >  	struct vfio_dma *dma;
> > >  	unsigned long pfn;
> > > +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
> > > +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
> > > +						PAGE_SIZE : min_pagesz;
> > >
> > >  	/* Verify that none of our __u64 fields overflow */
> > >  	if (map->size != size || map->vaddr != vaddr || map->iova != iova)
> > >  		return -EINVAL;
> > >
> > > -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
> > > -
> > > -	WARN_ON(mask & PAGE_MASK);
> > > -
> > >  	/* READ/WRITE from device perspective */
> > >  	if (map->flags & VFIO_DMA_MAP_FLAG_WRITE)
> > >  		prot |= IOMMU_WRITE;
> > >  	if (map->flags & VFIO_DMA_MAP_FLAG_READ)
> > >  		prot |= IOMMU_READ;
> > >
> > > -	if (!prot || !size || (size | iova | vaddr) & mask)
> > > +	if (!prot || !size ||
> > > +	    !IS_ALIGNED(size | iova | vaddr, requested_alignment))
> > >  		return -EINVAL;
> > >
> > >  	/* Don't allow IOVA or virtual address wrap */
> >
> > This is mostly ignoring the problems with sub-PAGE_SIZE mappings. For
> > instance, we can only pin on PAGE_SIZE and therefore we only do
> > accounting on PAGE_SIZE, so if the user does 4K mappings across your 64K
> > page, that page gets pinned and accounted 16 times. Are we going to
> > tell users that their locked memory limit needs to be 16x now? The rest
> > of the code would need an audit as well to see what other sub-page bugs
> > might be hiding. Thanks,
>
> I don't see that. The pinning all happens the same in VFIO, which can
> then happily pass a 64k region to iommu_map. iommu_map will then call
> ->map in 4k chunks on the IOMMU driver ops.
Yep, I see now that this isn't doing sub-page mappings. Thanks,
Alex
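
For the record, the arithmetic behind the two readings above, as a small
userspace sketch. The helper names are hypothetical and the model is
deliberately simplified: it assumes power-of-two sizes and a fixed 4K
chunk on the IOMMU side, as described in the thread, and says nothing
about how the real iommu_map() picks its chunk sizes.

```c
#include <stddef.h>

/*
 * The case I was worried about: user mappings smaller than the host page.
 * Pinning works on whole host pages, so each sub-page mapping would pin
 * (and account) the same host page again.
 */
static size_t sketch_pins_per_page(size_t page_size, size_t map_size)
{
	return map_size < page_size ? page_size / map_size : 1;
}

/*
 * What the patch does instead: VFIO still maps page_size-aligned regions
 * and the IOMMU layer splits each region into smaller chunks, so the
 * split shows up as more ->map calls, not as extra pins.
 */
static size_t sketch_map_calls(size_t region_size, size_t iommu_pgsize)
{
	return region_size / iommu_pgsize;
}
```

With 64K host pages and 4K user mappings the first helper gives 16, which
is the 16x locked memory concern. With a 64K region and a 4K IOMMU page
the second also gives 16, but the page is pinned and accounted only once.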