Re: [PATCH v11 11/22] vfio iommu: Add blocking notifier to notify DMA_UNMAP

From: Alex Williamson
Date: Tue Nov 08 2016 - 12:46:29 EST


On Tue, 8 Nov 2016 21:56:29 +0530
Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:

> On 11/8/2016 5:15 AM, Alex Williamson wrote:
> > On Sat, 5 Nov 2016 02:40:45 +0530
> > Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
> >
> ...
> >>
> >> +int vfio_register_notifier(struct device *dev, struct notifier_block *nb)
> >
> > Is the expectation here that this is a generic notifier for all
> > vfio->mdev signaling? That should probably be made clear in the mdev
> > API to avoid vendor drivers assuming their notifier callback only
> > occurs for unmaps, even if that's currently the case.
> >
>
> Ok. Adding comment about notifier callback in mdev_device which is part
> of next patch.
>
> ...
>
> >> mutex_lock(&iommu->lock);
> >>
> >> - if (!iommu->external_domain) {
> >> + /* Fail if notifier list is empty */
> >> + if ((!iommu->external_domain) || (!iommu->notifier.head)) {
> >> ret = -EINVAL;
> >> goto pin_done;
> >> }
> >> @@ -867,6 +870,11 @@ unlock:
> >> /* Report how much was unmapped */
> >> unmap->size = unmapped;
> >>
> >> + if (unmapped && iommu->external_domain)
> >> + blocking_notifier_call_chain(&iommu->notifier,
> >> + VFIO_IOMMU_NOTIFY_DMA_UNMAP,
> >> + unmap);
> >
> > This is after the fact, there's already a gap here where pages are
> > unpinned and the mdev device is still running.
>
> Oh, there is a bug here, now unpin_pages() take user_pfn as argument and
> find vfio_dma. If its not found, it doesn't unpin pages. We have to call
> this notifier before vfio_remove_dma(). But if we call this before
> vfio_remove_dma() there will be deadlock since iommu->lock is already
> held here and vfio_iommu_type1_unpin_pages() will also try to hold
> iommu->lock.
> If we want to call blocking_notifier_call_chain() before
> vfio_remove_dma(), sequence should be:
>
> unmapped += dma->size;
> mutex_unlock(&iommu->lock);
> if (iommu->external_domain)) {
> struct vfio_iommu_type1_dma_unmap nb_unmap;
>
> nb_unmap.iova = dma->iova;
> nb_unmap.size = dma->size;
> blocking_notifier_call_chain(&iommu->notifier,
> VFIO_IOMMU_NOTIFY_DMA_UNMAP,
> &nb_unmap);
> }
> mutex_lock(&iommu->lock);
> vfio_remove_dma(iommu, dma);

It seems like it would be worthwhile to have the rb-tree rooted in the
vfio-dma, then we only need to call the notifier if there are pages
pinned within that vfio-dma (ie. the rb-tree is not empty). We can
then release the lock call the notifier, re-acquire the lock, and
BUG_ON if the rb-tree still is not empty. We might get duplicate pfns
between separate vfio_dma structs, but as I mentioned in other replies,
that seems like an exception that we don't need to optimize for.

> > The notifier needs to
> > happen prior to that and I suspect that we need to validate that we
> > have no remaining external pfn references within this vfio_dma block.
> > It seems like we need to root our pfn tracking in the vfio_dma so that
> > we can see that it's empty after the notifier chain and BUG_ON if not.
>
> There is no way to find pfns from that iova range with current
> implementation. We can have this validate if we go with linear array of
> iova to track pfns.

Right, I was still hoping to avoid storing the pfn even with the
array/page-table approach though, ask the mm layer for the mapping
again. Is that too much overhead? Maybe the page table could store
the phys addr and we could use PAGE_MASK to store the reference count
so that each entry is still only 8bytes(?)

> > I would also add some enforcement that external pinning is only enabled
> > when vfio_iommu_type1 is configured for v2 semantics (ie. we only
> > support unmaps exactly matching previous maps).
> >
>
> Ok I'll add that check.
>
> Thanks,
> Kirti