Re: [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions
From: Jason Gunthorpe
Date: Tue Feb 16 2021 - 16:32:46 EST
On Tue, Feb 16, 2021 at 12:39:56PM -0800, Dan Williams wrote:
> > >> + /*
> > >> + * Check and see if the guest wants to map to the limited or unlimited portal.
> > >> + * The driver will allow mapping to unlimited portal only if the the wq is a
> > >> + * dedicated wq. Otherwise, it goes to limited.
> > >> + */
> > >> + virt_portal = ((offset >> PAGE_SHIFT) & 0x3) == 1;
> > >> + phys_portal = IDXD_PORTAL_LIMITED;
> > >> + if (virt_portal == IDXD_PORTAL_UNLIMITED && wq_dedicated(wq))
> > >> + phys_portal = IDXD_PORTAL_UNLIMITED;
> > >> +
> > >> + /* We always map IMS portals to the guest */
> > >> + pgoff = (base + idxd_get_wq_portal_full_offset(wq->id, phys_portal,
> > >> + IDXD_IRQ_IMS)) >> PAGE_SHIFT;
> > >> + dev_dbg(dev, "mmap %lx %lx %lx %lx\n", vma->vm_start, pgoff, req_size,
> > >> + pgprot_val(pg_prot));
> > >> + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> > >> + vma->vm_private_data = mdev;
> > > What ensures the mdev pointer is valid strictly longer than the VMA?
> > > This needs refcounting.
> >
> > Going to take a kref at open and then put_device at close. Does that
> > sound reasonable or should I be calling get_device() in mmap() and then
> > register a notifier for when vma is released?
>
> Where does this enabling ever look at vm_private_data again?
So long as a PCI BAR page is mapped into a VMA the pci driver cannot
be removed. Things must either wait until the fd (or at least all
VMAs) are closed, or zap the VMAs before allowing the device driver to
be removed.
There should be some logic in this whole thing where the pci_driver
destroys the mdevs which destroy the vfio's which wait for all the fds
to be closed.
There is enough going on in vfio_device_fops_release() that this might
happen already, Dave needs to investigate and confirm the whole thing
works as expected.
Presumably there is no security issue with sharing these portal pages
because I don't see a vma ops involved here to track when pages are
freed up (ie the vm_private_data is dead code cargo-cult'd from
someplace else)
But this is all sufficiently tricky, and Intel has already had
security bugs in their drivers here, that someone needs to audit it
closely before it gets posted again.
Jason