Re: [RFC PATCH 10/10] vfio/type1: Register device notifier
From: Peter Xu
Date: Thu Feb 25 2021 - 13:03:57 EST
On Wed, Feb 24, 2021 at 08:22:16PM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 24, 2021 at 02:55:08PM -0700, Alex Williamson wrote:
>
> > > > +static bool strict_mmio_maps = true;
> > > > +module_param_named(strict_mmio_maps, strict_mmio_maps, bool, 0644);
> > > > +MODULE_PARM_DESC(strict_mmio_maps,
> > > > + "Restrict to safe DMA mappings of device memory (true).");
> > >
> > > I think this should be a kconfig, historically we've required kconfig
> > > to opt-in to unsafe things that could violate kernel security. Someone
> > > building a secure boot trusted kernel system should not have an
> > > option for userspace to just turn off protections.
> >
> > It could certainly be further protected that this option might not
> > exist based on a Kconfig, but I think we're already risking breaking
> > some existing users and I'd rather allow it with an opt-in (like we
> > already do for lack of interrupt isolation), possibly even with a
> > kernel taint if used, if necessary.
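One way to combine both positions is sketched below as plain C so it compiles outside the kernel. The opt-out only exists when a build-time option is set, and using it records a taint. `CONFIG_VFIO_UNSAFE_MMIO_MAPS`, `strict_mmio_maps_effective()` and the `tainted` flag are hypothetical names. In a real kernel the knob would still be declared with `module_param_named()` and the taint would go through `add_taint()`.

```c
#include <stdbool.h>

/*
 * Hypothetical build-time switch. A secure-boot kernel would leave
 * CONFIG_VFIO_UNSAFE_MMIO_MAPS undefined, so there is no knob to flip.
 */
/* #define CONFIG_VFIO_UNSAFE_MMIO_MAPS 1 */

/* Stand-in for the module parameter from the quoted patch. */
static bool strict_mmio_maps = true;

/* Hypothetical stand-in for a kernel taint bit (add_taint() in-kernel). */
static bool tainted;

/* Returns whether the strict (safe) mapping policy is in force. */
static bool strict_mmio_maps_effective(void)
{
#ifdef CONFIG_VFIO_UNSAFE_MMIO_MAPS
	if (!strict_mmio_maps) {
		tainted = true;	/* record that protections were turned off */
		return false;
	}
#endif
	/* Without the build option the parameter is simply ignored. */
	return true;
}
```

With the option compiled out, flipping the parameter has no effect, which is the property Jason asks for; with it compiled in, Alex's opt-in plus taint behaviour applies.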
>
> Makes me nervous, security should not be optional.
>
> > > I'd prefer this was written a bit differently, I would like it very
> > > much if this doesn't mis-use follow_pte() by returning pfn outside
> > > the lock.
> > >
> > > vaddr_get_bar_pfn(..)
> > > {
> > > 	vma = find_vma_intersection(mm, vaddr, vaddr + 1);
> > > 	if (!vma)
> > > 		return -ENOENT;
> > > 	if ((vma->vm_flags & VM_DENYWRITE) && (prot & PROT_WRITE)) // Check me
> > > 		return -EFAULT;
> > > 	device = vfio_device_get_from_vma(vma);
> > > 	if (!device)
> > > 		return -ENOENT;
> > >
> > > 	/*
> > > 	 * Now do the same as vfio_pci_mmap_fault() - the vm_pgoff must
> > > 	 * be the physical pfn when using this mechanism. Delete follow_pte() entirely.
> > > 	 */
> > > 	pfn = (vaddr - vma->vm_start) / PAGE_SIZE + vma->vm_pgoff;
> > >
> > > 	/* de-dup device and record that we are using device's pages in the
> > > 	 * pfnmap */
> > > 	...
> > > }
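The `pfn = ...` line in the quoted sketch is easy to get wrong at the range boundaries, so here is a standalone model of just that arithmetic. `struct vma_model`, `PAGE_SHIFT_MODEL` and `vaddr_to_bar_pfn()` are hypothetical stand-ins for the `vm_area_struct` fields involved. The sketch assumes, as the quoted code does, that `vm_pgoff` holds the physical pfn backing `vm_start`; the thread goes on to note that this convention may not hold for vfio-pci.

```c
#include <stdint.h>
#include <errno.h>

#define PAGE_SHIFT_MODEL 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the vm_area_struct fields the sketch uses. */
struct vma_model {
	uint64_t vm_start;	/* first user address covered (page aligned) */
	uint64_t vm_end;	/* one past the last covered address */
	uint64_t vm_pgoff;	/* assumed: physical pfn backing vm_start */
};

/*
 * Translate a user address inside @vma to a pfn under the
 * "vm_pgoff == pfn of vm_start" convention. Returns 0 on success.
 */
static int vaddr_to_bar_pfn(const struct vma_model *vma, uint64_t vaddr,
			    uint64_t *pfn)
{
	if (vaddr < vma->vm_start || vaddr >= vma->vm_end)
		return -EFAULT;
	*pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT_MODEL) + vma->vm_pgoff;
	return 0;
}
```

Note that no page-table walk is involved, which is the point of dropping follow_pte(): the answer comes from VMA metadata that VFIO itself controls.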
> >
> >
> > This seems to undo both:
> >
> > 5cbf3264bc71 ("vfio/type1: Fix VA->PA translation for PFNMAP VMAs in vaddr_get_pfn()")
>
> No, the bug this commit described is fixed by calling
> vfio_device_get_from_vma() which excludes all non-VFIO VMAs already.
>
> We can assert that the vm_pgoff is in a specific format because it is
> a VFIO owned VMA and must follow the rules to be part of the address
> space. See my last email
>
> Here I was suggesting to use the vm_pgoff == PFN rule, but since
> you've clarified that doesn't work we'd have to determine the PFN from
> the region number through the vfio_device instead.
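Determining the pfn from the region number might look roughly like the sketch below. It assumes the lookup sees the offset userspace passed to mmap() in vfio-pci's encoding (region index in the bits above `VFIO_PCI_OFFSET_SHIFT`, byte offset within the region below), then adds the region's physical start. `struct bar_model`, `NUM_BARS` and `region_offset_to_pfn()` are hypothetical; a real implementation would ask the `vfio_device` for the region instead of using a local table.

```c
#include <stdint.h>
#include <errno.h>

#define PAGE_SHIFT_MODEL 12	/* assumption: 4 KiB pages */
#define VFIO_PCI_OFFSET_SHIFT 40	/* mirrors the vfio-pci encoding */
#define VFIO_PCI_OFFSET_MASK (((uint64_t)1 << VFIO_PCI_OFFSET_SHIFT) - 1)
#define NUM_BARS 6		/* hypothetical: one entry per PCI BAR */

/* Hypothetical per-device table of BAR physical start and size. */
struct bar_model {
	uint64_t start;
	uint64_t size;
};

/*
 * Decode a vfio-pci style mmap offset (region index in the high bits,
 * byte offset within the region in the low bits) into a pfn.
 */
static int region_offset_to_pfn(const struct bar_model bars[NUM_BARS],
				uint64_t mmap_offset, uint64_t *pfn)
{
	uint64_t index = mmap_offset >> VFIO_PCI_OFFSET_SHIFT;
	uint64_t off = mmap_offset & VFIO_PCI_OFFSET_MASK;

	/* Reject unknown regions and offsets past the end of the BAR. */
	if (index >= NUM_BARS || off >= bars[index].size)
		return -EINVAL;
	*pfn = (bars[index].start + off) >> PAGE_SHIFT_MODEL;
	return 0;
}
```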
>
> > (which also suggests we are going to break users without the module
> > option opt-in above)
>
> Not necessarily; this commit is complaining that vfio crashes, it doesn't say they
> actually needed the IOMMU to work on those VMAs because they are doing
> P2P DMA.
>
> I think, if this does break someone, they are on a real fringe and
> must have already modified their kernel, so a kconfig is the right
> approach. It is pretty hard to get non-GUP'able DMA'able memory into a
> process with the stock kernel.
>
> Generally speaking, I think Linus has felt security bug fixes like
> this are more on the OK side of things to break fringe users.
>
> > And:
> >
> > 41311242221e ("vfio/type1: Support faulting PFNMAP vmas")
> >
> > So we'd have an alternate path in the un-safe mode and we'd lose the
> > ability to fault in mappings.
>
> As above, we already exclude VMAs that are not from VFIO, and VFIO-sourced
> VMAs do not meaningfully implement fault for this use case. So calling
> fixup_user_fault() is pointless.
>
> Peter just did this, so we could ask him what it was for.
>
> I feel pretty strongly that removing the call to follow_pte is
> important here. Even if we do cover all the issues with mis-using the
> API it just makes a maintenance problem to leave it in.

I can't say I fully understand the whole rationale behind 5cbf3264bc71, but that
commit still sounds reasonable to me, since I don't see why VFIO cannot do
VFIO_IOMMU_MAP_DMA on a memory range that is neither anonymous memory nor a
vfio-mapped MMIO range. In those cases, the vm_pgoff namespace defined by vfio
may no longer hold, iiuc.
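The distinction above could look like the following sketch: only VMAs that VFIO itself created are translated with VFIO's vm_pgoff convention, while any other provider has to go through a page-table walk. Ownership is modelled by comparing the VMA's ops pointer, which is one plausible way a helper like vfio_device_get_from_vma() could decide; `struct vma_model`, `vfio_mmap_ops_model`, `walk_page_table_model()` and `vma_translate_model()` are all hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <errno.h>

#define PAGE_SHIFT_MODEL 12	/* assumption: 4 KiB pages */

struct vm_ops_model { const char *name; };

/* Hypothetical ops instance that only VFIO's mmap handler installs. */
static const struct vm_ops_model vfio_mmap_ops_model = { "vfio" };

struct vma_model {
	uint64_t vm_start, vm_end, vm_pgoff;
	const struct vm_ops_model *vm_ops;
	/* Toy page table for non-VFIO mappings: one pfn per page, 0 = absent. */
	const uint64_t *ptes;
};

enum xlate_path { XLATE_NONE, XLATE_VFIO_PGOFF, XLATE_PAGE_WALK };

/* Stand-in for a page-table walk such as follow_pfn(). */
static int walk_page_table_model(const struct vma_model *vma, uint64_t vaddr,
				 uint64_t *pfn)
{
	uint64_t idx = (vaddr - vma->vm_start) >> PAGE_SHIFT_MODEL;

	if (!vma->ptes || !vma->ptes[idx])
		return -EFAULT;	/* PTE not populated */
	*pfn = vma->ptes[idx];
	return 0;
}

static enum xlate_path vma_translate_model(const struct vma_model *vma,
					   uint64_t vaddr, uint64_t *pfn)
{
	if (vaddr < vma->vm_start || vaddr >= vma->vm_end)
		return XLATE_NONE;
	if (vma->vm_ops == &vfio_mmap_ops_model) {
		/* VFIO owns this VMA, so its vm_pgoff convention applies. */
		*pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT_MODEL) + vma->vm_pgoff;
		return XLATE_VFIO_PGOFF;
	}
	/* Foreign provider: vm_pgoff means something else, so walk the PTEs. */
	return walk_page_table_model(vma, vaddr, pfn) ? XLATE_NONE : XLATE_PAGE_WALK;
}
```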

Then, if we keep follow_pfn() for non-vfio mappings, it also seems very
reasonable to have 41311242221e, or something similar as proposed by Alex, to
make sure the pte is installed before calling it, for either vfio or other vma
providers.
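The ordering described above (look up the pfn; if the PTE is not present, fault it in and retry once) is sketched below on a toy address space. In the kernel the two steps correspond to follow_pfn() and fixup_user_fault(); here `struct mm_model`, `lookup_pte_model()`, `fault_in_model()` and `vaddr_get_pfn_model()` are hypothetical userspace stand-ins, and pages are addressed by index rather than by virtual address.

```c
#include <stdint.h>
#include <errno.h>

#define NPAGES_MODEL 4	/* hypothetical size of the toy address space */

/* Toy address space: pte[i] == 0 means "not populated yet". */
struct mm_model {
	uint64_t pte[NPAGES_MODEL];
	uint64_t backing_pfn[NPAGES_MODEL];	/* what a fault would install */
	int faults;				/* number of faults handled */
};

/* Stand-in for follow_pfn(): succeeds only if the PTE is present. */
static int lookup_pte_model(const struct mm_model *mm, unsigned int page,
			    uint64_t *pfn)
{
	if (page >= NPAGES_MODEL || !mm->pte[page])
		return -EFAULT;
	*pfn = mm->pte[page];
	return 0;
}

/* Stand-in for fixup_user_fault(): ask the provider to populate the PTE. */
static int fault_in_model(struct mm_model *mm, unsigned int page)
{
	if (page >= NPAGES_MODEL || !mm->backing_pfn[page])
		return -EFAULT;	/* provider has nothing to install */
	mm->pte[page] = mm->backing_pfn[page];
	mm->faults++;
	return 0;
}

/* Look up a pfn, faulting the page in and retrying once if it is absent. */
static int vaddr_get_pfn_model(struct mm_model *mm, unsigned int page,
			       uint64_t *pfn)
{
	int ret = lookup_pte_model(mm, page, pfn);

	if (ret == -EFAULT) {
		ret = fault_in_model(mm, page);
		if (!ret)
			ret = lookup_pte_model(mm, page, pfn);
	}
	return ret;
}
```

The retry only helps when the VMA provider actually implements a fault handler that installs the PTE, which is exactly the point under debate for VFIO-owned VMAs versus other providers.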

Or does this mean that we don't want to allow VFIO DMA to those unknown memory
backends, for some reason?

Thanks,
--
Peter Xu