Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory

From: Peter Xu
Date: Sat May 23 2020 - 20:02:10 EST


(CC Andrea too)

On Sat, May 23, 2020 at 07:52:57PM -0400, Peter Xu wrote:
> On Sat, May 23, 2020 at 05:06:02PM -0600, Alex Williamson wrote:
> > On Sat, 23 May 2020 15:34:17 -0400
> > Peter Xu <peterx@xxxxxxxxxx> wrote:
> >
> > > Hi, Alex,
> > >
> > > On Fri, May 22, 2020 at 01:17:43PM -0600, Alex Williamson wrote:
> > > > @@ -1346,15 +1526,32 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> > > > {
> > > > struct vm_area_struct *vma = vmf->vma;
> > > > struct vfio_pci_device *vdev = vma->vm_private_data;
> > > > + vm_fault_t ret = VM_FAULT_NOPAGE;
> > > > +
> > > > + mutex_lock(&vdev->vma_lock);
> > > > + down_read(&vdev->memory_lock);
> > >
> > > I remembered to have seen the fault() handling FAULT_FLAG_RETRY_NOWAIT at least
> > > in the very first version, but it's not here any more... Could I ask what's
> > > the reason behind? I probably have missed something along with the versions,
> > > I'm just not sure whether e.g. this would potentially block a GUP caller even
> > > if it's with FOLL_NOWAIT.
> >
> > This is largely what v2 was about, from the cover letter:
> >
> > Locking in 3/ is substantially changed to avoid the retry scenario
> > within the fault handler, therefore a caller who does not allow
> > retry will no longer receive a SIGBUS on contention.
> >
> > The discussion thread starts here:
> >
> > https://lore.kernel.org/kvm/20200501234849.GQ26002@xxxxxxxx/
>
> [1]
>
> >
> > Feel free to interject if there's something that doesn't make sense,
> > the idea is that since we've fixed the lock ordering we never need to
> > release one lock to wait for another, therefore we can wait for the
> > lock. I'm under the impression that we can wait for the lock
> > regardless of the flags under these conditions.
>
> I see; thanks for the link. Sorry I should probably follow up the discussion
> and ask the question earlier, anyway...
>
> For what I understand now, IMHO we should still need all those handlings of
> FAULT_FLAG_RETRY_NOWAIT like in the initial version. E.g., IIUC KVM gup will
> try with FOLL_NOWAIT when async is allowed, before the complete slow path. I'm
> not sure what would be the side effect of that if fault() blocked it. E.g.,
> the caller could be in an atomic context.
>
> But now I also agree that VM_FAULT_SIGBUS is probably not correct there in the
> initial version [1] - I thought it was OK initially (after all after the
> multiple fault retry series we should always be with FAULT_FLAG_ALLOW_RETRY..).
> However after some thinking... it should be the common slow path where retry is
> simply not allowed. So IMHO instead of SIGBUS there, we should also use all
> the slow path of the locks. That'll be safe then because it's never going to
> be with FAULT_FLAG_RETRY_NOWAIT (FAULT_FLAG_RETRY_NOWAIT depends on
> FAULT_FLAG_ALLOW_RETRY).
>
> A reference code could be __lock_page_or_retry() where the lock_page could wait
> just like we taking the sems/mutexes, and the previous SIGBUS case would
> corresponds to this chunk of __lock_page_or_retry():
>
> } else {
> if (flags & FAULT_FLAG_KILLABLE) {
> int ret;
>
> ret = __lock_page_killable(page);
> if (ret) {
> up_read(&mm->mmap_sem);
> return 0;
> }
> } else
> __lock_page(page);
> return 1;
> }
>
> Thanks,
>
> --
> Peter Xu

--
Peter Xu