Re: [PATCH] vfio/pci: take mmap write lock for io_remap_pfn_range

From: Alex Williamson
Date: Wed May 22 2024 - 13:50:23 EST


Hey Drew,

On Wed, 22 May 2024 18:56:29 +0200
Andrew Jones <ajones@xxxxxxxxxxxxxxxx> wrote:

> On Mon, May 08, 2023 at 08:58:42PM GMT, Yan Zhao wrote:
> > In VFIO type1, vaddr_get_pfns() will try fault in MMIO PFNs after
> > pin_user_pages_remote() returns -EFAULT.
> >
> > follow_fault_pfn
> > fixup_user_fault
> > handle_mm_fault
> > handle_mm_fault
> > do_fault
> > do_shared_fault
> > do_fault
> > __do_fault
> > vfio_pci_mmap_fault
> > io_remap_pfn_range
> > remap_pfn_range
> > track_pfn_remap
> > vm_flags_set ==> mmap_assert_write_locked(vma->vm_mm)
> > remap_pfn_range_notrack
> > vm_flags_set ==> mmap_assert_write_locked(vma->vm_mm)
> >
> > As io_remap_pfn_range() will call vm_flags_set() to update vm_flags [1],
> > holding of mmap write lock is required.
> > So, update vfio_pci_mmap_fault() to drop mmap read lock and take mmap
> > write lock.
> >
> > [1] https://lkml.kernel.org/r/20230126193752.297968-3-surenb@xxxxxxxxxx
> > commit bc292ab00f6c ("mm: introduce vma->vm_flags wrapper functions")
> > commit 1c71222e5f23
> > ("mm: replace vma->vm_flags direct modifications with modifier calls")
> >
>
> With linux-next I started noticing traces similar to the above without
> lockdep, since it has ba168b52bf8e ("mm: use rwsem assertion macros for
> mmap_lock"). Were there any follow ups to this? Sorry if my quick
> searching missed it.

We've been working on it in the background, but we're not there yet. I
have a branch[1] that converts to an address space on the vfio device,
making unmap_mapping_range() available to easily zap mappings to the
BARs without all the ugly vma tracking, but that still leaves us with
the problem that the remap_pfn_range() shouldn't be called from the
fault handler resulting in this lockdep warning. We can switch to
vmf_insert_pfn() in the fault handler, but that's a noticeable
performance penalty.

The angle I've been working recently is to try taking advantage of
huge_fault support because we do have vmf_insert_pfn_{pmd,pud}, but
these don't currently support pfnmap pfns. I've been working with
Peter Xu as this aligns with some other work of his. It's working and
resolves the performance issue, especially with a little alignment
tweaking in userspace to take advantage of pud mappings.

I'm not sure if there are any outstanding blockers on Peter's side, but
this seems like a good route from the vfio side. If we're seeing this
now without lockdep, we might need to bite the bullet and take the hit
with vmf_insert_pfn() while the pmd/pud path learn about pfnmaps.
Thanks,

Alex

[1]https://github.com/awilliam/linux-vfio/commits/vfio-address-space/