Re: [PATCH] vfio/pci: take mmap write lock for io_remap_pfn_range

From: Peter Xu
Date: Wed May 22 2024 - 17:21:52 EST


On Wed, May 22, 2024 at 11:50:06AM -0600, Alex Williamson wrote:
> I'm not sure if there are any outstanding blockers on Peter's side, but
> this seems like a good route from the vfio side. If we're seeing this
> now without lockdep, we might need to bite the bullet and take the hit
> with vmf_insert_pfn() while the pmd/pud path learn about pfnmaps.

No immediate blockers, it's just that there are some small details that I
may still need to look into. The one currently TBD is the implications of
pfn tracking on PAT. Here I see at least two issues to be investigated.

Firstly, when vfio zaps BARs it can try to remove the VM_PAT flag. To be
explicit, unmap_single_vma() has:

	if (unlikely(vma->vm_flags & VM_PFNMAP))
		untrack_pfn(vma, 0, 0, mm_wr_locked);

I believe it'll also erase the entry in the memtype_rbroot. I'm not sure
whether that's correct at all, and if it is correct, how we should
re-inject that entry. So far I feel like we should keep the pfn tracking
separate from paths that only tear down pgtables, but I'll need to double
check. E.g. I at least checked that MADV_DONTNEED can't be applied to
PFNMAPs, so vfio zapping the vma should be the first path that can do that
besides munmap().

The other thing is I just noticed very recently that the PAT bit on x86_64
is not always the same bit. On 4K pages it's bit 7, but bit 7 is reused as
PSE on higher levels, which moves PAT to bit 12:

	#define _PAGE_BIT_PSE		7	/* 4 MB (or 2MB) page */
	#define _PAGE_BIT_PAT		7	/* on 4KB pages */
	#define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */

We may need something like protval_4k_2_large() when injecting huge
mappings.
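
To make the bit shuffle concrete, here is a rough user-space sketch of what
such a translation might look like. It only assumes the three bit positions
quoted above; the sk_ names and the sk_make_huge_prot() helper are made up
for illustration and are not kernel API, and a real conversion would need
to follow whatever the pmd/pud pfnmap path ends up requiring.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as quoted from the x86 definitions above. */
#define SK_PAGE_BIT_PSE		7	/* huge-page marker on pmd/pud */
#define SK_PAGE_BIT_PAT		7	/* PAT on 4KB ptes */
#define SK_PAGE_BIT_PAT_LARGE	12	/* PAT on 2MB or 1GB entries */

#define SK_PAGE_PSE		(UINT64_C(1) << SK_PAGE_BIT_PSE)
#define SK_PAGE_PAT		(UINT64_C(1) << SK_PAGE_BIT_PAT)
#define SK_PAGE_PAT_LARGE	(UINT64_C(1) << SK_PAGE_BIT_PAT_LARGE)

/* Hypothetical stand-in for the kernel's protection value type. */
typedef uint64_t sk_protval_t;

/*
 * Convert 4K-style protection bits to the large-page layout: clear both
 * PAT positions, then move the 4K PAT bit (7) up to bit 12.
 */
static inline sk_protval_t sk_protval_4k_to_large(sk_protval_t val)
{
	return (val & ~(SK_PAGE_PAT | SK_PAGE_PAT_LARGE)) |
	       ((val & SK_PAGE_PAT) <<
		(SK_PAGE_BIT_PAT_LARGE - SK_PAGE_BIT_PAT));
}

/*
 * Hypothetical helper: build huge-mapping protection bits from 4K ones.
 * Bit 7 is only set as PSE after PAT has been moved out of the way.
 */
static inline sk_protval_t sk_make_huge_prot(sk_protval_t prot_4k)
{
	return sk_protval_4k_to_large(prot_4k) | SK_PAGE_PSE;
}

/* Small self-check showing the intended behaviour. */
static inline void sk_self_check(void)
{
	/* PAT set on a 4K pte ends up on bit 12, not bit 7. */
	assert(sk_protval_4k_to_large(SK_PAGE_PAT) == SK_PAGE_PAT_LARGE);
	/* After conversion, bit 7 means PSE only. */
	assert(sk_make_huge_prot(SK_PAGE_PAT) ==
	       (SK_PAGE_PAT_LARGE | SK_PAGE_PSE));
}
```

The point of the ordering in sk_make_huge_prot() is that setting PSE before
translating would make a huge entry look like it had the 4K PAT bit set.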

From the schedule POV, the plan is that I'll continue working on this after
I flush the inbox for the past two weeks and when I get some spare time.
Now ~160 emails left.. but I'm getting there. If there are comments on
either of the above, please shoot.

Thanks,

--
Peter Xu