Re: [RFT][PATCH v1 6/6] vfio: Replace phys_pfn with phys_page for vfio_pin_pages()
From: Nicolin Chen
Date: Tue Jun 21 2022 - 17:47:53 EST
On Mon, Jun 20, 2022 at 12:36:28PM -0300, Jason Gunthorpe wrote:
> On Sun, Jun 19, 2022 at 11:37:47PM -0700, Christoph Hellwig wrote:
> > On Sun, Jun 19, 2022 at 10:51:47PM -0700, Christoph Hellwig wrote:
> > > On Mon, Jun 20, 2022 at 12:00:46AM -0300, Jason Gunthorpe wrote:
> > > > On Fri, Jun 17, 2022 at 01:54:05AM -0700, Christoph Hellwig wrote:
> > > > > There is a bunch of code an comments in the iommu type1 code that
> > > > > suggest we can pin memory that is not page backed.
> > > >
> > > > AFAIK you can.. The whole follow_pte() mechanism allows raw PFNs to be
> > > > loaded into the type1 maps and the pin API will happily return
> > > > them. This happens in almost every qemu scenario because PCI MMIO BAR
> > > > memory ends up routed down this path.
> > >
> > > Indeed, my read wasn't deep enough. Which means that we can't change
> > > the ->pin_pages interface to return a struct pages array, as we don't
> > > have one for those.
> >
> > Actually. gvt requires a struct page, and both s390 seem to require
> > normal non-I/O, non-remapped kernel pointers. So I think for the
> > vfio_pin_pages we can assume that we only want page backed memory and
> > remove the follow_fault_pfn case entirely. But we'll probably have
> > to keep it for the vfio_iommu_replay case that is not tied to
> > emulated IOMMU drivers.
>
> Right, that is my thinking - since all drivers actually need a struct
> page we should have the API return a working struct page and have the
> VFIO layer isolate the non-struct page stuff so it never leaks out of
> this API.
>
> This nicely fixes the various problems in all the drivers if io memory
> comes down this path.
>
> It is also why doing too much surgery deeper into type 1 probably
> isn't too worthwhile as it still needs raw pfns in its data
> structures for iommu modes.
Christoph, do you agree with Jason's remark on not doing too much
surgery into type1 code? Or do you still want this series to change
type1 like removing follow_fault_pfn() that you mentioned above?