RE: [PATCH v2 16/20] vfio/cxl: Register regions with VFIO layer

From: Manish Honap

Date: Fri Apr 17 2026 - 10:52:37 EST




> -----Original Message-----
> From: Jason Gunthorpe <jgg@xxxxxxxx>
> Sent: 07 April 2026 03:40
> To: Dan Williams <djbw@xxxxxxxxxx>
> Cc: Manish Honap <mhonap@xxxxxxxxxx>; Alex Williamson
> <alwilliamson@xxxxxxxxxx>; jonathan.cameron@xxxxxxxxxx;
> dave.jiang@xxxxxxxxx; alejandro.lucero-palau@xxxxxxx; dave@xxxxxxxxxxxx;
> alison.schofield@xxxxxxxxx; vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx;
> dmatlack@xxxxxxxxxx; shuah@xxxxxxxxxx; Yishai Hadas <yishaih@xxxxxxxxxx>;
> Shameer Kolothum Thodi <skolothumtho@xxxxxxxxxx>; kevin.tian@xxxxxxxxx;
> Ankit Agrawal <ankita@xxxxxxxxxx>; Vikram Sethi <vsethi@xxxxxxxxxx>; Neo
> Jia <cjia@xxxxxxxxxx>; Tarun Gupta (SW-GPU) <targupta@xxxxxxxxxx>; Zhi
> Wang <zhiw@xxxxxxxxxx>; Krishnakant Jaju <kjaju@xxxxxxxxxx>; linux-
> kselftest@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-
> cxl@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v2 16/20] vfio/cxl: Register regions with VFIO layer
>
>
> On Sat, Apr 04, 2026 at 12:36:53PM -0700, Dan Williams wrote:
>
> > Then I assume it matters that memremap() sometimes silently falls back
> > to the direct map. The "VFIO owns" expectation needs to guard against
> > some helpful platform firmware mapping accelerator memory as System RAM.
>
> I don't think how memremap works under the covers matters to vfio, it
> takes in a phys_addr_t an gives back a KVA that is cachable kernel memory.
>
> We just have to be mindful to not allow mismatched attributes on virtual
> aliases.
>
> > At a minimum having VFIO fail to map in that case helps with the
> > argument I have been making that "no, EFI_CONVENTIONAL_MEMORY type +
> > EFI_SPECIFIC_PURPOSE flag" is not suitable for accelerators with
> > private CXL memory. Those want to be enforcing "EFI_RESERVED".
>
> Certainly it should fail the request region if something else is using it,
> so if those EFI flags plug it into the mm and it is plugged when VFIO
> starts then it should stop.
>
> The direct map isn't really "plugged into the mm" but it does raise the
> mismatched attributed issue.
>
> Jason

Thank you all for your suggestions.

As Jason and Gregory pointed out, the design presented here consists of
VFIO-owned device regions exposed through mmap with lazy faulting via
vmf_insert_pfn() (patch 14), region registration at vfio_pci_open_device()
(patch 16), and zapping of PTEs on reset.

I am convinced by the memremap() usage as discussed. VFIO owns the physical
range exclusively: no struct pages are involved (vmf_insert_pfn() inserts raw
PFNs), and memremap(MEMREMAP_WB) gives the VMA the correct WB cache
attributes.

As suggested, I will add request_mem_region() for the memory range and log a
failure message if the region cannot be claimed at probe.
This will also catch the EFI case: if firmware marks accelerator memory as
EFI_CONVENTIONAL_MEMORY, the MM will claim it at boot and
request_mem_region() will fail at probe time.
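
To illustrate the intended failure path, here is a small userspace model of
an exclusive range claim. It is only a sketch and not kernel code: the flat
table stands in for the kernel's iomem resource tree, and every name in it
(claim_table, model_request_region, model_probe_claim, the "vfio-cxl" owner
string) is hypothetical. It assumes start + len does not overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: a flat table standing in for the iomem resource tree. */
#define MAX_CLAIMS 8

struct claim {
	uint64_t start;
	uint64_t end;		/* inclusive */
	const char *owner;
};

struct claim_table {
	struct claim slot[MAX_CLAIMS];
	size_t used;
};

/*
 * Claim [start, start + len - 1] exclusively.
 * Returns the new claim, or NULL if the range overlaps an existing claim
 * or the table is full.
 */
static const struct claim *model_request_region(struct claim_table *t,
						uint64_t start, uint64_t len,
						const char *owner)
{
	uint64_t end;
	size_t i;

	assert(len > 0);
	end = start + len - 1;

	for (i = 0; i < t->used; i++) {
		/* Two inclusive ranges overlap if each starts before the other ends. */
		if (start <= t->slot[i].end && t->slot[i].start <= end)
			return NULL;
	}

	if (t->used == MAX_CLAIMS)
		return NULL;

	t->slot[t->used].start = start;
	t->slot[t->used].end = end;
	t->slot[t->used].owner = owner;
	return &t->slot[t->used++];
}

/*
 * Hypothetical probe step: claim the device memory range before mapping it.
 * Returns 0 on success, -EBUSY if something else already owns the range.
 */
static int model_probe_claim(struct claim_table *t, uint64_t hpa, uint64_t len)
{
	if (!model_request_region(t, hpa, len, "vfio-cxl"))
		return -EBUSY;
	return 0;
}
```

In this model, a range that was already claimed as "System RAM" before probe
makes model_probe_claim() return -EBUSY, which is the point where the real
driver would print the failure message and abort the probe.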

I will also add a comment in the next cover letter explaining why we decided
to use memremap() rather than ioremap(), so that the decision taken in v2 is
clear at the start of the review cycle.

Manish