Re: [RFC v2 07/15] vfio/cxl: expose CXL region to the userspace via a new VFIO device region

From: Jonathan Cameron

Date: Mon Dec 22 2025 - 09:00:15 EST


On Tue, 9 Dec 2025 22:20:11 +0530
mhonap@xxxxxxxxxx wrote:

> From: Manish Honap <mhonap@xxxxxxxxxx>
>
> To directly access the device memory, a CXL region is required. Creating
> a CXL region requires configuring the HDM decoders along the path, mapping
> the HPA access level by level until it eventually reaches the DPA in the
> CXL topology.
>
> For userspace, e.g. QEMU, to access the CXL region, the region must be
> exposed via VFIO interfaces.
>
> Introduce a new VFIO device region and region ops to expose the created
> CXL region when initializing the device in the vfio-cxl-core. Introduce a
> new sub-region type for the userspace to identify a CXL region.
>
> Co-developed-by: Zhi Wang <zhiw@xxxxxxxxxx>
> Signed-off-by: Zhi Wang <zhiw@xxxxxxxxxx>
> Signed-off-by: Manish Honap <mhonap@xxxxxxxxxx>
A few really minor things inline.

> ---
> drivers/vfio/pci/vfio_cxl_core.c | 122 +++++++++++++++++++++++++++++++
> drivers/vfio/pci/vfio_pci_core.c | 3 +-
> include/linux/vfio_pci_core.h | 5 ++
> include/uapi/linux/vfio.h | 4 +
> 4 files changed, 133 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/pci/vfio_cxl_core.c b/drivers/vfio/pci/vfio_cxl_core.c
> index cf53720c0cb7..35d95de47fa8 100644
> --- a/drivers/vfio/pci/vfio_cxl_core.c
> +++ b/drivers/vfio/pci/vfio_cxl_core.c
> @@ -231,6 +231,128 @@ void vfio_cxl_core_destroy_cxl_region(struct vfio_cxl_core_device *cxl)
> }
> EXPORT_SYMBOL_GPL(vfio_cxl_core_destroy_cxl_region);
>
> +static int vfio_cxl_region_mmap(struct vfio_pci_core_device *pci,
> + struct vfio_pci_region *region,
> + struct vm_area_struct *vma)
> +{
> + struct vfio_cxl_region *cxl_region = region->data;
> + u64 req_len, pgoff, req_start, end;
> + int ret;
> +
> + if (!(region->flags & VFIO_REGION_INFO_FLAG_MMAP))
> + return -EINVAL;
> +
> + if (!(region->flags & VFIO_REGION_INFO_FLAG_READ) &&
> + (vma->vm_flags & VM_READ))
> + return -EPERM;
> +
> + if (!(region->flags & VFIO_REGION_INFO_FLAG_WRITE) &&
> + (vma->vm_flags & VM_WRITE))
> + return -EPERM;
> +
> + pgoff = vma->vm_pgoff &
> + ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);

GENMASK() might be slightly easier to read and makes it really obvious
this is a simple masking operation.

> +
> + if (check_sub_overflow(vma->vm_end, vma->vm_start, &req_len) ||
> + check_add_overflow(PHYS_PFN(cxl_region->addr), pgoff, &req_start) ||
> + check_add_overflow(PFN_PHYS(pgoff), req_len, &end))
> + return -EOVERFLOW;
> +
> + if (end > cxl_region->size)
> + return -EINVAL;
> +
> + if (cxl_region->noncached)
> + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> + vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
> +
> + vm_flags_set(vma, VM_ALLOW_ANY_UNCACHED | VM_IO | VM_PFNMAP |
> + VM_DONTEXPAND | VM_DONTDUMP);
> +
> + ret = remap_pfn_range(vma, vma->vm_start, req_start,
> + req_len, vma->vm_page_prot);
> + if (ret)
> + return ret;
> +
> + vma->vm_pgoff = req_start;
> +
> + return 0;
> +}
> +
> +static ssize_t vfio_cxl_region_rw(struct vfio_pci_core_device *core_dev,
> + char __user *buf, size_t count, loff_t *ppos,
> + bool iswrite)
> +{
> + unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
> + struct vfio_cxl_region *cxl_region = core_dev->region[i].data;
> + loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> +
> + if (!count)
> + return 0;
> +
> + return vfio_pci_core_do_io_rw(core_dev, false,
> + cxl_region->vaddr,
> + (char __user *)buf, pos, count,

buf is already a char __user * so not sure why you'd need a cast here.

> + 0, 0, iswrite);
> +}