Re: [RFC v2 07/15] vfio/cxl: expose CXL region to the userspace via a new VFIO device region

From: Dave Jiang

Date: Thu Dec 11 2025 - 13:02:00 EST




On 12/11/25 10:31 AM, Manish Honap wrote:
>
>
>> -----Original Message-----
>> From: Dave Jiang <dave.jiang@xxxxxxxxx>
>> Sent: 11 December 2025 21:36
>> To: Manish Honap <mhonap@xxxxxxxxxx>; Aniket Agashe
>> <aniketa@xxxxxxxxxx>; Ankit Agrawal <ankita@xxxxxxxxxx>; Alex Williamson
>> <alwilliamson@xxxxxxxxxx>; Vikram Sethi <vsethi@xxxxxxxxxx>; Jason
>> Gunthorpe <jgg@xxxxxxxxxx>; Matt Ochs <mochs@xxxxxxxxxx>; Shameer
>> Kolothum <skolothumtho@xxxxxxxxxx>; alejandro.lucero-palau@xxxxxxx;
>> dave@xxxxxxxxxxxx; jonathan.cameron@xxxxxxxxxx;
>> alison.schofield@xxxxxxxxx; vishal.l.verma@xxxxxxxxx;
>> ira.weiny@xxxxxxxxx; dan.j.williams@xxxxxxxxx; jgg@xxxxxxxx; Yishai
>> Hadas <yishaih@xxxxxxxxxx>; kevin.tian@xxxxxxxxx
>> Cc: Neo Jia <cjia@xxxxxxxxxx>; Kirti Wankhede <kwankhede@xxxxxxxxxx>;
>> Tarun Gupta (SW-GPU) <targupta@xxxxxxxxxx>; Zhi Wang <zhiw@xxxxxxxxxx>;
>> Krishnakant Jaju <kjaju@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
>> linux-cxl@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx
>> Subject: Re: [RFC v2 07/15] vfio/cxl: expose CXL region to the userspace
>> via a new VFIO device region
>>
>>
>>
>> On 12/9/25 9:50 AM, mhonap@xxxxxxxxxx wrote:
>>> From: Manish Honap <mhonap@xxxxxxxxxx>
>>>
>>> To directly access the device memory, a CXL region is required.
>>> Creating a CXL region requires configuring the HDM decoders along the
>>> path, mapping the HPA access level by level until it eventually
>>> reaches the DPA in the CXL topology.
>>>
>>> For the userspace, e.g. QEMU, to access the CXL region, the region is
>>> required to be exposed via VFIO interfaces.
>>>
>>> Introduce a new VFIO device region and region ops to expose the
>>> created CXL region when initializing the device in the vfio-cxl-core.
>>> Introduce a new sub-region type for the userspace to identify a CXL
>>> region.
>>>
>>> Co-developed-by: Zhi Wang <zhiw@xxxxxxxxxx>
>>> Signed-off-by: Zhi Wang <zhiw@xxxxxxxxxx>
>>> Signed-off-by: Manish Honap <mhonap@xxxxxxxxxx>
>>> ---
>>> drivers/vfio/pci/vfio_cxl_core.c | 122 ++++++++++++++++++++++++++++++
>>> drivers/vfio/pci/vfio_pci_core.c | 3 +-
>>> include/linux/vfio_pci_core.h | 5 ++
>>> include/uapi/linux/vfio.h | 4 +
>>> 4 files changed, 133 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_cxl_core.c b/drivers/vfio/pci/vfio_cxl_core.c
>>> index cf53720c0cb7..35d95de47fa8 100644
>>> --- a/drivers/vfio/pci/vfio_cxl_core.c
>>> +++ b/drivers/vfio/pci/vfio_cxl_core.c
>>> @@ -231,6 +231,128 @@ void vfio_cxl_core_destroy_cxl_region(struct vfio_cxl_core_device *cxl)
>>>  }
>>> EXPORT_SYMBOL_GPL(vfio_cxl_core_destroy_cxl_region);
>>>
>>> +static int vfio_cxl_region_mmap(struct vfio_pci_core_device *pci,
>>> + struct vfio_pci_region *region,
>>> + struct vm_area_struct *vma)
>>> +{
>>> + struct vfio_cxl_region *cxl_region = region->data;
>>> + u64 req_len, pgoff, req_start, end;
>>> + int ret;
>>> +
>>> + if (!(region->flags & VFIO_REGION_INFO_FLAG_MMAP))
>>> + return -EINVAL;
>>> +
>>> + if (!(region->flags & VFIO_REGION_INFO_FLAG_READ) &&
>>> + (vma->vm_flags & VM_READ))
>>> + return -EPERM;
>>> +
>>> + if (!(region->flags & VFIO_REGION_INFO_FLAG_WRITE) &&
>>> + (vma->vm_flags & VM_WRITE))
>>> + return -EPERM;
>>> +
>>> + pgoff = vma->vm_pgoff &
>>> + ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
>>> +
>>> + if (check_sub_overflow(vma->vm_end, vma->vm_start, &req_len) ||
>>> + check_add_overflow(PHYS_PFN(cxl_region->addr), pgoff, &req_start) ||
>>> + check_add_overflow(PFN_PHYS(pgoff), req_len, &end))
>>> + return -EOVERFLOW;
>>> +
>>> + if (end > cxl_region->size)
>>> + return -EINVAL;
>>> +
>>> + if (cxl_region->noncached)
>>> + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>>> + vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
>>> +
>>> + vm_flags_set(vma, VM_ALLOW_ANY_UNCACHED | VM_IO | VM_PFNMAP |
>>> + VM_DONTEXPAND | VM_DONTDUMP);
>>> +
>>> + ret = remap_pfn_range(vma, vma->vm_start, req_start,
>>> + req_len, vma->vm_page_prot);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + vma->vm_pgoff = req_start;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static ssize_t vfio_cxl_region_rw(struct vfio_pci_core_device *core_dev,
>>> + char __user *buf, size_t count, loff_t *ppos,
>>> + bool iswrite)
>>> +{
>>> + unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
>>> + struct vfio_cxl_region *cxl_region = core_dev->region[i].data;
>>> + loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
>>> +
>>> + if (!count)
>>> + return 0;
>>> +
>>> + return vfio_pci_core_do_io_rw(core_dev, false,
>>> + cxl_region->vaddr,
>>> + (char __user *)buf, pos, count,
>>> + 0, 0, iswrite);
>>> +}
>>> +
>>> +static void vfio_cxl_region_release(struct vfio_pci_core_device *vdev,
>>> + struct vfio_pci_region *region)
>>> +{
>>> +}
>>> +
>>> +static const struct vfio_pci_regops vfio_cxl_regops = {
>>> + .rw = vfio_cxl_region_rw,
>>> + .mmap = vfio_cxl_region_mmap,
>>> + .release = vfio_cxl_region_release,
>>> +};
>>> +
>>> +int vfio_cxl_core_register_cxl_region(struct vfio_cxl_core_device *cxl)
>>> +{
>>> + struct vfio_pci_core_device *pci = &cxl->pci_core;
>>> + struct vfio_cxl *cxl_core = cxl->cxl_core;
>>> + u32 flags;
>>> + int ret;
>>> +
>>> + if (WARN_ON(!cxl_core->region.region || cxl_core->region.vaddr))
>>> + return -EEXIST;
>>> +
>>> + cxl_core->region.vaddr = ioremap(cxl_core->region.addr,
>>> + cxl_core->region.size);
>>> + if (!cxl_core->region.addr)
>>
>> I think you are wanting to check cxl_core->region.vaddr here right?
>
> Yes, you are correct. I will update this check.
>
>>
>> Also, what is the ioremap'd region for?
>
> It is to handle read/write operations when QEMU performs I/O on the VFIO CXL device region via the read()/write() syscalls.

For the CXL device region, for the most part the operations are done via the region being mmap()'d by qemu right? I understand read/write to BAR0 MMIO. What specific operations are done via read/write to the region? It may be worth mentioning in the commit log.

>
>>
>> DJ