[PATCH 00/20] vfio/pci: Add CXL Type-2 device passthrough support

From: mhonap

Date: Wed Mar 11 2026 - 16:35:33 EST


From: Manish Honap <mhonap@xxxxxxxxxx>

This series adds support for passing CXL Type-2 devices through to
virtual machines using VFIO. The goal is to expose CXL functionality
through the generic vfio-pci core, without requiring a variant driver.

The current design is based on the CXL core APIs provided by
Alejandro's CXL Type-2 device support patch series [1], which is
currently under upstream review (see
drivers/net/ethernet/sfc/efx_cxl.c).

This patchset should be applied on top of the cxl next branch, using
the base commit specified at the end of this cover letter, plus v23 of
Alejandro's series [1].

This patch series introduces CONFIG_VFIO_CXL_CORE, which builds a new
optional set of sources into vfio-pci-core. The new code hooks into the
vfio-pci open/close and reset paths to provide:

* Automatic CXL Type-2 detection at device open time via the CXL Device
DVSEC capability (Vendor ID 0x1E98, ID 0x0000) and HDM Decoder
Capability block.

* Kernel-owned HDM decoder management. The VMM never programs HDM
decoders directly; instead it reads and writes an emulated shadow copy
of the HDM register block through a dedicated COMP_REGS VFIO region.
All bit-field rules (reserved bits, read-only bits, the
COMMIT/COMMITTED latch) are enforced by the kernel.

* A DPA VFIO region backed by the kernel-assigned Host Physical Address
(HPA). The VMM maps this region with mmap(); PTEs are inserted lazily
on first fault. During FLR/reset all PTEs are invalidated atomically
under memory_lock and re-inserted after the reset path re-enables the
decoder.

* CXL DVSEC configuration-space emulation. Writes to the CXL Control,
Status, Control2, Status2, Lock, and Range Base registers in the
device's PCI extended configuration space are intercepted and replayed
through a per-device shadow (vconfig), enforcing CXL 3.1 register
semantics including the RWL/RW1CS/RWO access types and the CONFIG_LOCK
one-shot latch.

* A new VFIO_DEVICE_INFO_CAP_CXL capability (id=6) returned in the
VFIO_DEVICE_GET_INFO capability chain, carrying all the information a
VMM (e.g. QEMU) needs: HDM decoder count, BAR index and offset of the
component registers, total DPA size, and indices of the two new VFIO
regions.

* Two new VFIO region subtypes under the PCI_VENDOR_ID_CXL vendor
namespace: VFIO_REGION_SUBTYPE_CXL (DPA memory) and
VFIO_REGION_SUBTYPE_CXL_COMP_REGS (emulated HDM registers).

* A module parameter (disable_cxl=1) and a per-device flag
(vdev->disable_cxl) so that the feature can be suppressed for
individual devices or globally without recompiling.

* Comprehensive selftests in tools/testing/selftests/vfio/ covering
device detection, capability parsing, region enumeration, HDM register
emulation, DPA mmap with page-fault insertion, FLR invalidation, and
DVSEC register emulation.
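
For illustration, the DVSEC part of the detection step in the first item
above can be sketched in plain C against a snapshot of the 4 KiB extended
configuration space. This is a minimal user-space sketch, not the code
from the series: the helper names (cfg_read32, find_cxl_device_dvsec) are
hypothetical, the header layout follows the standard PCIe extended
capability and DVSEC format, and the HDM Decoder Capability check is not
covered.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_CFG_SPACE_EXP_SIZE 4096
#define PCI_EXT_CAP_START      0x100   /* first extended capability */
#define PCI_EXT_CAP_ID_DVSEC   0x0023  /* Designated Vendor-Specific */
#define CXL_DVSEC_VENDOR_ID    0x1E98  /* from the cover letter */
#define CXL_DVSEC_ID_DEVICE    0x0000  /* CXL Device DVSEC */

/* Read a little-endian 32-bit value from a config-space snapshot. */
static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] |
	       (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 |
	       (uint32_t)cfg[off + 3] << 24;
}

/*
 * Hypothetical helper: return the offset of the CXL Device DVSEC in a
 * 4 KiB config-space snapshot, or 0 if the device does not expose one.
 */
static size_t find_cxl_device_dvsec(const uint8_t *cfg)
{
	size_t off = PCI_EXT_CAP_START;
	/* Bound the walk so a malformed (looping) chain cannot hang us. */
	int budget = (PCI_CFG_SPACE_EXP_SIZE - PCI_EXT_CAP_START) / 8;

	while (off >= PCI_EXT_CAP_START &&
	       off + 4 <= PCI_CFG_SPACE_EXP_SIZE && budget-- > 0) {
		uint32_t hdr = cfg_read32(cfg, off);

		/* An empty or absent header ends the chain. */
		if (hdr == 0 || hdr == 0xFFFFFFFFu)
			break;

		if ((hdr & 0xFFFFu) == PCI_EXT_CAP_ID_DVSEC &&
		    off + 12 <= PCI_CFG_SPACE_EXP_SIZE) {
			/* DVSEC header 1: vendor ID in bits 15:0. */
			uint32_t h1 = cfg_read32(cfg, off + 4);
			/* DVSEC header 2: DVSEC ID in bits 15:0. */
			uint32_t h2 = cfg_read32(cfg, off + 8);

			if ((h1 & 0xFFFFu) == CXL_DVSEC_VENDOR_ID &&
			    (h2 & 0xFFFFu) == CXL_DVSEC_ID_DEVICE)
				return off;
		}

		/* Next-capability pointer: bits 31:20, dword aligned. */
		off = (hdr >> 20) & 0xFFCu;
	}

	return 0;
}
```

A caller would fill the buffer from the device's config space (for
example by reading the sysfs config file) and treat a non-zero return as
"CXL Device DVSEC present". In the kernel, the equivalent lookup is
normally done with pci_find_dvsec_capability() rather than an open-coded
walk.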

This design moves away from the earlier variant-driver approach: all
CXL functionality is now part of the vfio-pci driver.

The reasons for this change are:

* Generic CXL Type-2 support features (DVSEC, HDM, regions, reset)
are common to all CXL adapters and don't belong in variant drivers.
A variant driver is appropriate for vendor-specific functionality
(e.g. live migration, proprietary features); generic CXL behavior
should not require one.

* With this new approach, the user always binds to vfio-pci. No need to
choose or document a CXL-specific or vendor-specific driver for
standard CXL Type-2 passthrough.

* Enlightened vfio-pci works with any CXL Type-2 device that presents
the CXL Device DVSEC and the expected component layout.

* CXL detection, state, register emulation, region creation, and reset
live in a CXL-aware layer invoked from the core (optionally built
via CONFIG_VFIO_CXL_CORE). The core stays a single entry point;
CXL is an optional extension, not a separate driver stack.

* Moving CXL support into vfio-pci-core avoids per-device CXL detection
and feature toggling inside vendor-specific drivers.
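
The "optional extension invoked from the core" arrangement described
above usually takes the shape of a hook that compiles to an empty stub
when the option is off. The following is only a sketch of that idiom
under stated assumptions: every example_* name is hypothetical and does
not come from the series, and the CONFIG macro is defined by hand here
to stand in for Kconfig.

```c
#include <stdbool.h>

/* Stand-in for Kconfig; remove this line to build the stubbed variant. */
#define CONFIG_VFIO_CXL_CORE 1

/* Hypothetical per-device state; not the real vfio_pci_core_device. */
struct example_vdev {
	bool disable_cxl;   /* per-device opt-out, as in the cover letter */
	bool has_cxl_dvsec; /* faked result of hardware probing */
	bool cxl_enabled;   /* set when the CXL layer takes the device */
};

#ifdef CONFIG_VFIO_CXL_CORE
/* CXL-aware layer: only built when the option is enabled. */
static inline void example_cxl_detect(struct example_vdev *vdev)
{
	if (vdev->disable_cxl)
		return;
	vdev->cxl_enabled = vdev->has_cxl_dvsec;
}
#else
/* Stub: the core calls the same hook, which does nothing. */
static inline void example_cxl_detect(struct example_vdev *vdev)
{
	(void)vdev;
}
#endif

/* Single entry point: the core open path always calls the hook. */
static int example_core_open(struct example_vdev *vdev)
{
	vdev->cxl_enabled = false;
	example_cxl_detect(vdev);
	return 0;
}
```

With this shape the core needs no #ifdef at its call sites, and a device
without CXL support, or one with the opt-out set, follows the ordinary
vfio-pci path.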

Series structure
================

* Patches 1-5 extend the CXL subsystem to export the interfaces and
defines that vfio-pci-core needs.

* Patches 6-8 lay the vfio-pci-core plumbing.

* Patches 9-12 implement the core device lifecycle and DPA region.

* Patches 13-15 implement configuration-space and register emulation.

* Patches 16-18 wire everything together.

* Patches 19-20 add documentation and testing.

Limitations and future work
===========================

* This series does not yet support switched topologies with more than one
caching agent; that is planned for a future series.

* RAS / ECC / CCA / Reset support
A future revision of this series will integrate RAS and ECC handling
into generic vfio-pci by leveraging the CXL core and RAS capabilities.

* cxl_reset support [2]
Integrate changes from Srirangan to have VFIO-CXL reset support.

Dependencies
============

[1] Type2 device basic support https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@xxxxxxx/
[2] CXL Reset support for Type 2 devices https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@xxxxxxxxxx/

Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: Ira Weiny <ira.weiny@xxxxxxxxx>
Cc: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
Cc: Alejandro Lucero <alejandro.lucero-palau@xxxxxxx>
Cc: linux-cxl@xxxxxxxxxxxxxxx
Cc: kvm@xxxxxxxxxxxxxxx

Co-developed-by: Zhi Wang <zhiw@xxxxxxxxxx>
Signed-off-by: Zhi Wang <zhiw@xxxxxxxxxx>
Signed-off-by: Manish Honap <mhonap@xxxxxxxxxx>

--

Manish Honap (20):
cxl: Introduce cxl_get_hdm_reg_info()
cxl: Expose cxl subsystem specific functions for vfio
cxl: Move CXL spec defines to public header
cxl: Media ready check refactoring
cxl: Expose BAR index and offset from register map
vfio/cxl: Add UAPI for CXL Type-2 device passthrough
vfio/pci: Add CXL state to vfio_pci_core_device
vfio/pci: Add vfio-cxl Kconfig and build infrastructure
vfio/cxl: Implement CXL device detection and HDM register probing
vfio/cxl: CXL region management
vfio/cxl: Expose DPA memory region to userspace with fault+zap mmap
vfio/pci: Export config access helpers
vfio/cxl: Introduce HDM decoder register emulation framework
vfio/cxl: Check media readiness and create CXL memdev
vfio/cxl: Introduce CXL DVSEC configuration space emulation
vfio/pci: Expose CXL device and region info via VFIO ioctl
vfio/cxl: Provide opt-out for CXL feature
docs: vfio-pci: Document CXL Type-2 device passthrough
selftests/vfio: Add CXL Type-2 passthrough tests
selftests/vfio: Fix VLA initialisation in vfio_pci_irq_set()

Documentation/driver-api/index.rst | 1 +
Documentation/driver-api/vfio-pci-cxl.rst | 216 +++++
drivers/cxl/core/pci.c | 80 +-
drivers/cxl/core/regs.c | 29 +
drivers/cxl/cxl.h | 34 -
drivers/vfio/pci/Kconfig | 2 +
drivers/vfio/pci/Makefile | 1 +
drivers/vfio/pci/cxl/Kconfig | 7 +
drivers/vfio/pci/cxl/vfio_cxl_config.c | 304 +++++++
drivers/vfio/pci/cxl/vfio_cxl_core.c | 713 +++++++++++++++
drivers/vfio/pci/cxl/vfio_cxl_emu.c | 414 +++++++++
drivers/vfio/pci/cxl/vfio_cxl_priv.h | 123 +++
drivers/vfio/pci/vfio_pci.c | 32 +
drivers/vfio/pci/vfio_pci_config.c | 58 +-
drivers/vfio/pci/vfio_pci_core.c | 31 +
drivers/vfio/pci/vfio_pci_priv.h | 72 ++
drivers/vfio/pci/vfio_pci_rdwr.c | 8 +
include/cxl/cxl.h | 52 ++
include/linux/vfio_pci_core.h | 10 +
include/uapi/linux/vfio.h | 52 ++
tools/testing/selftests/vfio/Makefile | 1 +
.../selftests/vfio/lib/vfio_pci_device.c | 4 +-
.../selftests/vfio/vfio_cxl_type2_test.c | 816 ++++++++++++++++++
23 files changed, 3013 insertions(+), 47 deletions(-)
create mode 100644 Documentation/driver-api/vfio-pci-cxl.rst
create mode 100644 drivers/vfio/pci/cxl/Kconfig
create mode 100644 drivers/vfio/pci/cxl/vfio_cxl_config.c
create mode 100644 drivers/vfio/pci/cxl/vfio_cxl_core.c
create mode 100644 drivers/vfio/pci/cxl/vfio_cxl_emu.c
create mode 100644 drivers/vfio/pci/cxl/vfio_cxl_priv.h
create mode 100644 tools/testing/selftests/vfio/vfio_cxl_type2_test.c

base-commit: 3f7938b1aec7f06d5b23adca83e4542fcf027001
--
2.25.1