RE: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets
From: Manish Honap
Date: Tue Mar 17 2026 - 10:53:13 EST
> -----Original Message-----
> From: Dan Williams <dan.j.williams@xxxxxxxxx>
> Sent: 11 March 2026 07:15
> To: Alex Williamson <alex@xxxxxxxxxxx>; Dan Williams
> <dan.j.williams@xxxxxxxxx>
> Cc: alex@xxxxxxxxxxx; Srirangan Madhavan <smadhavan@xxxxxxxxxx>;
> bhelgaas@xxxxxxxxxx; dave.jiang@xxxxxxxxx; jonathan.cameron@xxxxxxxxxx;
> ira.weiny@xxxxxxxxx; vishal.l.verma@xxxxxxxxx; alison.schofield@xxxxxxxxx;
> dave@xxxxxxxxxxxx; Jeshua Smith <jeshuas@xxxxxxxxxx>; Vikram Sethi
> <vsethi@xxxxxxxxxx>; Sai Yashwanth Reddy Kancherla
> <skancherla@xxxxxxxxxx>; Vishal Aslot <vaslot@xxxxxxxxxx>; Shanker
> Donthineni <sdonthineni@xxxxxxxxxx>; Manish Honap <mhonap@xxxxxxxxxx>;
> Vidya Sagar <vidyas@xxxxxxxxxx>; Jiandi An <jan@xxxxxxxxxx>; Matt Ochs
> <mochs@xxxxxxxxxx>; Derek Schumacher <dschumacher@xxxxxxxxxx>; linux-
> cxl@xxxxxxxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state
> across resets
>
> Alex Williamson wrote:
> [..]
> > A constraint here is that CXL_BUS can be modular while PCI is builtin,
> > but reset is initiated through PCI and drivers like vfio-pci already
> > manage an opaque blob of PCI device state that can be pushed back into
> > the device to restore it between use cases. If PCI is not enlightened
> > about CXL state to some extent, how does this work?
>
> My expectation is that "vfio-cxl" is responsible. Similar to vfio-pci that
> builds on PCI core functionality for assigning a device, a vfio-cxl driver
> would build on CXL.
>
> Specifically, register generic 'struct cxl_memdev' and/or 'struct
> cxl_cachdev' objects with the CXL core, like any other accelerator driver,
> and coordinate various levels of reset on those objects, not the 'struct
> pci_dev'.
>
> > PCI core has already been enlightened about things like
> > virtual-channel that it doesn't otherwise touch in order to be able to
> > save and restore firmware initiated configurations. I think there are
> > aspects of that sort of thing here as well. Thanks,
>
> I am willing to hear more and admit I am not familiar with the details of
> virtual-channel that make it both amenable to vfio-pci management and
> similar to CXL. CXL needs to consider MM, cross-device dependencies, and
> decoder topology management that is more dynamic than what happens for PCI
> resources.
>
> The CXL accelerator series is currently contending with being able to
> restore device configuration after reset. I expect vfio-cxl to build on
> that, not push CXL flows into the PCI core.
Hello Dan,

My VFIO CXL Type-2 passthrough series [1] takes a position on this that I
would like to explain, because I expect you will have similar concerns about
it and I would rather have this conversation now.

The Type-2 passthrough series takes the opposite structural approach to the
one you are suggesting here: CXL Type-2 support is an optional extension
compiled into vfio-pci-core (CONFIG_VFIO_CXL_CORE), not a separate driver.

Here is the reasoning:
1. Device enumeration
=====================
CXL Type-2 devices (GPU and accelerator class) are enumerated as struct
pci_dev objects. The kernel discovers them through a PCI config space scan,
not through the CXL bus. The CXL capability is advertised via the DVSEC
(PCI_EXT_CAP_ID_DVSEC 0x23, Vendor ID 0x1E98), which lives in PCI config
space. There is no CXL bus device to bind to.
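To make the enumeration point concrete, here is a minimal userspace sketch of
how a CXL DVSEC is located by walking the PCIe extended capability list. It
operates on a plain byte buffer standing in for config space; find_dvsec(),
cfg_read32() and cfg_write32() are illustrative names and not code from the
series. In the kernel, pci_find_dvsec_capability() serves this purpose.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CFG_SPACE_SIZE    4096
#define EXT_CAP_START     0x100   /* first PCIe extended capability */
#define EXT_CAP_ID_DVSEC  0x23    /* Designated Vendor-Specific Extended Capability */
#define VENDOR_ID_CXL     0x1E98
#define DVSEC_HEADER1     0x4     /* bits 15:0 = DVSEC vendor ID */
#define DVSEC_HEADER2     0x8     /* bits 15:0 = DVSEC ID */

/* Little-endian 32-bit read from a config space image (illustrative helper). */
static uint32_t cfg_read32(const uint8_t *cfg, uint16_t off)
{
	return (uint32_t)cfg[off] |
	       (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 |
	       (uint32_t)cfg[off + 3] << 24;
}

/* Little-endian 32-bit write, used only to build example images. */
static void cfg_write32(uint8_t *cfg, uint16_t off, uint32_t val)
{
	cfg[off] = val & 0xFF;
	cfg[off + 1] = (val >> 8) & 0xFF;
	cfg[off + 2] = (val >> 16) & 0xFF;
	cfg[off + 3] = (val >> 24) & 0xFF;
}

/*
 * Walk the extended capability list and return the offset of the DVSEC
 * matching (vendor, dvsec_id), or 0 if none is found.
 */
static uint16_t find_dvsec(const uint8_t *cfg, uint16_t vendor, uint16_t dvsec_id)
{
	uint16_t pos = EXT_CAP_START;
	int budget = (CFG_SPACE_SIZE - EXT_CAP_START) / 8; /* guards against loops */

	while (pos >= EXT_CAP_START && pos <= CFG_SPACE_SIZE - 4 && budget-- > 0) {
		uint32_t hdr = cfg_read32(cfg, pos);

		if (hdr == 0 || hdr == 0xFFFFFFFFu)
			break;

		if ((hdr & 0xFFFF) == EXT_CAP_ID_DVSEC && pos <= CFG_SPACE_SIZE - 12) {
			uint32_t h1 = cfg_read32(cfg, pos + DVSEC_HEADER1);
			uint32_t h2 = cfg_read32(cfg, pos + DVSEC_HEADER2);

			if ((h1 & 0xFFFF) == vendor && (h2 & 0xFFFF) == dvsec_id)
				return pos;
		}

		pos = (hdr >> 20) & 0xFFC; /* bits 31:20 = next capability offset */
	}

	return 0;
}
```

Calling find_dvsec(cfg, VENDOR_ID_CXL, 0) on a config space image returns the
offset of the CXL device DVSEC (DVSEC ID 0), or 0 when the function does not
advertise one. Everything involved is PCI config space, which is the point
made above.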
A standalone vfio-cxl driver would therefore need to match on the PCI device
just like vfio-pci does, and then call into vfio-pci-core for every PCI
concern: config space emulation, BAR region handling, MSI/MSI-X, INTx, DMA
mapping, FLR, and migration callbacks. That is the variant driver pattern
we rejected in favour of generic CXL passthrough. We saw this exact outcome
in the prior iterations of this series, before we moved to the enlightened
vfio-pci model.
2. CXL core involvement
=======================
The CXL Type-2 passthrough series does not bypass the CXL core. At
vfio_pci_probe() time, the CXL enlightenment layer:
- calls cxl_get_hdm_info() to probe the HDM Decoder Capability block,
- calls cxl_get_committed_decoder() to locate pre-committed firmware regions,
- calls cxl_create_region() / cxl_request_dpa() for dynamic allocation,
- creates a struct cxl_memdev via the CXL core (through
  cxl_probe_component_regs(), the same path Alejandro's v23 series uses).
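As an illustration of what "locating a pre-committed firmware region" amounts
to, here is a standalone sketch that scans a snapshot of HDM decoder registers
for the first decoder firmware has committed. This is not the implementation
of cxl_get_committed_decoder() from the series: the struct and function names
are hypothetical, and the control-register bit positions (COMMIT = bit 9,
COMMITTED = bit 10, COMMIT_ERROR = bit 11) are my reading of the HDM decoder
control register layout and should be checked against the CXL specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed HDM decoder control register bits; verify against the CXL spec. */
#define HDM_CTRL_COMMIT        (1u << 9)
#define HDM_CTRL_COMMITTED     (1u << 10)
#define HDM_CTRL_COMMIT_ERROR  (1u << 11)

/* Hypothetical snapshot of one decoder's registers. */
struct hdm_decoder_snapshot {
	uint64_t base;  /* decoded base address */
	uint64_t size;  /* decoded range size in bytes */
	uint32_t ctrl;  /* raw control register value */
};

/*
 * Return the index of the first decoder that is committed, error-free and
 * decodes a non-empty range, or -1 if no such decoder exists.
 */
static int first_committed_decoder(const struct hdm_decoder_snapshot *dec,
				   size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (dec[i].ctrl & HDM_CTRL_COMMIT_ERROR)
			continue;
		if (!(dec[i].ctrl & HDM_CTRL_COMMITTED))
			continue;
		if (dec[i].size == 0)
			continue;
		return (int)i;
	}

	return -1;
}
```

A return value of -1 corresponds to the case where nothing was pre-committed
by firmware and the dynamic allocation path would be used instead.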
The CXL core is fully involved. The difference is that the binding to
userspace is still through vfio-pci, which already manages the pci_dev
lifecycle, reset sequencing, and VFIO region/irq API.
3. Standalone vfio-cxl
======================
To match the model you are suggesting, vfio-cxl would need to:
(a) register a new driver on the CXL bus (struct cxl_driver), probing
struct cxl_memdev or a new struct cxl_endpoint,
(b) re-implement or delegate everything vfio-pci-core provides (config
space, BAR regions, IRQs, DMA, FLR, and VFIO container management),
either by calling vfio-pci-core as a library or by duplicating it, and
(c) present to userspace through a new device model distinct from
vfio-pci.
This is a significant new surface. QEMU's CXL passthrough support already
builds on vfio-pci: it receives the PCI device via VFIO, reads the
VFIO_DEVICE_INFO_CAP_CXL capability chain, and exposes the CXL topology.
A vfio-cxl object model would require non-trivial QEMU changes for something
that already works in the enlightened vfio-pci model.
4. Module dependency
====================
Current solution: CONFIG_VFIO_CXL_CORE depends on CONFIG_CXL_BUS. We do not
add CXL knowledge to the PCI core; we add it to the VFIO layer that is already
CXL_BUS-dependent.
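One common way to keep such an extension optional is the compile-time stub
pattern sketched below. This is an assumption about how the dependency could
be structured, not code from the series: struct sketch_vdev,
vfio_cxl_sketch_probe() and vfio_pci_sketch_probe() are hypothetical names,
and only CONFIG_VFIO_CXL_CORE comes from the series itself.

```c
#include <errno.h>

/* Define to model a build with the CXL extension enabled. */
/* #define CONFIG_VFIO_CXL_CORE 1 */

/* Hypothetical stand-in for the per-device VFIO state. */
struct sketch_vdev {
	int is_cxl;
};

#ifdef CONFIG_VFIO_CXL_CORE
/* Real implementation would live in the CXL-dependent object file. */
int vfio_cxl_sketch_probe(struct sketch_vdev *vdev);
#else
/* Stub: with the option disabled, every device is treated as plain PCI. */
static inline int vfio_cxl_sketch_probe(struct sketch_vdev *vdev)
{
	(void)vdev;
	return -ENODEV;
}
#endif

/*
 * Generic probe path: try the CXL extension first and fall back to plain
 * PCI passthrough when the device is not CXL or support is compiled out.
 */
static int vfio_pci_sketch_probe(struct sketch_vdev *vdev)
{
	int ret = vfio_cxl_sketch_probe(vdev);

	if (ret == -ENODEV) {
		vdev->is_cxl = 0;
		return 0;
	}
	if (ret)
		return ret;

	vdev->is_cxl = 1;
	return 0;
}
```

With the option disabled, the generic path contains no reference to CXL
symbols, so the builtin PCI core never depends on a modular CXL_BUS.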
I would very much appreciate your thoughts on [1] in light of the above. In
particular, I would like to understand whether you think vfio-pci-core can
remain the single entry point from userspace, or whether you envision a new
VFIO device type. Jonathan has indicated he has thoughts on this as well;
hopefully we can converge on a direction that doesn't require duplicating
vfio-pci-core.
[1] https://lore.kernel.org/linux-cxl/20260311203440.752648-1-mhonap@xxxxxxxxxx/