Re: [PATCH 0/5] PCI/CXL: Save and restore CXL DVSEC and HDM state across resets

From: Dan Williams

Date: Tue Mar 10 2026 - 17:41:28 EST


smadhavan@ wrote:
> From: Srirangan Madhavan <smadhavan@xxxxxxxxxx>
>
> CXL devices could lose their DVSEC configuration and HDM decoder programming
> after multiple reset methods (whenever link disable/enable). This means a
> device that was fully configured — with DVSEC control/range registers set
> and HDM decoders committed — loses that state after reset. In cases where
> these are programmed by firmware, downstream drivers are unable to re-initialize
> the device because CXL memory ranges are no longer mapped.
>
> This series adds CXL state save/restore logic to the PCI core so that DVSEC
> and HDM decoder state is preserved across any PCI reset path that calls
> pci_save_state() / pci_restore_state(), for a CXL capable device.

The PCI core has no business learning CXL core internals.

For example, I have been pushing the CXL port protocol error handling
series to minimally involve the PCI core. Just enough enabling to
forward AER events, but otherwise PCI core stays blissfully unaware of
CXL details. The alternative is a maintenance burden on the
PCI core that I expect is best avoided.

> HDM decoder defines and the cxl_register_map infrastructure are moved from
> internal CXL driver headers to a new public include/cxl/pci.h, allowing
> drivers/pci/cxl.c to use them.
> This layout aligns with Alejandro Lucero's CXL Type-2 device series [1] to
> minimize conflicts when both land. When he rebases to 7.0-rc2, I can move my
> changes on top of his.

I think we need to evaluate where things stand after both the CXL port
error handling series and the CXL accelerator base series have landed.
Not that they are functionally dependent on each other, but there is
a review backlog that needs to clear, and those establish the precedent
about where CXL functionality lands between PCI core, CXL core, and CXL
enlightened drivers.

> These patches were previously part of the CXL reset series and have been
> split out [2] to allow independent review and merging. Review feedback on
> the save/restore portions from v4 has been addressed.
>
> Tested on a CXL Type-2 device. DVSEC and HDM state is correctly saved
> before reset and restored after, with decoder commit confirmed via the
> COMMITTED status bit. Type-3 device testing is in progress.

It is a memory hot plug event. An accelerator driver can coordinate
quiescing CXL.mem over events like reset; a memory expander driver
cannot. The PCI core cannot manage memory hot plug. It is the wrong place
to enable this specific CXL reset because PCI core has no idea about the
suitability of reset at any given point of time.

Now, the secondary bus reset enabling for CXL did end up with
changes to the PCI core:

53c49b6e6dd2 PCI/CXL: Add 'cxl_bus' reset method for devices below CXL Ports

...but only to disambiguate that hardware may be blocking secondary bus
reset by default. However, as the cxl_reset_done() handler shows, there
is zero coordination. One might get lucky and be able to see those
dev_crit() messages before the kernel crashes in the memory expander
case.