Re: [PATCH RFCv1 1/3] PCI: Allow ATS to be always on for CXL.cache capable devices

From: Robin Murphy

Date: Fri Feb 20 2026 - 09:46:14 EST


On 2026-02-20 1:51 pm, Jason Gunthorpe wrote:
On Fri, Feb 20, 2026 at 01:22:49PM +0000, Robin Murphy wrote:

But is that an issue? Until the device has a driver, surely it shouldn't be
expected to send interrupts at all, much less depend on them being received
and understood by Linux? The MSI cookie is only populated once a driver
actually requests some MSI vectors (since it doesn't know what ITS
address(es) may or may not need mapping), so an empty DMA domain is still no
better than a true blocking domain in this regard anyway.

Oh, the issue is the driver_managed_dma flag.

In this mode we do bind a driver but the iommu callbacks at driver
bind are not called anymore because that flag says the driver itself
will call them later.

Things like the PCI port driver that never issue DMA at all will set the
flag and never make any calls, while still expecting interrupts to
work.

This is why the other option is to rework this somewhat so these
drivers still make a call into the iommu and can get an interrupt
set up.

Or perhaps we handle BUS_NOTIFY_BIND_DRIVER to manage the switch from BLOCKED to (empty) DMA independently from whether the driver subsequently claims the DMA domain or not? That said, I wouldn't have any particular objection to generalising iommu_use_default_domain() into something like iommu_prepare_default_domain(bool managed) either.
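
To make the notifier idea concrete, here is a minimal userspace model, not kernel code. Every name in it (model_group, model_on_bind_driver, model_msi_can_be_mapped) is hypothetical and exists only for illustration. It assumes the BLOCKED to (empty) DMA switch happens at bind time whether or not the driver sets driver_managed_dma, and that an MSI doorbell can only be mapped once the group is out of BLOCKED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the group's current domain type. */
enum model_domain_type {
	MODEL_BLOCKED,   /* nothing can be mapped, including MSI doorbells */
	MODEL_DMA_EMPTY, /* default DMA domain with no mappings yet */
};

struct model_group {
	enum model_domain_type domain;
};

/*
 * Models the proposed bind-time handling: the switch out of BLOCKED does
 * not depend on whether the driver later claims the DMA domain itself.
 */
void model_on_bind_driver(struct model_group *g, bool driver_managed_dma)
{
	assert(g != NULL);
	(void)driver_managed_dma; /* deliberately ignored in this model */
	if (g->domain == MODEL_BLOCKED)
		g->domain = MODEL_DMA_EMPTY;
}

/* An MSI doorbell can be mapped on request only outside BLOCKED. */
bool model_msi_can_be_mapped(const struct model_group *g)
{
	assert(g != NULL);
	return g->domain != MODEL_BLOCKED;
}
```

In this model a port-driver-like client that sets the flag and never calls into the IOMMU layer still ends up in a domain where a later MSI request could populate the cookie.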

All of this is only for multi-device groups where we want to ignore
some bad grouping with VFIO on old HW without sufficient ACS. Thinking
about it some more I suspect this entire concept has been broken from
day 1 in VFIO on ARM. If the iommu_group has two members, port driver
and a VFIO device then:

The port driver will start first, install the ITS page in the DMA
domain, VFIO will start second and switch the domain to BLOCKED, then
to PAGING, and the ITS mapping used by the port driver will be lost.

And nobody will notice this has happened because the interrupts in the
port driver are only used for RAS IIRC so the net effect is your
system doesn't print AERs anymore.

Indeed VFIO's MSI cookie doesn't inherit any existing mappings from the DMA domain, and that wouldn't work anyway since the IOVAs would almost certainly be different. So we'd have to somehow free any existing AER interrupts before the domain switch, then fully re-request and reprogram them afterwards, in both DMA->UNMANAGED and UNMANAGED->DMA directions. Oof...
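
The ordering problem can be sketched with a small userspace model, again not kernel code. All names (model_domain, model_vector, model_switch_with_rerequest) and the IOVA values are hypothetical. The model assumes each domain maps the doorbell at its own IOVA on first request, and that a vector only delivers if the IOVA programmed into the device is mapped in the currently attached domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-domain MSI cookie: one doorbell at a domain-specific IOVA. */
struct model_domain {
	uint64_t iova_base;       /* where this domain would place the doorbell */
	bool     doorbell_mapped;
	uint64_t doorbell_iova;
};

/* Hypothetical device-side vector state (e.g. the port driver's AER MSI). */
struct model_vector {
	bool     requested;
	uint64_t programmed_iova;
};

/* Requesting a vector maps the doorbell if needed and programs its IOVA. */
void model_request_vector(struct model_domain *d, struct model_vector *v)
{
	assert(d != NULL && v != NULL);
	if (!d->doorbell_mapped) {
		d->doorbell_iova = d->iova_base;
		d->doorbell_mapped = true;
	}
	v->programmed_iova = d->doorbell_iova;
	v->requested = true;
}

void model_free_vector(struct model_vector *v)
{
	assert(v != NULL);
	v->requested = false;
	v->programmed_iova = 0;
}

/* Delivery needs the programmed IOVA to be mapped in the attached domain. */
bool model_msi_delivered(const struct model_domain *attached,
			 const struct model_vector *v)
{
	return v->requested && attached->doorbell_mapped &&
	       v->programmed_iova == attached->doorbell_iova;
}

/* Free before the switch, then re-request and reprogram afterwards. */
struct model_domain *model_switch_with_rerequest(struct model_domain *to,
						 struct model_vector *v)
{
	model_free_vector(v);
	/* the domain switch itself would happen here */
	model_request_vector(to, v);
	return to;
}
```

Attaching the second domain without the free and re-request step leaves the vector programmed with an IOVA that the new domain never mapped, which is the silent loss of AER interrupts described above. The same helper has to run in both directions.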

Thanks,
Robin.