Regression Re: [PATCH v3 1/4] PCI: Enable ACS only after configuring IOMMU for OF platforms

From: Jon Kohler

Date: Fri Apr 03 2026 - 12:49:03 EST

> On Jan 2, 2026, at 10:34 AM, Manivannan Sadhasivam via B4 Relay <devnull+manivannan.sadhasivam.oss.qualcomm.com@xxxxxxxxxx> wrote:
>
> From: Manivannan Sadhasivam <manivannan.sadhasivam@xxxxxxxxxxxxxxxx>
>
> For enabling ACS without the cmdline params, the platform drivers are
> expected to call pci_request_acs() API which sets a static flag,
> 'pci_acs_enable' in drivers/pci/pci.c. And this flag is used to enable ACS
> in pci_enable_acs() helper, which gets called during pci_acs_init(), as per
> this call stack:
>
> -> pci_device_add()
>    -> pci_init_capabilities()
>       -> pci_acs_init()
>          /* check for pci_acs_enable */
>          -> pci_enable_acs()
>
> For the OF platforms, pci_request_acs() is called during
> of_iommu_configure() during device_add(), as per this call stack:
>
> -> device_add()
>    -> iommu_bus_notifier()
>       -> iommu_probe_device()
>          -> pci_dma_configure()
>             -> of_dma_configure()
>                -> of_iommu_configure()
>                   /* set pci_acs_enable */
>                   -> pci_request_acs()
>
> As seen from both call stacks, pci_enable_acs() is called way before the
> invocation of pci_request_acs() for the OF platforms. This means,
> pci_enable_acs() will not enable ACS for the first device that gets
> enumerated, which is usually the Root Port device. But since the static
> flag, 'pci_acs_enable' is set *afterwards*, ACS will be enabled for the
> ACS capable devices enumerated later.
>
> To fix this issue, do not call pci_enable_acs() from pci_acs_init(), but
> only from pci_dma_configure() after calling of_dma_configure(). This makes
> sure that pci_enable_acs() only gets called after the IOMMU framework has
> called pci_request_acs(). The ACS enablement flow now looks like:
>
> -> pci_device_add()
>    -> pci_init_capabilities()
>       /* Just store the ACS cap */
>       -> pci_acs_init()
> -> device_add()
>    ...
>    -> pci_dma_configure()
>       -> of_dma_configure()
>          -> pci_request_acs()
>       -> pci_enable_acs()
>
> For the ACPI platforms, pci_request_acs() is called during ACPI
> initialization time itself, independent of the IOMMU framework.
>
> Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
> Tested-by: Naresh Kamboju <naresh.kamboju@xxxxxxxxxx>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@xxxxxxxxxxxxxxxx>
> ---
> drivers/pci/pci-driver.c | 8 ++++++++
> drivers/pci/pci.c | 8 --------
> drivers/pci/pci.h | 1 +
> 3 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 7c2d9d596258..301a9418e38e 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1650,6 +1650,14 @@ static int pci_dma_configure(struct device *dev)
> ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev));
> }
>
> + /*
> + * Attempt to enable ACS regardless of capability because some Root
> + * Ports (e.g. those quirked with *_intel_pch_acs_*) do not have
> + * the standard ACS capability but still support ACS via those
> + * quirks.
> + */
> + pci_enable_acs(to_pci_dev(dev));
> +
> pci_put_host_bridge_device(bridge);
>
> /* @drv may not be valid when we're called from the IOMMU layer */
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 13dbb405dc31..2c3d0a2d6973 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3648,14 +3648,6 @@ bool pci_acs_path_enabled(struct pci_dev *start,
> void pci_acs_init(struct pci_dev *dev)
> {
> dev->acs_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> -
> - /*
> - * Attempt to enable ACS regardless of capability because some Root
> - * Ports (e.g. those quirked with *_intel_pch_acs_*) do not have
> - * the standard ACS capability but still support ACS via those
> - * quirks.
> - */
> - pci_enable_acs(dev);
> }
>
> /**
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 0e67014aa001..4592ede0ebcc 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -939,6 +939,7 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
> }
>
> void pci_acs_init(struct pci_dev *dev);
> +void pci_enable_acs(struct pci_dev *dev);
> #ifdef CONFIG_PCI_QUIRKS
> int pci_dev_specific_acs_enabled(struct pci_dev *dev, u16 acs_flags);
> int pci_dev_specific_enable_acs(struct pci_dev *dev);
>
> --
> 2.48.1
>

Howdy folks,
Writing to report a regression from this patch. While attempting to
rebase our 6.12.y-based internal tree past 6.12.75, we hit a behavior
change on some internal systems: a platform that uses vfio-pci
passthrough of a series of PCIe devices (SAS HBAs, NVMe SSDs, NICs,
etc.) to a given service VM.

On some, but not all, of these systems, the iommu group topology
changes completely with this commit. Before this commit, a given
system may have roughly 90 iommu groups; after it, we see roughly 60.
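For reference, the way we've been comparing topologies across kernels
is just dumping the group-to-device mapping out of sysfs. A minimal
sketch (the root parameter exists only so the helper can be exercised
against a fake tree; on a real box you'd leave it at the default):

```python
import os

def iommu_group_map(root="/sys/kernel/iommu_groups"):
    """Return {group: sorted device addresses} from an iommu_groups-style tree."""
    groups = {}
    for group in sorted(os.listdir(root), key=int):
        devdir = os.path.join(root, group, "devices")
        groups[group] = sorted(os.listdir(devdir))
    return groups
```

Running it on good vs. bad kernels and diffing the output makes the
merged groups jump right out.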

The net result is that some of the devices marked for passthrough get
lumped into the same group as devices that are not marked for
passthrough, so libvirt refuses to start the VM, like so:

[root@demo-system-here ~]# virsh start demo-service-vm
error: Failed to start domain 'demo-service-vm'
2026-04-02T03:52:17.098998Z qemu-kvm: -device {"driver":"vfio-pci","host":"0000:41:00.0","id":"ua-cb173399-0c46-5b7a-b3e6-fe2fb5f9509c","bus":"pci.0","addr":"0x7","rombar":0}: vfio 0000:41:00.0: group 8 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.

[root@demo-system-here ~]# lspci |grep 41:00
...
41:00.0 Serial Attached SCSI controller: Broadcom / LSI Fusion-MPT 12GSAS/PCIe Secure SAS38xx
Subsystem: Super Micro Computer Inc AOC-S3816L-L16iT (NI22) Storage Adapter
Kernel driver in use: vfio-pci
Kernel modules: mpt3sas

[root@demo-system-here ~]# ls -l /sys/kernel/iommu_groups/8/devices
total 0
lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:40:01.0 -> ../../../../devices/pci0000:40/0000:40:01.0
lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:40:01.1 -> ../../../../devices/pci0000:40/0000:40:01.1
lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:40:01.2 -> ../../../../devices/pci0000:40/0000:40:01.2
lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:41:00.0 -> ../../../../devices/pci0000:40/0000:40:01.1/0000:41:00.0
lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:42:00.0 -> ../../../../devices/pci0000:40/0000:40:01.2/0000:42:00.0
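A small helper we use to see at a glance which driver each device in a
group is bound to (this mirrors the real sysfs layout; the sysfs
parameter is only there so the sketch can be tested against a fake
tree):

```python
import os

def group_drivers(group, sysfs="/sys"):
    """Map each device in an iommu group to its bound driver name (or None)."""
    devdir = os.path.join(sysfs, "kernel/iommu_groups", str(group), "devices")
    result = {}
    for dev in sorted(os.listdir(devdir)):
        link = os.path.join(sysfs, "bus/pci/devices", dev, "driver")
        # The driver symlink is absent when no driver is bound.
        result[dev] = os.path.basename(os.readlink(link)) if os.path.islink(link) else None
    return result
```

On the failing kernel this shows vfio-pci on 41:00.0 but nvme on
42:00.0 in the same group, which is exactly what the "group 8 is not
viable" error is complaining about.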

The tricky bit here is that 41:00.0 is the SAS controller we're trying
to pass through, while 42:00.0 is the local NVMe M.2 boot disk the
server itself boots from.

Before this patch, they were placed in separate groups and there was
no problem.
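In case it helps triage: what drives the grouping difference is
whether ACS ends up enabled on the downstream ports (40:01.1 / 40:01.2
above). We've been checking that by grepping the ACSCtl line out of
`lspci -vvv`; a rough sketch of the parse, with illustrative fragments
rather than output captured from the affected box:

```python
import re

def acs_source_validation_enabled(lspci_vvv_text):
    """Return True if the ACSCtl line in `lspci -vvv` output shows SrcValid+."""
    m = re.search(r"ACSCtl:\s*(.*)", lspci_vvv_text)
    if not m:
        return False  # no ACS capability reported at all
    return "SrcValid+" in m.group(1)

# Illustrative fragments (not from this system):
acs_on = ("Capabilities: [230 v1] Access Control Services\n"
          "  ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+\n"
          "  ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+\n")
acs_off = "ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd-\n"
```

Feeding this the output of `lspci -s 40:01.1 -vvv` on good vs. bad
kernels should show SrcValid flipping, matching the group merge we
observe.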

I've only tried this on our 6.12.y tree, not yet our 6.18 and 6.6
trees, so I'm not sure whether the problem reproduces there, but this
commit is present in those trees as of 6.18.16 and 6.6.128
respectively.

Happy to provide any other details you might like, as this is 100%
reproducible on a variety of systems here.

Thanks,
Jon