Re: [REGRESSION] Re: imx8 PCI regression since "iommu: Get DT/ACPI parsing into the proper probe path"
From: Robin Murphy
Date: Fri Jan 16 2026 - 12:24:52 EST
On 2026-01-16 4:52 pm, Nicolas Cavallari wrote:
+cc regressions ML
On 13/01/2026 at 10:17, Nicolas Cavallari wrote:
+cc patch author & reviewers
On 1/9/26 17:22, Nicolas Cavallari wrote:
When upgrading from 6.12 to a 6.18 kernel, I noticed that a PCI
Ethernet adapter (Microchip LAN7430) would hang under load and not
recover. When that happens, some of its registers indicate that it is
failing to do DMA reads, so it cannot reclaim entries on its ring buffer.
I bisected the problem to this commit:
commit bcb81ac6ae3c2ef95b44e7b54c3c9522364a245c
Author: Robin Murphy <robin.murphy@xxxxxxx>
Date: Fri Feb 28 15:46:33 2025 +0000
iommu: Get DT/ACPI parsing into the proper probe path
The problem still exists on 6.19-rc1, on pci/next (29a77b4897f1) and on
iommu/master (360e85353769) trees. Reverting the commit fixes the issue.
The problem persists on 6.19-rc5.
The system is a Gateworks GW7200, which is an i.MX 8 Mini connected to a
Pericom PI7C9X2G404 4-port switch connected to the LAN7430 chip.
-[0000:00]---00.0-[01-ff]----00.0-[02-05]--+-01.0-[03]----00.0
                                           +-02.0-[04]--
                                           \-03.0-[05]----00.0
The problem only occurs when at least one other PCI device on the switch
is in use. It does not happen if the LAN7430 is the only PCI device, or
if the other devices are not actively used. For example, I can reproduce
it with an ath9k wireless network adapter when it is up and running, but
not when it is down or its driver is not loaded.
I suspect that other PCI devices have similar issues, but the LAN7430 is
the easiest one to diagnose, as it hangs within seconds when running
iperf3 --bidir -u -b 200M, and its register map is public.
I couldn't find a way to dump the PCI address translation mapping from
userspace.
I would be happy to provide more information or test patches.
I debugged it further; it seems to be mostly a PCI issue, since the system does not actually have an IOMMU.
Indeed, I was figuring this had to be another case of a switch with wonky ACS - do Mani's patches adjusting ACS enablement make any difference?
https://lore.kernel.org/all/20260102-pci_acs-v3-1-72280b94d288@xxxxxxxxxxxxxxxx/
Although in this case I guess the issue is arguably more that we're requesting ACS at all, before we know that there's actually an IOMMU present to warrant it. Clearly the best option would be to figure out if the switch behaviour itself can be fixed somehow, but perhaps something like this might help paper over the issue for now (but I'd have to test it to make sure it doesn't break IOMMUs again...)
----->8-----
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 6b989a62def2..837cc0b5ace4 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -141,10 +141,12 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
 			.np = master_np,
 		};
 
-		pci_request_acs();
 		err = pci_for_each_dma_alias(to_pci_dev(dev),
 					     of_pci_iommu_init, &info);
-		of_pci_check_device_ats(dev, master_np);
+		if (!err) {
+			pci_request_acs();
+			of_pci_check_device_ats(dev, master_np);
+		}
 	} else {
 		err = of_iommu_configure_device(master_np, dev, id);
 	}
-----8<-----
Thanks,
Robin.