Re: [RFC PATCH v2 19/27] PCI: dwc: ep: Cache MSI outbound iATU mapping
From: Krishna Chaitanya Chundru
Date: Mon Dec 22 2025 - 00:10:22 EST
On 12/8/2025 1:27 PM, Niklas Cassel wrote:
On Sun, Nov 30, 2025 at 01:03:57AM +0900, Koichiro Den wrote:
dw_pcie_ep_raise_msi_irq() currently programs an outbound iATU window
for the MSI target address on every interrupt and tears it down again
via dw_pcie_ep_unmap_addr().

On systems that heavily use the AXI bridge interface (for example when
the integrated eDMA engine is active), this means the outbound iATU
registers are updated while traffic is in flight. The DesignWare
endpoint spec warns that updating iATU registers in this situation is
not supported, and the behavior is undefined.

Under high MSI and eDMA load this pattern results in occasional bogus
outbound transactions and IOMMU faults such as:
ipmmu-vmsa eed40000.iommu: Unhandled fault: status 0x00001502 iova 0xfe000000
followed by the system becoming unresponsive. This is the actual output
observed on Renesas R-Car S4, with its ipmmu_hc used with PCIe ch0.

There is no need to reprogram the iATU region used for MSI on every
interrupt. The host-provided MSI address is stable while MSI is enabled,
and the endpoint driver already dedicates a scratch buffer for MSI
generation.

Cache the aligned MSI address and map size, program the outbound iATU
once, and keep the window enabled. Subsequent interrupts only perform a
write to the MSI scratch buffer, avoiding dynamic iATU reprogramming in
the hot path and fixing the lockups seen under load.

Signed-off-by: Koichiro Den <den@xxxxxxxxxxxxx>
---
.../pci/controller/dwc/pcie-designware-ep.c | 48 ++++++++++++++++---
drivers/pci/controller/dwc/pcie-designware.h | 5 ++
2 files changed, 47 insertions(+), 6 deletions(-)

I don't like that this patch modifies dw_pcie_ep_raise_msi_irq() but does
not modify dw_pcie_ep_raise_msix_irq().
Both functions call dw_pcie_ep_map_addr() before doing the writel(),
so I think they should be treated the same.

I do however understand that it is a bit wasteful to dedicate one
outbound iATU for MSI and one outbound iATU for MSI-X, as the PCI
spec does not allow both of them to be enabled at the same time,
see:
§ 6.1.4 MSI and MSI-X Operation in the PCIe 6.0 spec:
"A Function is permitted to implement both MSI and MSI-X,
but system software is prohibited from enabling both at the
same time. If system software enables both at the same time,
the behavior is undefined."

I guess the problem is that some EPF drivers, even if only
one capability can be enabled (MSI/MSI-X), call both
pci_epc_set_msi() and pci_epc_set_msix(), e.g.:
https://github.com/torvalds/linux/blob/v6.18/drivers/pci/endpoint/functions/pci-epf-test.c#L969-L987
to fill in the number of MSI/MSI-X IRQs.

Other EPF drivers only call either pci_epc_set_msi() or
pci_epc_set_msix(), depending on the IRQ type that will actually
be used:
https://github.com/torvalds/linux/blob/v6.18/drivers/nvme/target/pci-epf.c#L2247-L2262

I think both versions are okay: even if the number of IRQs
is filled in for both MSI and MSI-X, AFAICT, only one of them will
get enabled.

I guess it might be hard for an EPC driver to know which capability
is currently enabled, as enabling a capability is only a config
space write by the host side.

Since the host is the one that enables MSI/MSI-X, it would be better for the
controller driver to make this decision, with the EPF driver only calling
raise_irq, because technically the host can also disable MSI and enable MSI-X
at runtime. The controller driver can check which one is enabled and choose
between MSI-X, MSI, and legacy interrupts.
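
To make the suggestion concrete, here is a rough user-space sketch of the
selection logic, not actual DWC driver code. It assumes the controller driver
has already read the Message Control words of the MSI and MSI-X capabilities
from its own config space. The enable-bit values mirror
include/uapi/linux/pci_regs.h; pick_irq_type() and the enum are hypothetical
names used only for illustration. Preferring MSI-X when both bits are set is
an arbitrary choice, since the spec leaves that case undefined.

```c
#include <stdint.h>

/* Enable bits in the Message Control words (values as in pci_regs.h). */
#define PCI_MSI_FLAGS_ENABLE   0x0001
#define PCI_MSIX_FLAGS_ENABLE  0x8000

/* Hypothetical result type, not a kernel enum. */
enum irq_choice { IRQ_CHOICE_INTX, IRQ_CHOICE_MSI, IRQ_CHOICE_MSIX };

/*
 * Hypothetical helper: decide which interrupt mechanism to use from the
 * Message Control words the host last wrote.
 */
static enum irq_choice pick_irq_type(uint16_t msi_flags, uint16_t msix_flags)
{
	/* Both enabled is undefined per the spec; MSI-X first is arbitrary. */
	if (msix_flags & PCI_MSIX_FLAGS_ENABLE)
		return IRQ_CHOICE_MSIX;
	if (msi_flags & PCI_MSI_FLAGS_ENABLE)
		return IRQ_CHOICE_MSI;
	/* Neither enabled: fall back to legacy INTx. */
	return IRQ_CHOICE_INTX;
}
```

With something like this evaluated at raise time, the EPF driver would not
need to pass an IRQ type at all.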
- Krishna Chaitanya.
I guess in most real hardware, e.g. a NIC device, you do an
"enable engine"/"stop engine" type of write to a BAR.
Perhaps we should have similar callbacks in struct pci_epc_ops?
My thinking is that after "start engine", an EPC driver could read
the MSI and MSI-X capabilities, to see which is enabled.
As it should not be allowed to change between MSI and MSI-X without
doing a "stop engine" first.
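
A minimal user-space model of that idea, purely as a sketch: the IRQ type is
latched once at "start engine" and stays fixed until "stop engine". None of
these names exist in struct pci_epc_ops today; struct epc_model,
epc_model_start(), epc_model_stop() and epc_model_irq_type() are hypothetical.
The enable-bit values mirror include/uapi/linux/pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Enable bits in the Message Control words (values as in pci_regs.h). */
#define PCI_MSI_FLAGS_ENABLE   0x0001
#define PCI_MSIX_FLAGS_ENABLE  0x8000

/* Hypothetical state, not part of the kernel's EPC framework. */
enum latched_irq { LATCHED_NONE, LATCHED_INTX, LATCHED_MSI, LATCHED_MSIX };

struct epc_model {
	bool started;
	enum latched_irq irq;
};

/* "start engine": sample the capabilities once and latch the result. */
static void epc_model_start(struct epc_model *m, uint16_t msi_flags,
			    uint16_t msix_flags)
{
	if (msix_flags & PCI_MSIX_FLAGS_ENABLE)
		m->irq = LATCHED_MSIX;
	else if (msi_flags & PCI_MSI_FLAGS_ENABLE)
		m->irq = LATCHED_MSI;
	else
		m->irq = LATCHED_INTX;
	m->started = true;
}

/* "stop engine": forget the latched type so it can change on next start. */
static void epc_model_stop(struct epc_model *m)
{
	m->started = false;
	m->irq = LATCHED_NONE;
}

/* Returns the latched type, or LATCHED_NONE while the engine is stopped. */
static enum latched_irq epc_model_irq_type(const struct epc_model *m)
{
	return m->started ? m->irq : LATCHED_NONE;
}
```

Because the type cannot change between start and stop, a single outbound
iATU window could be programmed at start time for whichever of MSI or MSI-X
was latched.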
Kind regards,
Niklas