Re: I got an IOMMU IO page fault. What to do now?

From: Robin Murphy
Date: Wed Oct 27 2021 - 13:19:10 EST


On 27/10/2021 5:45 pm, Paul Menzel wrote:
Dear Robin,


On 25.10.21 18:01, Robin Murphy wrote:
On 2021-10-25 12:23, Christian König wrote:

I'm not sure how the IOMMU gives out addresses, but the printed ones look suspicious to me. It looks like we are using an invalid address, like -1 or similar.

FWIW those look like believable DMA addresses to me, assuming that the DMA mapping APIs are being backed by iommu_dma_ops and the device has a 40-bit DMA mask, since the IOVA allocator works top-down.
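For reference, that addressing capability typically comes from the driver along these lines - a minimal sketch rather than the actual amdgpu code, with names chosen for illustration:

#include <linux/dma-mapping.h>

/* Hypothetical probe excerpt: declare that the device can address
 * 40 bits. With iommu-dma backing the DMA API, IOVAs are handed out
 * top-down from just below 1ULL << 40, so the first mappings land
 * in the very topmost pages of that range.
 */
ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
if (ret)
	return ret;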

Likely causes are either a race where the dma_unmap_*() call happens before the hardware has really stopped accessing the relevant addresses, or the device's DMA mask having been set larger than it should be, such that the upper address bits are truncated on the round trip through the hardware.
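To make the truncation failure mode concrete, here's a trivial userspace illustration (hypothetical numbers, not taken from your logs) of a 44-bit IOVA crossing a link that only carries 40 address bits:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* An IOVA near the top of a 44-bit DMA mask... */
	uint64_t iova = 0xfffffffff000ULL;
	/* ...as seen through only 40 physical address lines. */
	uint64_t seen = iova & ((UINT64_C(1) << 40) - 1);

	printf("IOVA handed to the device:  %#llx\n",
	       (unsigned long long)iova);
	printf("address reaching the IOMMU: %#llx\n",
	       (unsigned long long)seen);
	return 0;
}

The IOMMU then faults on the truncated address, because nothing was ever mapped there.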

Given the addresses involved, my suspicions would initially lean towards the latter case - the faults are in the very topmost pages, which implies they're the first things mapped in that range. The other contributing factor is the trick that the IOVA allocator plays for PCI devices, where it prefers 32-bit addresses. Thus you're only likely to see this happen once you already have ~3.5-4GB of live DMA-mapped memory exhausting the 32-bit IOVA space (minus some reserved areas), at which point the allocator starts handing out addresses from the full DMA mask. You should be able to check that with a 5.13 or newer kernel by booting with "iommu.forcedac=1" and seeing if it breaks immediately (unfortunately with an older kernel you'd have to manually hack iommu_dma_alloc_iova() to the same effect).
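The relevant allocator behaviour, paraphrased from iommu_dma_alloc_iova() in drivers/iommu/dma-iommu.c around 5.13 (details elided, so don't take it as the exact code):

/* Try to get PCI devices a SAC (32-bit) address first */
if (dma_limit > DMA_BIT_MASK(32) && !iommu_dma_forcedac &&
    dev_is_pci(dev))
	iova = alloc_iova_fast(iovad, iova_len,
			       DMA_BIT_MASK(32) >> shift, false);

/* Only once that fails (or with forcedac) allocate top-down
 * from the device's full DMA mask.
 */
if (!iova)
	iova = alloc_iova_fast(iovad, iova_len,
			       dma_limit >> shift, true);

With "iommu.forcedac=1" the first branch is skipped, so full-mask addresses show up immediately instead of only once 32-bit space fills up.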

I booted Linux 5.15-rc7 with `iommu.forcedac=1`; the system came up, and I could log in remotely over SSH. Please find the Linux kernel messages attached. (The system logs say lightdm failed to start, but that might be some other issue due to a change in the operating system.)

OK, that looks like it's made the GPU blow up straight away, which is what I was hoping for (and also appears to reveal another bug where it's not handling probe failure very well - possibly trying to remove a non-existent audio device?). Lightdm presumably fails to start because it doesn't find any display devices, since amdgpu failed to probe.

If you can boot the same kernel without "iommu.forcedac" and get a successful probe and working display, that will imply it is managing to work OK with 32-bit DMA addresses, at which point I'd have to leave it to Christian and Alex to figure out exactly where the DMA addresses are getting mangled. The only thing that stands out to me is the reference to "gfx_v6_0", which makes me wonder whether it's related to gmc_v6_0_sw_init(), where a 44-bit DMA mask gets set. If so, that would suggest that either this particular model of GPU is more limited than expected, or the SoC only has 40 address bits wired up between the PCI host bridge and the IOMMU.
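If someone wants a quick experiment on the amdgpu side, capping the mask would test that theory - an untested sketch, assuming gmc_v6_0_sw_init() sets the mask via dma_set_mask_and_coherent():

/* Untested diagnostic hack for gmc_v6_0_sw_init(): claim only 40
 * bits instead of 44. If the faults go away, the upper address
 * bits are being lost somewhere beyond the device.
 */
r = dma_set_mask_and_coherent(adev->dev, DMA_BIT_MASK(40));
if (r) {
	dev_warn(adev->dev, "No suitable DMA available\n");
	return r;
}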

Cheers,
Robin.