Re: iommu: flood of ahci 0000:e6:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0055 address=0xa14a4000 flags=0x0070]

From: Robin Murphy
Date: Wed Feb 05 2025 - 13:54:01 EST


On 2025-02-05 1:36 pm, Corentin Labbe wrote:
On Mon, Feb 03, 2025 at 01:01:45PM +0000, Robin Murphy wrote:
On 2025-02-03 9:05 am, Corentin Labbe wrote:
Hello

I have a Supermicro server that is flooded with this kernel message:
ahci 0000:e6:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0055 address=0xa14a4000 flags=0x0070]

The server works perfectly anyway.
It happens with the official Ubuntu kernel vmlinuz-6.8.0-51-generic.
I also tried a custom 6.12.6, same problem.

I tried updating the BIOS, no change.
I tried iommu=soft, no change.

I don't know what to do next.

Regards


IOMMU group 83 e6:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller [1b4b:9230] (rev 11)

Wow, a Marvell SATA controller doing something other than the usual
phantom function quirk, that's a nice change :D

I'd guess that firmware has left it running for something like legacy
IDE emulation (if that's still a thing?) or its own soft-RAID driver,
but neglected to declare an IVMD entry to describe the reserved memory
region(s) it's using for that. A smoking gun would be if 0xa14a4000
matches some firmware-reserved PA in the system memory map. In that
case, if you're lucky you might have some firmware/BIOS option to
disable fancy behaviour and leave it in plain AHCI mode. Otherwise,
booting with "iommu.passthrough=1" (or the even bigger hammer of
"amd_iommu=off") should at least allow you to ignore the issue.
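
The "smoking gun" check above can be sketched as follows. This is only a minimal illustration, assuming the memory map is available as text in the /proc/iomem format ("start-end : name", hex, possibly indented); the function name regions_containing and the sample ranges in the usage note are hypothetical, not taken from this machine.

```python
import re

# Matches /proc/iomem-style lines such as "  a1000000-a1ffffff : Reserved".
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.*)$")


def regions_containing(iomem_text, addr):
    """Return (start, end, name) for every listed range that contains addr.

    Nested entries are all reported, outermost first, because the file
    lists parents before their children.
    """
    hits = []
    for line in iomem_text.splitlines():
        m = IOMEM_LINE.match(line)
        if not m:
            continue  # skip anything that is not a range line
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        if start <= addr <= end:
            hits.append((start, end, m.group(3).strip()))
    return hits
```

To use it, pass the contents of /proc/iomem (read as root, since unprivileged reads show every address as zero) together with the faulting address, e.g. regions_containing(text, 0xa14a4000). A hit in a range named "Reserved" would support the firmware theory; a hit in "System RAM" or no hit at all would not.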


Hello

Thanks for your help

There was no AHCI option in the BIOS (apart from hotplug enable).

Adding iommu.passthrough=1 made those messages go away.

Unfortunately, my example was not representative; the address is mostly random:
dmesg |grep IO_PAGE_FAULT | grep -o 'address=0x[0-9a-f]*' | sort | uniq -c | wc -l
9297

dmesg |grep IO_PAGE_FAULT | grep -o 'address=0x[0-9a-f]*' | sort | uniq -c | head
2 address=0x1101f000
2 address=0x1101f004
3 address=0x1102f000
1 address=0x1102f004
2 address=0x1102f008
2 address=0x1102f010
2 address=0x11043000
2 address=0x11043004
1 address=0x11047000
1 address=0x11047004

dmesg |grep IO_PAGE_FAULT | grep -o 'address=0x[0-9a-f]*' | sort | uniq -c | tail
2 address=0xfffffffffe751004
2 address=0xfffffffffe7e6000
2 address=0xfffffffffe7e6004
4 address=0xfffffffffe823000
3 address=0xfffffffffe823004
2 address=0xfffffffffe830000
2 address=0xfffffffffe830004
3 address=0xfffffffffe833000
1 address=0xfffffffffe833004
1 address=0xfffffffffe833008

OK, these look like iommu-dma addresses, and the fact that they're up into the full 64-bit space implies that the 32-bit ones are most likely also kernel DMA burning through the whole 32-bit IOVA space rather than inadvertent physical addresses (and possibly the SATA driver is leaking DMA mappings as it keeps getting errors and retrying?). Indeed it seems the firmware stuff probably was a red herring.
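
To put a number on that split, a rough sketch like the one below could count how many distinct fault addresses sit below 4 GiB and how many sit above it. It assumes the log text contains "address=0x..." fields as in the dmesg output quoted earlier; the function name split_by_32bit is made up for illustration.

```python
import re

# Pulls the hex digits out of fields like "address=0xfffffffffe751004".
ADDR = re.compile(r"address=0x([0-9a-f]+)")
FOUR_GIB = 1 << 32


def split_by_32bit(log_text):
    """Return (below_4gib, at_or_above_4gib) counts of distinct addresses."""
    addrs = {int(h, 16) for h in ADDR.findall(log_text)}
    low = sum(1 for a in addrs if a < FOUR_GIB)
    return low, len(addrs) - low
```

Feeding it the output of "dmesg | grep IO_PAGE_FAULT" would show whether the high addresses are a handful of outliers or a substantial share of the faults. The counts alone do not prove where the addresses come from; they only make the distribution easier to see than head and tail do.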

I guess that then points to a question of whether it's maybe just the SATA driver going wonky and trying to make the device write to a DMA_TO_DEVICE mapping, or something going awry at the IOMMU to divert the device accesses to a different address space from the one iommu-dma believes it's using...

But the domain/flags are always the same
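
One way to double-check that across the whole log, sketched under the assumption that every event line has the same layout as the message quoted at the top of the thread (the function name domain_flag_counts is hypothetical):

```python
import re
from collections import Counter

# Captures the domain and flags fields of an IO_PAGE_FAULT event line.
EVENT = re.compile(
    r"IO_PAGE_FAULT domain=(0x[0-9a-f]+) address=0x[0-9a-f]+ flags=(0x[0-9a-f]+)"
)


def domain_flag_counts(log_text):
    """Count how often each (domain, flags) pair appears in the log text."""
    return Counter(EVENT.findall(log_text))
```

If the returned counter has a single key, the domain and flags really are constant across all faults.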

Full dmesg (without IOMMU messages) https://uk01.z.antigena.com/l/VspdfbZQLwA2gZviRaGoPfE2bAxamMd9VFWOj4n78OuhpCoBo5HcXgWgXfTVvyxW1R3W9GTx4RbHm1MGyqBINkuTrnW31h9eTfLTUvXfcYh-IaTwmSc5kZo_-iU9-qQLbKsIjA9LNxyfbAA2AKGOSws6K4vuOrR6i-DL5DiQW1gHCrhhBMgE0Y7RK2m9

The server is doing QEMU GPU passthrough via VFIO.
I believe (but need to re-verify) that the messages start whether or not QEMU is started.

Oh, it's certainly not impossible that getting VFIO involved may tickle some bug or misconfiguration wherein the wrong device ends up inadvertently attached to the wrong domain. I don't know the ins and outs of debugging with the AMD driver, though, so I think this is the point where I have to leave this one to Vasant :)

Thanks,
Robin.