Re: [PATCH] iommu/arm-smmu-v3: Maintain valid access attributes for non-coherent SMMU

From: Robin Murphy
Date: Mon Jan 05 2026 - 11:18:32 EST


On 2026-01-05 2:53 pm, Jason Gunthorpe wrote:
On Mon, Jan 05, 2026 at 01:33:34PM +0000, Robin Murphy wrote:
The assumption is that if the SMMU is not I/O-coherent, then the Normal
Cacheable attribute will inherently degrade to a non-snooping (and thus
effectively Normal Non-Cacheable) one, as that's essentially what AXI will
do in practice, and thus the attribute doesn't actually matter all that much
in terms of functional correctness. If the SMMU _is_ capable of snooping but
is not described as such then frankly firmware is wrong.

Sadly I am aware of people doing this.. Either a HW bug or some other
weird issue forces the FW to set a non-coherent FW attribute even
though the HW is partially or fully able to process cacheable AXI
attributes.

What I've seen more often is that firmware authors ignore (or don't even realise) I/O-coherency, then someone ends up trying to "fix" Linux with wacky DMA API patches when things start going wrong on mis-described hardware... Yes, they might happen to get away with it with stuff like SMMUs or Mali GPUs where a "well-behaved" driver might have control of the source attributes, but where AxCACHE/AxDOMAIN are hard-wired there's really no option other than to describe the hardware correctly, hence that should always be the primary consideration.

And as I say, if there *is* some reason to specifically avoid using certain attributes, then deliberately describing the hardware incorrectly, in the hope that OS drivers might be "well-behaved" in a way that just happens to lead to them not using those attributes, is really not a robust workaround anyway.

It is reasonable that Linux will set the attributes properly based on
what it is doing. Setting the wrong attributes and expecting the HW to
ignore them seems like a hacky direction.

Oh, I'm not saying that we *shouldn't* set our attributes more exactly - this would still be needed for doing things the "right" way too - I just want to be very clear on the reasons why. The current assumption is not a bug per se, and although it's indeed not 100% robust, the cases where it doesn't hold are more often than not for the wrong reason. Therefore I would say doing this purely for the sake of working around bad firmware - and especially errata - is just as hacky if not more so.
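To make "set our attributes more exactly" concrete, here is a minimal sketch of choosing the SMMU's own access attributes from the described coherency. It is an illustration only and not the patch under discussion: every name is hypothetical, and pairing Non-cacheable with Outer Shareable in the non-coherent case is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical names for this sketch; these are not the driver's definitions. */
enum sketch_cacheability { SKETCH_NON_CACHEABLE, SKETCH_WRITE_BACK };
enum sketch_shareability { SKETCH_OUTER_SHAREABLE, SKETCH_INNER_SHAREABLE };

struct sketch_attrs {
	enum sketch_cacheability cache;
	enum sketch_shareability share;
};

/*
 * Pick attributes that are valid for the described hardware, rather than
 * always requesting Write-Back/Inner Shareable and relying on the
 * interconnect to degrade them when the SMMU cannot snoop.
 */
static struct sketch_attrs sketch_pick_attrs(bool smmu_is_coherent)
{
	struct sketch_attrs a;

	if (smmu_is_coherent) {
		a.cache = SKETCH_WRITE_BACK;
		a.share = SKETCH_INNER_SHAREABLE;
	} else {
		/* Assumption: Non-cacheable accesses are issued as Outer Shareable. */
		a.cache = SKETCH_NON_CACHEABLE;
		a.share = SKETCH_OUTER_SHAREABLE;
	}
	return a;
}
```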

I didn't see anything in the spec that says COHACC means the memory
attributes are ignored and forced to non-coherent, even though that is
the current assumption of the driver.

It kinda works the other way round: COHACC==1 says that the Cacheability and Shareability attributes *can* be configured to snoop CPU caches (although they do not have to be). The "If a system does not support IO-coherent access from the SMMU, SMMU_IDR0.COHACC must be 0" case therefore implies that the SMMU is incapable of snooping regardless of how those attributes are set, and thus must at least be in a different Inner Shareability domain from the CPUs, at which point the Cacheability domains probably aren't too meaningful either. I would consider it unexpectedly incorrect behaviour for an SMMU reporting COHACC==0 to actually be capable of snooping.
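As a rough model of that relationship, the sketch below separates what the ID register reports from what firmware describes. It is a simplified illustration, not the driver's code: the function names are made up, and it assumes (as I understand current behaviour) that the firmware description takes precedence and a mismatch is only worth a warning.

```c
#include <stdbool.h>
#include <stdint.h>

/* COHACC is bit 4 of SMMU_IDR0; the macro name here is local to this sketch. */
#define SKETCH_IDR0_COHACC (UINT32_C(1) << 4)

/* COHACC==1: attributes *can* be configured to snoop CPU caches. */
static bool sketch_hw_reports_coherent(uint32_t idr0)
{
	return (idr0 & SKETCH_IDR0_COHACC) != 0;
}

/* True when firmware's description disagrees with the ID register. */
static bool sketch_cohacc_overridden(uint32_t idr0, bool fw_coherent)
{
	return sketch_hw_reports_coherent(idr0) != fw_coherent;
}

/*
 * Assumption of this sketch: the firmware description wins, so a
 * snoop-capable SMMU described as non-coherent is treated as non-coherent.
 */
static bool sketch_effective_coherency(uint32_t idr0, bool fw_coherent)
{
	(void)idr0;
	return fw_coherent;
}
```

The interesting case for this thread is the one where sketch_cohacc_overridden() is true and firmware says "non-coherent": the hardware may still act on Cacheable attributes, which is exactly where the "attributes degrade anyway" assumption stops holding.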

If people have a good reason for wanting to use a coherent SMMU
non-coherently (and/or control of allocation hints), then that should really
be some kind of driver-level option - it would need a bit of additional DMA
API work (which has been low down my to-do list for a few years now...), but
it's perfectly achievable, and I think it's still preferable to abusing the
COHACC override in firmware.

IMHO, this is a different topic, and something that will probably
become interesting this year. I'm aware of some HW/drivers that need
optional non-coherent mappings for isochronous flows - but it is not
the DMA API that is the main issue but the page table walks :\

Hmm, yeah, potentially configuring PTW attributes for a DMA domain is something I hadn't even thought about - the DMA API aspect I mean is that in general we need some sort of DMA_ATTR_NO_SNOOP when mapping/allocating such isochronous buffers/pagetables etc., to make the DMA layer still do the cache maintenance/non-cacheable remaps despite dev_is_dma_coherent() being true.
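Roughly, the decision the DMA layer would have to make looks like the sketch below. To be clear, DMA_ATTR_NO_SNOOP does not exist today; the flag, its bit value, and the helper name are all hypothetical, and the real change would touch rather more than one predicate.

```c
#include <stdbool.h>

/* Hypothetical attribute; the name and bit value are placeholders only. */
#define SKETCH_DMA_ATTR_NO_SNOOP (1UL << 0)

/*
 * Should the DMA layer do cache maintenance / non-cacheable remaps for
 * this mapping? Normally that follows device coherency alone; the
 * hypothetical attribute forces it even for a coherent device.
 */
static bool sketch_needs_cache_maintenance(bool dev_is_coherent,
					   unsigned long attrs)
{
	if (attrs & SKETCH_DMA_ATTR_NO_SNOOP)
		return true;

	return !dev_is_coherent;
}
```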

Thanks,
Robin.