Re: [PATCH] iommu/arm-smmu-v3: Maintain valid access attributes for non-coherent SMMU
From: Robin Murphy
Date: Mon Jan 05 2026 - 11:18:32 EST
On 2026-01-05 2:53 pm, Jason Gunthorpe wrote:
On Mon, Jan 05, 2026 at 01:33:34PM +0000, Robin Murphy wrote:
The assumption is that if the SMMU is not I/O-coherent, then the Normal
Cacheable attribute will inherently degrade to a non-snooping (and thus
effectively Normal Non-Cacheable) one, as that's essentially what AXI will
do in practice, and thus the attribute doesn't actually matter all that much
in terms of functional correctness. If the SMMU _is_ capable of snooping but
is not described as such then frankly firmware is wrong.
Sadly I am aware of people doing this.. Either a HW bug or some other
weird issue forces the FW to set a non-coherent FW attribute even
though the HW is partially or fully able to process cacheable AXI
attributes.
What I've seen more often is that firmware authors ignore (or don't even
realise) I/O-coherency, then someone ends up trying to "fix" Linux with
wacky DMA API patches when things start going wrong on mis-described
hardware... Yes, they might happen to get away with it with stuff like
SMMUs or Mali GPUs where a "well-behaved" driver might have control of
the source attributes, but where AxCACHE/AxDOMAIN are hard-wired there's
really no option other than to describe the hardware correctly, hence
that should always be the primary consideration.
And as I say, if there *is* some reason to specifically avoid using
certain attributes, then deliberately describing the hardware
incorrectly, in the hope that OS drivers might be "well-behaved" in a
way that just happens to lead to them not using those attributes, is
really not a robust workaround anyway.
It is reasonable that Linux will set the attributes properly based on
what it is doing. Setting the wrong attributes and expecting the HW to
ignore them seems like a hacky direction.
Oh, I'm not saying that we *shouldn't* set our attributes more exactly -
this would still be needed for doing things the "right" way too - I just
want to be very clear on the reasons why. The current assumption is not
a bug per se, and although it's indeed not 100% robust, the cases where
it doesn't hold are more often than not for the wrong reason. Therefore
I would say doing this purely for the sake of working around bad
firmware - and especially errata - is just as hacky if not more so.
I didn't see anything in the spec that says COHACC means the memory
attributes are ignored and forced to non-coherent, even though that is
the current assumption of the driver.
It kinda works the other way round: COHACC==1 says that the Cacheability
and Shareability attributes *can* be configured to snoop CPU caches
(although do not have to be); therefore the "If a system does not
support IO-coherent access from the SMMU, SMMU_IDR0.COHACC must be 0"
case implies that the SMMU is incapable of snooping regardless of how
those attributes are set, thus must at least be in a different Inner
Shareability domain from the CPUs, at which point the Cacheability
domains probably aren't too meaningful either. I would consider it
unexpectedly incorrect behaviour for an SMMU reporting COHACC==0 to
actually be capable of snooping.
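
To put that reading of COHACC in concrete terms, here is a minimal sketch of
the decision as a standalone C model. It is not the arm-smmu-v3 driver code:
struct smmu_model, enum walk_attr and pick_walk_attr() are all hypothetical
names made up for illustration, and the two attribute values are a deliberate
simplification of the real Cacheability/Shareability encodings.

```c
#include <stdbool.h>

/* Hypothetical, simplified attribute choices (not real register encodings). */
enum walk_attr {
	WALK_NON_CACHEABLE,	/* Normal Non-Cacheable, no snooping expected */
	WALK_CACHEABLE_SNOOP	/* Normal Cacheable, Inner Shareable, snoops CPU caches */
};

/* Hypothetical model of the one ID bit discussed here. */
struct smmu_model {
	bool cohacc;		/* SMMU_IDR0.COHACC as seen by the OS */
};

/*
 * COHACC==1: snooping attributes *can* be used, but do not have to be.
 * COHACC==0: the SMMU is assumed incapable of snooping whatever we program,
 *            so the exact choice is to request non-cacheable accesses.
 */
static enum walk_attr pick_walk_attr(const struct smmu_model *smmu,
				     bool want_coherent)
{
	if (!smmu->cohacc)
		return WALK_NON_CACHEABLE;

	return want_coherent ? WALK_CACHEABLE_SNOOP : WALK_NON_CACHEABLE;
}
```

The point of the sketch is only that COHACC gates what is possible, while the
caller's intent picks among the possible options; a COHACC==0 SMMU never ends
up with a snooping attribute regardless of want_coherent.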
If people have a good reason for wanting to use a coherent SMMU
non-coherently (and/or control of allocation hints), then that should really
be some kind of driver-level option - it would need a bit of additional DMA
API work (which has been low down my to-do list for a few years now...), but
it's perfectly achievable, and I think it's still preferable to abusing the
COHACC override in firmware.
IMHO, this is a different topic, and something that will probably
become interesting this year. I'm aware of some HW/drivers that need
optional non-coherent mappings for isochronous flows - but it is not
the DMA API that is the main issue but the page table walks :\
Hmm, yeah, potentially configuring PTW attributes for a DMA domain is
something I hadn't even thought about - the DMA API aspect I mean is
that in general we need some sort of DMA_ATTR_NO_SNOOP when
mapping/allocating such isochronous buffers/pagetables etc., to make the
DMA layer still do the cache maintenance/non-cacheable remaps despite
dev_is_dma_coherent() being true.
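
As a rough sketch of that DMA API aspect, again a standalone C model rather
than kernel code: SKETCH_DMA_ATTR_NO_SNOOP and its bit value are hypothetical
stand-ins for the sort of attribute described above, and
needs_cache_maintenance() is an invented helper whose dev_coherent argument
stands in for the result of dev_is_dma_coherent().

```c
#include <stdbool.h>

/* Hypothetical attribute bit; no such flag or value is assumed to exist. */
#define SKETCH_DMA_ATTR_NO_SNOOP	(1UL << 0)

/*
 * Should the DMA layer do cache maintenance (or a non-cacheable remap)
 * for this mapping?
 *
 * dev_coherent: what dev_is_dma_coherent() would report for the device.
 * attrs:        per-mapping attributes supplied by the driver.
 */
static bool needs_cache_maintenance(bool dev_coherent, unsigned long attrs)
{
	/* An explicit no-snoop request overrides the device-wide default. */
	if (attrs & SKETCH_DMA_ATTR_NO_SNOOP)
		return true;

	/* Otherwise only non-coherent devices need maintenance. */
	return !dev_coherent;
}
```

So a coherent device would normally skip maintenance, but a mapping flagged
for isochronous, non-snooping traffic would still get it.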
Thanks,
Robin.