On Thursday, 23.06.2022 at 13:54 +0200, Christian König wrote:
On 23.06.22 at 13:29, Lucas Stach wrote:
[SNIP]
I mean I even had somebody from ARM who told me that this is not going
to work with our GPUs on a specific SoC. That there are ARM internal use
cases which just seem to work because all the devices are non-coherent
is completely new to me.

Yes, trying to hook up a peripheral that assumes cache snooping in some
design details to a non coherent SoC may end up exploding in various
ways. On the other hand you can work around most of those assumptions
by marking the memory as uncached to the CPU, which may tank
performance, but will work from a correctness PoV.
[SNIP]
Non coherent access, including your non-snoop scanout, and no domain
transition signal just doesn't go together when you want to solve
things in a generic way.

Yeah, that's the stuff I totally agree on.

See we absolutely do have the requirement of implementing coherent
access without domain transitions for Vulkan and OpenGL+extensions.

Coherent can mean 2 different things:
1. CPU cached with snooping from the IO device
2. CPU uncached
The Vulkan and GL "coherent" uses are really coherent without explicit
domain transitions, so on non coherent arches that require the
transitions the only way to implement this is by making the memory CPU
uncached. Which from a performance PoV will probably not be what app
developers expect, but will still expose the correct behavior.
Remember that in a fully (not only IO) coherent system the CPU isn't
the only agent that may cache the content you are trying to access
here. The dirty cacheline could reasonably still be sitting in a GPU or
VPU cache, so you need some way to clean those cachelines, which isn't
a magic "importer knows how to call CPU cache clean instructions".

IIRC we do already have/had a SYNC_IOCTL for cases like this, but (I
need to double check as well, that's way too long ago) this was kicked
out because of the requirements above.

The DMA_BUF_IOCTL_SYNC is available in upstream, with the explicit
documentation that "userspace can not rely on coherent access".

You can of course use DMA-buf in an incoherent environment, but then you
can't expect that this works all the time.
This is documented behavior and so far we have bluntly rejected any of
the complaints that it doesn't work on most ARM SoCs and I don't really
see a way to do this differently.

Can you point me to that part of the documentation? A quick grep for
"coherent" didn't immediately turn something up within the DMA-buf
dirs.

Search for "cache coherency management". It's quite a while ago, but I
do remember helping to review that stuff.

That only turns up the lines in DMA_BUF_IOCTL_SYNC doc, which are
saying the exact opposite of "DMA-buf is always coherent".
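
For reference, this is roughly how userspace brackets CPU access to an
mmap()ed dma-buf with DMA_BUF_IOCTL_SYNC. It is a minimal sketch: the
ioctl, struct dma_buf_sync and the DMA_BUF_SYNC_* flags come from
<linux/dma-buf.h>, but the helper names (dmabuf_sync,
dmabuf_cpu_access_begin, dmabuf_cpu_access_end) are made up for
illustration, and retrying on EINTR is an assumption about what a
caller would want, not something the thread prescribes.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Hypothetical helper: issue DMA_BUF_IOCTL_SYNC, retrying if a signal
 * interrupted the call. Returns 0 on success, -1 with errno set on error. */
static int dmabuf_sync(int dmabuf_fd, __u64 flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && errno == EINTR);

    return ret;
}

/* Hypothetical helper: call before the CPU reads (or writes) the mapping. */
static int dmabuf_cpu_access_begin(int dmabuf_fd, bool write)
{
    __u64 dir = write ? DMA_BUF_SYNC_RW : DMA_BUF_SYNC_READ;

    return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | dir);
}

/* Hypothetical helper: call after the CPU is done, with the same direction. */
static int dmabuf_cpu_access_end(int dmabuf_fd, bool write)
{
    __u64 dir = write ? DMA_BUF_SYNC_RW : DMA_BUF_SYNC_READ;

    return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | dir);
}
```

The intended use is dmabuf_cpu_access_begin(fd, true), then touching
the mapping, then dmabuf_cpu_access_end(fd, true). Whether the exporter
does any cache maintenance in between is up to the exporter.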
I also don't see why you think that both world views are so totally
different. We could just require explicit domain transitions for non-
snoop access, which would probably solve your scanout issue and would
not be a problem for most ARM systems, where we could no-op this if the
buffer is already in uncached memory and at the same time keep the "x86
assumes cached + snooped access by default" semantics.
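
To make the idea concrete, here is a toy model of when such an explicit
domain transition would have to do real CPU cache maintenance and when
it could be a no-op. This is only a sketch under stated assumptions:
none of these types or functions exist in the kernel, all names are
hypothetical, and it only models CPU caches, not dirty lines sitting in
a GPU or VPU cache.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel API: how the CPU maps the buffer. */
enum cpu_mapping {
    CPU_MAPPING_CACHED,
    CPU_MAPPING_UNCACHED,
};

/* Hypothetical model of a shared buffer. */
struct buf_model {
    enum cpu_mapping mapping;
};

/* Hypothetical model of a device access path, e.g. a non-snoop scanout. */
struct dev_model {
    bool snoops_cpu_cache;
};

/* Returns true if a domain transition between CPU and device must clean
 * or invalidate CPU caches, false if the transition can be a no-op. */
static bool transition_needs_cpu_cache_maintenance(const struct buf_model *buf,
                                                   const struct dev_model *dev)
{
    /* Uncached CPU mapping: no dirty or stale CPU cachelines can exist. */
    if (buf->mapping == CPU_MAPPING_UNCACHED)
        return false;

    /* Cached + snooped: hardware keeps the CPU cache coherent. */
    if (dev->snoops_cpu_cache)
        return false;

    /* Cached CPU mapping with non-snooping device access. */
    return true;
}
```

In this model only the cached + non-snoop combination needs work, so
both the uncached ARM case and the cached + snooped x86 case stay
no-ops.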
Regards,
Lucas