Am Donnerstag, dem 23.06.2022 um 14:52 +0200 schrieb Christian König:
> Am 23.06.22 um 14:14 schrieb Lucas Stach:
> > Am Donnerstag, dem 23.06.2022 um 13:54 +0200 schrieb Christian König:
> > > Am 23.06.22 um 13:29 schrieb Lucas Stach:
> > > [SNIP]
> > > I mean I even had somebody from ARM who told me that this is not going
> > > to work with our GPUs on a specific SoC. That there are ARM internal use
> > > cases which just seem to work because all the devices are non-coherent
> > > is completely new to me.
> > > 
> > Yes, trying to hook up a peripheral that assumes cache snooping in some
> > design details to a non-coherent SoC may end up exploding in various
> > ways. On the other hand, you can work around most of those assumptions
> > by marking the memory as uncached to the CPU, which may tank
> > performance, but will work from a correctness PoV.
> 
> Yeah, and exactly that's what I meant with "DMA-buf is not the framework
> for this".
> 
> See, we do support using uncached/not snooped memory in DMA-buf, but only
> for the exporter side.
> 
> For example, the AMD and Intel GPUs have a per-buffer flag for this.
> 
> The importer on the other hand needs to be able to handle whatever the
> exporter provides.
> 
I fail to construct a case where you want the Vulkan/GL "no domain
transition" coherent semantic without the allocator knowing about this.
If you need this and the system is non-snooping, surely the allocator
will choose uncached memory.

I agree that you absolutely need to fail the usage when someone imports
a CPU cached buffer and then tries to use it as GL coherent on a
non-snooping system. That simply will not work.

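The allocator-side argument can be written down as a tiny decision function: if someone needs coherent access without domain transitions and not every device snoops, the allocator has to hand out uncached memory. This is a hypothetical model for illustration only; `choose_caching()`, `enum caching` and its two boolean inputs are made-up names, not kernel or DMA-buf APIs.

```c
#include <stdbool.h>

/* Hypothetical model, not a kernel API: CPU caching mode an allocator picks. */
enum caching {
    CACHING_CPU_CACHED, /* CPU caches on; non-snooping devices need explicit cache maintenance */
    CACHING_UNCACHED,   /* CPU mapping uncached; coherent even without snooping */
};

/*
 * need_coherent: a user wants Vulkan/GL style coherent access with no
 *                explicit domain transitions.
 * all_snooping:  every device that will touch the buffer snoops CPU caches.
 */
static enum caching choose_caching(bool need_coherent, bool all_snooping)
{
    /* Coherent access on a non-snooping system only works if the CPU does not cache. */
    if (need_coherent && !all_snooping)
        return CACHING_UNCACHED;

    /* Otherwise cached memory is fine: snooping keeps it coherent, or
     * explicit domain transitions do the cache maintenance. */
    return CACHING_CPU_CACHED;
}
```

Under this model, an importer only has to reject the coherent use case when it receives a CPU-cached buffer that was allocated without that knowledge.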
> > > You can of course use DMA-buf in an incoherent environment, but then you
> > > can't expect that this works all the time.
> > > 
> > > This is documented behavior and so far we have bluntly rejected any of
> > > the complaints that it doesn't work on most ARM SoCs and I don't really
> > > see a way to do this differently.
> > 
> > Can you point me to that part of the documentation? A quick grep for
> > "coherent" didn't immediately turn something up within the DMA-buf
> > dirs.
> 
> Search for "cache coherency management". It's quite a while ago, but I
> do remember helping to review that stuff.
> 
That only turns up the lines in the DMA_BUF_IOCTL_SYNC doc, which say
the exact opposite of "DMA-buf is always coherent".

> [SNIP]
> Sounds like I'm not making clear what I want to say here: For the
> exporter, using cache coherent memory is optional; for the importer it
> isn't.
> 
> For the exporter it is perfectly valid to use kmalloc, get_free_page
> etc... on its buffers as long as it uses the DMA API to give the
> importer access to it.
> 
And here is where our line of thought diverges: the DMA API allows
snooping and non-snooping devices to work together just fine, as it has
explicit domain transitions, which are no-ops if both devices are
snooping, but will do the necessary cache maintenance when one of them
is non-snooping but the memory is CPU cached.

I don't see why DMA-buf should be any different here. Yes, you cannot
support the "no domain transition" sharing when the memory is CPU
cached and one of the devices is non-snooping, but you can support 99%
of real use-cases like the non-snooped scanout or the UVC video import.

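The DMA_BUF_IOCTL_SYNC interface mentioned in this exchange is the userspace form of an explicit domain transition: CPU access through an mmap()ed dma-buf is bracketed by a START and an END sync. A minimal sketch, assuming Linux UAPI headers that provide `<linux/dma-buf.h>`; the wrapper name `dmabuf_sync()` is hypothetical, while `struct dma_buf_sync`, `DMA_BUF_IOCTL_SYNC` and the `DMA_BUF_SYNC_*` flags are the real UAPI definitions.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/*
 * Hypothetical helper: begin (begin == true) or end a CPU access window on
 * an mmap()ed dma-buf. `dir` is DMA_BUF_SYNC_READ, DMA_BUF_SYNC_WRITE or
 * DMA_BUF_SYNC_RW. Returns 0 on success, -1 with errno set on failure.
 */
static int dmabuf_sync(int fd, bool begin, __u64 dir)
{
    struct dma_buf_sync sync = {
        .flags = (begin ? DMA_BUF_SYNC_START : DMA_BUF_SYNC_END) | dir,
    };
    int ret;

    /* Retry if the ioctl was interrupted or asked to be restarted. */
    do {
        ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}
```

Typical use is `dmabuf_sync(fd, true, DMA_BUF_SYNC_WRITE)` before writing through the mapping and `dmabuf_sync(fd, false, DMA_BUF_SYNC_WRITE)` afterwards; what the kernel does in between (cache maintenance or nothing) is up to the exporter.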
> The importer on the other hand needs to be able to deal with that. When
> this is not the case then the importer somehow needs to work around that.
> 
Why? The importer maps the dma-buf via dma_buf_map_attachment, which in
most cases triggers a map via the DMA API on the exporter side. This
map via the DMA API will already do the right thing in terms of cache
management, it's just that we explicitly disable it via
DMA_ATTR_SKIP_CPU_SYNC in DRM because we know that the mapping will be
cached, which violates the DMA API explicit domain transition anyway.

> Either by flushing the CPU caches or by rejecting using the imported
> buffer for this specific use case (like AMD and Intel drivers should be
> doing).
> 
> If the Intel or ARM display drivers need non-cached memory and don't
> reject buffers where they don't know this, then that's certainly a bug in
> those drivers.
> 
It's not just display drivers; video codec accelerators and most GPUs
in this space are also non-snooping. In the ARM SoC world everyone just
assumes you are non-snooping, which is why things work for most cases
and only a handful like the UVC video import are broken.

> Otherwise we would need to change all DMA-buf exporters to use a special
> function for allocating non-coherent memory and that is certainly not
> going to fly.
> 
> > I also don't see why you think that both world views are so totally
> > different. We could just require explicit domain transitions for
> > non-snoop access, which would probably solve your scanout issue and would
> > not be a problem for most ARM systems, where we could no-op this if the
> > buffer is already in uncached memory and at the same time keep the "x86
> > assumes cached + snooped access by default" semantics.
> 
> Well the key point is we intentionally rejected that design previously
> because it created all kinds of trouble as well.
> 
I would really like to know what issues popped up there. Moving the
dma-buf attachment to work more like a buffer used with the DMA API
seems like a good thing to me.

> For this limited use case of doing a domain transition right before
> scanout it might make sense, but that's just one use case.
> 
The only case I see that we still couldn't support with a change in
that direction is the GL coherent access to an imported buffer that has
been allocated from CPU cached memory on a system with non-snooping
agents. Which to me sounds like a pretty niche use-case, but I would be
happy to be proven wrong.

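The claim that explicit domain transitions cover everything except GL-coherent access to CPU-cached memory with non-snooping agents can be tabulated with a small model. `transition_action()` and `enum action` are hypothetical names for illustration; in the real DMA API the corresponding work is done by calls such as dma_sync_sg_for_device() and dma_sync_sg_for_cpu().

```c
#include <stdbool.h>

/* Hypothetical model: what an explicit domain transition has to do. */
enum action {
    ACTION_NOOP,        /* caches cannot hold stale data, nothing to do */
    ACTION_CACHE_MAINT, /* flush/invalidate CPU caches at the transition */
    ACTION_UNSUPPORTED, /* maintenance needed, but no transition is ever signalled */
};

/*
 * cpu_cached:     the CPU maps the buffer cached.
 * device_snoops:  the device accessing the buffer snoops CPU caches.
 * no_transitions: GL/Vulkan style coherent use without explicit transitions.
 */
static enum action transition_action(bool cpu_cached, bool device_snoops,
                                     bool no_transitions)
{
    /* Uncached CPU mapping or a snooping device: coherent by construction. */
    if (!cpu_cached || device_snoops)
        return ACTION_NOOP;

    /* Cached memory plus a non-snooping device needs maintenance at each
     * transition, which is impossible if the user never signals one. */
    return no_transitions ? ACTION_UNSUPPORTED : ACTION_CACHE_MAINT;
}
```

Only the last branch with `no_transitions == true` ends up unsupported, which is the niche case described in the paragraph before this sketch.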
Regards,
Lucas