Hi Christian,
Le mardi 23 janvier 2024 à 14:28 +0100, Christian König a écrit :
Am 23.01.24 um 14:02 schrieb Paul Cercueil:
[SNIP]Well I just double checked the source in 6.7.1 and I can't see
You actually do - at least with udmabuf, as in that caseNo, the problem is still that you would have to change allThat an exporter has to call extra functions to access hisThen what about we add these dma_buf_{begin,end}_access(), with
own
buffers
is a complete no-go for the design since this forces
exporters
into
doing extra steps for allowing importers to access their
data.
only
implementations for "dumb" exporters e.g. udmabuf or the dmabuf
heaps?
And only importers (who cache the mapping and actually care
about
non-
coherency) would have to call these.
importers
to
mandatory use dma_buf_begin/end.
But going a step back caching the mapping is irrelevant for
coherency.
Even if you don't cache the mapping you don't get coherency.
dma_buf_map_attachment() / dma_buf_unmap_attachment() will handle
cache
coherency when the SGs are mapped/unmapped.
udmabuf doing anything for cache coherency in map/unmap.
All it does is calling dma_map_sgtable() and dma_unmap_sgtable() to
create and destroy the SG table and those are not supposed to sync
anything to the CPU cache.
In other words drivers usually use DMA_ATTR_SKIP_CPU_SYNC here, it's
just that this is missing from udmabuf.
Ok.
The problem was then that dma_buf_unmap_attachment cannot be calledWell what sync are you talking about? CPU sync? In DMA-buf that is
before the dma_fence is signaled, and calling it after is already
too
late (because the fence would be signaled before the data is
sync'd).
handled differently.
For importers it's mandatory that they can be coherent with the
exporter. That usually means they can snoop the CPU cache if the
exporter can snoop the CPU cache.
I seem to have such a system where one device can snoop the CPU cache
and the other cannot. Therefore if I want to support it properly, I do
need cache flush/sync. I don't actually try to access the data using
the CPU (and when I do, I call the sync start/end ioctls).
For exporters you can implement the begin/end CPU access functions
which allows you to implement something even if your exporting device
can't snoop the CPU cache.
That only works if the importers call the begin_cpu_access() /
end_cpu_access(), which they don't.
Daniel / Sima suggested then that I cache the mapping and add newYeah, I've now catched up on the latest mail. Sorry I haven't seen
functions to ensure cache coherency, which is what these patches
are
about.
that before.
Yeah, that sounds strongly like one of the use cases we have
In other words exporters are not require to call sync_to_cpu orMy use-case is, I create DMABUFs with udmabuf, that I attach to
sync_to_device when you create a mapping.
What exactly is your use case here? And why does coherency
matters?
USB/functionfs with the interface introduced by this patchset. I
attach
them to IIO with a similar interface (being upstreamed in
parallel),
and transfer data from USB to IIO and vice-versa in a zero-copy
fashion.
This works perfectly fine as long as the USB and IIO hardware are
coherent between themselves, which is the case on most of our
boards.
However I do have a board (with a Xilinx Ultrascale SoC) where it
is
not the case, and cache flushes/sync are needed. So I was trying to
rework these new interfaces to work on that system too.
rejected so far.
If this really is a no-no, then I am fine with the assumption thatWhat you can do is that instead of using udmabuf or dma-heaps is
devices sharing a DMABUF must be coherent between themselves; but
that's something that should probably be enforced rather than
assumed.
(and I *think* there is a way to force coherency in the
Ultrascale's
interconnect - we're investigating it)
that the device which can't provide coherency act as exporters of the
buffers.
The exporter is allowed to call sync_for_cpu/sync_for_device on it's
own buffers and also gets begin/end CPU access notfications. So you
can then handle coherency between the exporter and the CPU.
But again that would only work if the importers would call
begin_cpu_access() / end_cpu_access(), which they don't, because they
don't actually access the data using the CPU.
Unless you mean that the exporter can call sync_for_cpu/sync_for_device
before/after every single DMA transfer so that the data appears
coherent to the importers, without them having to call
begin_cpu_access() / end_cpu_access().
In which case - this would still demultiply the complexity; my USB-
functionfs interface here (and IIO interface in the separate patchset)
are not device-specific, so I'd rather keep them importers.
If you really don't have coherency between devices then that would
be a really new use case and we would need much more agreement on how
to do this.
[snip]
Agreed. Desiging a good generic solution would be better.
With that said...
Let's keep it out of this USB-functionfs interface for now. The
interface does work perfectly fine on platforms that don't have
coherency problems. The coherency issue in itself really is a
tangential issue.
So I will send a v6 where I don't try to force the cache coherency -
and instead assume that the attached devices are coherent between
themselves.
But it would be even better to have a way to detect non-coherency and
return an error on attach.
Cheers,
-Paul