On Wed, Jun 07, 2023 at 06:30:01PM +0800, Sui Jingfeng wrote:
On 2023/6/7 17:36, Paul Cercueil wrote:Not really, no. All DT support is doing is setting some generic device
Hi Sui,But this approach can only be applied for the device driver with DT support.
Le mercredi 07 juin 2023 à 13:30 +0800, Sui Jingfeng a écrit :
The single map_noncoherent member of struct drm_gem_dma_object mayDid you try to simply set the "dma-coherent" property to the device's
not
sufficient for describing the backing memory of the GEM buffer
object.
Especially on dma-coherent systems, the backing memory is both cached
coherent for multi-core CPUs and dma-coherent for peripheral device.
Say architectures like X86-64, LoongArch64, Loongson Mips64, etc.
Whether a peripheral device is dma-coherent or not can be
implementation-dependent. The single map_noncoherent option is not
enough
to reflect real hardware anymore. For example, the Loongson LS3A4000
CPU
and LS2K2000/LS2K1000 SoC, peripheral device of such hardware
platform
allways snoop CPU's cache. Doing the allocation with
dma_alloc_coherent
function is preferred. The return buffer is cached, it should not
using
the default write-combine mapping. While with the current implement,
there
no way to tell the drm core to reflect this.
This patch adds cached and coherent members to struct
drm_gem_dma_object.
which allow driver implements to inform the core. Introducing new
mappings
while keeping the original default behavior unchanged.
node?
X86-64, Loongson ls3a4000 mips64, Loongson ls3a5000 CPU typically do not
have DT support.
They using ACPI to pass parameter from the firmware to Linux kernel.
You approach will lost the effectiveness on such a case.
parameter based on that property, but the infrastructure is very much
generic and can be used on systems without a DT.
>From what I understand if you add that property then Linux will use DMA
coherent memory even though you use dma_alloc_noncoherent() and the
sync_single_for_cpu() / sync_single_for_device() are then NOPs.
Please do not mitigate the problems with confusing method.It's not a confusing method, it's one of the two main API to deal with
DMA buffers. And you might disagree with Paul but there's no need to be
rude about it.
This approach not only tend to generate confusion but alsoThen MIPS breaks the DMA allocation semantics. A buffer allocated with
implement-dependent and arch-dependent. It's definitely problematic.
How does the dma_alloc_coherent/dma_alloc_noncoherent is a ARCH specific
thing.
Dependent on how does the arch_dma_ops is implemented.
The definition of the coherent on different ARCH has different meanings.
The definition of the wirte-combine on different ARCH has different
meanings.
The wirte-combine(uncache acceleration) on mips is non dma-coherent.
dma_alloc_wc is supposed to be coherent.
But on arm, It seem that wirte-combine is coherent. (guaranteed by archHonestly, it's not clear to me what your point or issue is.
implement).
I also heard using dma_alloc_coherent to allocation the buffer for the
non-coherent doesn't hurt, but the reverse is not true.
But please do not create confusion.
software composite is faster because better cacheusing rate and
cache is faster to read.
It is faster because it is cached, not because it is non-coherent.
non-coherent is arch thing and/or driver-side thing,
it is a side effect of using the cached mapping.
Going back to the description in your commit log, you mention that you
want to support multiple hardware that might or might not be DMA
coherent, and thus you want to allocate a buffer with different
attributes depending on that?
Like, you say that the LS3A4000 has a coherency unit and thus doing the
allocation with dma_alloc_coherent is preferred. Preferred to what? A WC
buffer? Why?
A WC buffer is a coherent buffer that is allowed to cache writes.
It doesn't have to, and worst case scenario you're inexactly the same
case than a dma_alloc_coherent buffer.
It should left to driver to handle such a side effect. The deviceNot really, no. Some driver are used across multiple SoCs and multiple
driver know their device, so its the device driver's responsibility to
maintain the coherency.
arch. It doesn't make any sense to encode this in the driver... which is
why it's in the DT in the first place, and abstracted away by the DMA
API. Like, do you really expect the amdgpu driver to know the DMA
attributes it needs to allocate a buffer from when running from a
RaspberryPi?
On loongson platform, we don't need to callAnd mostly guaranteed by dma_alloc_coherent. And if you wanted to call
drm_fb_dma_sync_non_coherent() function, Its already guaranteed by
hardware.
it anyway, it would be a nop if the device is declared as coherent
already.
I think you're thinking about this backward. A buffer has mapping
attributes, and a device has hardware properties.
The driver (ie, software) will allocate a buffer with some mapping
attributes, and will assume that they are met in the rest of its code.
How they are met is an implementation detail of the hardware, and for
all the driver cares, it doesn't have to match.
You can allocate a WC buffer to use on a non-coherent device and that's
fine. You can allocate a non-coherent buffer on a coherent device and
that's fine too. The DMA API will make everything work when it needs to,
and if the hardware already provides stronger guarantees, then it will
just skip whatever is redundant.
So you need to write your driver using buffer is the most convenient for
you, and it's really all that matters at the driver level. But for that
to work, you need to flag the coherence-ness of your devices properly,
like Paul suggested.
Maxime