Hi Sui,The preblem is that device driver can have various demand.
Le mercredi 07 juin 2023 à 22:38 +0800, Sui Jingfeng a écrit :
Hi, welcome to discussion.No problem.
I have limited skills in manipulating English.
It may not express what I'm really means in the short time.
Part of word in the sentence may not as accurate as your.
Well, please don't misunderstand, I'm not doing the rude to you.
I will explain it with more details.If you add the "dma-coherent" property in a device node in DT, you
See below:
On 2023/6/7 20:09, Paul Cercueil wrote:
Hi Sui,This is a *strategy*, not a *mechanism*.
Le mercredi 07 juin 2023 à 18:30 +0800, Sui Jingfeng a écrit :
Hi,Well, I don't really know how ACPI handles it - but it should just
On 2023/6/7 17:36, Paul Cercueil wrote:
Hi Sui,But this approach can only be applied for the device driver with
Le mercredi 07 juin 2023 à 13:30 +0800, Sui Jingfeng a écrit :
The single map_noncoherent member of structDid you try to simply set the "dma-coherent" property to the
drm_gem_dma_object
may
not
sufficient for describing the backing memory of the GEM
buffer
object.
Especially on dma-coherent systems, the backing memory is
both
cached
coherent for multi-core CPUs and dma-coherent for peripheral
device.
Say architectures like X86-64, LoongArch64, Loongson Mips64,
etc.
Whether a peripheral device is dma-coherent or not can be
implementation-dependent. The single map_noncoherent option
is
not
enough
to reflect real hardware anymore. For example, the Loongson
LS3A4000
CPU
and LS2K2000/LS2K1000 SoC, peripheral device of such hardware
platform
allways snoop CPU's cache. Doing the allocation with
dma_alloc_coherent
function is preferred. The return buffer is cached, it should
not
using
the default write-combine mapping. While with the current
implement,
there
no way to tell the drm core to reflect this.
This patch adds cached and coherent members to struct
drm_gem_dma_object.
which allow driver implements to inform the core. Introducing
new
mappings
while keeping the original default behavior unchanged.
device's
node?
DT
support.
X86-64, Loongson ls3a4000 mips64, Loongson ls3a5000 CPU typically
do
not
have DT support.
They using ACPI to pass parameter from the firmware to Linux
kernel.
You approach will lost the effectiveness on such a case.
be a
matter of setting dev->dma_coherent. That's basically what the DT
code
does.
Some MIPS boards set it in their setup code for instance.
In this case, DT is just used to describing the hardware.
(It is actually a hardware feature describing language, the
granularity
is large)
It does not changing the state of the hardware.
It's your platform firmware or kernel setting up code who actually do
such a things.
It's just that it works on *one* platform, it does not guarantee it
will
works on others.
effectively specify that the device is DMA-coherent; so you describe
the hardware, which is what DT is for, and you are not changing the
state of the hardware.
Note that some MIPS platforms (arch/mips/alchemy/common/setup.c)
default to DMA-coherent mapping; I believe you could do something
similar with your Loongson LS3A4000 CPU and LS2K2000/LS2K1000 SoC.
While my patch is trying to create a *mechanism* which could probablyI believe the DMA flushes the write buffer? I don't actually know the
works on all platform.
It is based the patch you have already commuted.
Thanks for your excellent contribution.
It is dma-coherent ? How does it achieve it?It is dma-coherent on Ingenic SoCs.From what I understand if you add that property then LinuxPlease do not mitigate the problems with confusing method.
will
use DMA
coherent memory even though you use dma_alloc_noncoherent() and
the
sync_single_for_cpu() / sync_single_for_device() are then NOPs.
This approach not only tend to generate confusion but also
implement-dependent
and arch-dependent. It's definitely problematic.
How does the dma_alloc_coherent/dma_alloc_noncoherent is a ARCH
specific
thing.
Dependent on how does the arch_dma_ops is implemented.
The definition of the coherent on different ARCH has different
meanings.
The definition of the wirte-combine on different ARCH has
different
meanings.
The wirte-combine(uncache acceleration) on mips is non dma-
coherent.
As far as I know, there is a write buffer within the mips cpu.
typically 64 byte, but it is not cache. It will gather the CPU write
access,
When a peripheral device do the DMA, how does you platform guarantee
the data in the CPU write buffer has been already arrived at (or
flushed
out to)
the system RAM?
Does the peripheral device snoop the CPU's write buffer,
or it need manually flush the write buffer with SYNC instruction?
details, it would be something to ask to Ingenic.
dma_alloc_noncoherent() gives you *cached* memory.My point is that the word *cached* reflect the nature,But on arm, It seem that wirte-combine is coherent. (guaranteedYes, I know that.
by
arch
implement).
I also heard using dma_alloc_coherent to allocation the buffer
for
the
non-coherent doesn't hurt, but the reverse is not true.
But please do not create confusion.
software composite is faster because better cacheusing rate and
cache is faster to read.
It is faster because it is cached, not because it is non-
coherent.
non-coherent is arch thing and/or driver-side thing,
it is a side effect of using the cached mapping.
It should left to driver to handle such a side effect. The deviceI understand. What I'm saying, is that you should be able to set
driver
know their device, so its the device driver's responsibility to
maintain
the coherency. On loongson platform, we don't need to call
drm_fb_dma_sync_non_coherent() function, Its already guaranteed
by
hardware.
dma_obj->map_noncoherent (which would arguably be better named
"map_cached",
dma-coherent or dma-noncoherent is secondary.
We are all on the way to pursue the performance.
In the end, it is the cache give us the speed.
Why not we credit the cache hardware inside of the CPU?
Therefore, if you want *cached* memory, you should set
gem->map_noncoherent.
I understand your confusion; it would be easier to understand if this
function was called dma_alloc_cached().
Then, if the memory is actually DMA-coherent for the device (dev-
dma_coherent == true), the drm_fb_dma_sync_non_coherent() function isa no-op.
But in both cases (DMA-coherent device, non DMA-coherent device), if
you want cached buffers, you should call dma_alloc_noncoherent().
Yes; and now I think that this was a bad idea (for the reasons Maximebut that's a different problem). Then the GEM code wouldYou already handle the side effect of such things, See below:
end up calling dma_alloc_noncoherent(), which will give you
*cached*
memory. Then as long as dev->dma_coherent = true,
drm_fb_dma_sync_non_coherent() should be a NOP - so you wouldn't
pointlessly sync/invalidate the caches.
And I disagree with you, the driver shouldn't handle such things.
```
if (ingenic_drm_map_noncoherent(ipu->master))
drm_fb_dma_sync_non_coherent(ipu->drm, oldstate, newstate);
```
By the way, Ingenic is the only driver in the drivers/gpu/drm/ that
handle such things
so far.
listed in his email).
Ideally, the driver should just call a function "dma_alloc_buffer",TheBut the fact is that, It is drm/ingenic tell the drm core, some SoC
fact that it is better to use cached memory or uncached with write-
combine really is platform-specific and not something that the
driver
should be aware of.
is
prefer cached,
but unable to enforce the coherent. So that it need flush the cache
manually.
What do you meant by saying that the driver should not be aware of ?
which would return cached memory when it makes sense, otherwise a
uncached buffer with the write-combine attribute.
Then the arch code (or DT) can decide what's the best setting, and not
the driver.
In the meantime, you should use gem->dma_noncoherent like the ingenic-
drm driver does - until somebody (probably me) refactor things.
Cheers,
-Paul
Cheers,
-Paul
Cheers,
-Paul
Signed-off-by: Sui Jingfeng <suijingfeng@xxxxxxxxxxx>
---
drivers/gpu/drm/drm_fb_dma_helper.c | 11 +++++------
drivers/gpu/drm/drm_fbdev_dma.c | 2 +-
drivers/gpu/drm/drm_gem_dma_helper.c | 20
++++++++++++++++----
drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 5 ++++-
drivers/gpu/drm/rcar-du/Kconfig | 2 --
drivers/gpu/drm/rcar-du/rcar_du_kms.c | 4 +++-
include/drm/drm_gem_dma_helper.h | 7 +++++--
7 files changed, 34 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/drm_fb_dma_helper.c
b/drivers/gpu/drm/drm_fb_dma_helper.c
index 3b535ad1b07c..93ff05041192 100644
--- a/drivers/gpu/drm/drm_fb_dma_helper.c
+++ b/drivers/gpu/drm/drm_fb_dma_helper.c
@@ -106,16 +106,15 @@ dma_addr_t
drm_fb_dma_get_gem_addr(struct
drm_framebuffer *fb,
EXPORT_SYMBOL_GPL(drm_fb_dma_get_gem_addr);
/**
- * drm_fb_dma_sync_non_coherent - Sync GEM object to non-
coherent
backing
- * memory
+ * drm_fb_dma_sync_non_coherent - Sync GEM object to cached
backing
memory
* @drm: DRM device
* @old_state: Old plane state
* @state: New plane state
*
* This function can be used by drivers that use damage
clips
and
have
- * DMA GEM objects backed by non-coherent memory. Calling
this
function
- * in a plane's .atomic_update ensures that all the data in
the
backing
- * memory have been written to RAM.
+ * DMA GEM objects backed by cached memory. Calling this
function in
a
+ * plane's .atomic_update ensures that all the data in the
backing
memory
+ * have been written to RAM.
*/
void drm_fb_dma_sync_non_coherent(struct drm_device *drm,
struct drm_plane_state
*old_state,
@@ -131,7 +130,7 @@ void drm_fb_dma_sync_non_coherent(struct
drm_device *drm,
for (i = 0; i < finfo->num_planes; i++) {
dma_obj = drm_fb_dma_get_gem_obj(state->fb,
i);
- if (!dma_obj->map_noncoherent)
+ if (dma_obj->cached && dma_obj->coherent)
continue;
daddr = drm_fb_dma_get_gem_addr(state->fb,
state, i);
diff --git a/drivers/gpu/drm/drm_fbdev_dma.c
b/drivers/gpu/drm/drm_fbdev_dma.c
index d86773fa8ab0..49fe9b284cc8 100644
--- a/drivers/gpu/drm/drm_fbdev_dma.c
+++ b/drivers/gpu/drm/drm_fbdev_dma.c
@@ -131,7 +131,7 @@ static int
drm_fbdev_dma_helper_fb_probe(struct
drm_fb_helper *fb_helper,
/* screen */
info->flags |= FBINFO_VIRTFB; /* system memory */
- if (dma_obj->map_noncoherent)
+ if (dma_obj->cached)
info->flags |= FBINFO_READS_FAST; /* signal
caching
*/
info->screen_size = sizes->surface_height * fb-
pitches[0];info->screen_buffer = map.vaddr;
diff --git a/drivers/gpu/drm/drm_gem_dma_helper.c
b/drivers/gpu/drm/drm_gem_dma_helper.c
index 870b90b78bc4..dec1d512bdf1 100644
--- a/drivers/gpu/drm/drm_gem_dma_helper.c
+++ b/drivers/gpu/drm/drm_gem_dma_helper.c
@@ -93,7 +93,11 @@ __drm_gem_dma_create(struct drm_device
*drm,
size_t size, bool private)
drm_gem_private_object_init(drm, gem_obj,
size);
/* Always use writecombine for dma-buf
mappings
*/
- dma_obj->map_noncoherent = false;
+ /* FIXME: This is not always true, on some
dma
coherent system,
+ * cached mappings should be preferred over
writecombine
+ */
+ dma_obj->cached = false;
+ dma_obj->coherent = false;
} else {
ret = drm_gem_object_init(drm, gem_obj,
size);
}
@@ -143,7 +147,11 @@ struct drm_gem_dma_object
*drm_gem_dma_create(struct drm_device *drm,
if (IS_ERR(dma_obj))
return dma_obj;
- if (dma_obj->map_noncoherent) {
+ if (dma_obj->cached && dma_obj->coherent) {
+ dma_obj->vaddr = dma_alloc_coherent(drm->dev,
size,
+ &dma_obj-
dma_addr,+
GFP_KERNEL |
__GFP_NOWARN);
+ } else if (dma_obj->cached && !dma_obj->coherent) {
dma_obj->vaddr = dma_alloc_noncoherent(drm-
dev,size,
&dma_obj-dma_addr,DMA_TO_DEVICE,
@@ -153,6 +161,7 @@ struct drm_gem_dma_object
*drm_gem_dma_create(struct drm_device *drm,
&dma_obj-
dma_addr,GFP_KERNEL |
__GFP_NOWARN);
}
+
if (!dma_obj->vaddr) {
drm_dbg(drm, "failed to allocate buffer
with
size
%zu\n",
size);
@@ -233,7 +242,10 @@ void drm_gem_dma_free(struct
drm_gem_dma_object
*dma_obj)
dma_buf_vunmap_unlocked(gem_obj-
import_attach->dmabuf, &map);drm_prime_gem_destroy(gem_obj, dma_obj-
sgt);} else if (dma_obj->vaddr) {
- if (dma_obj->map_noncoherent)
+ if (dma_obj->cached && dma_obj->coherent)
+ dma_free_coherent(gem_obj->dev->dev,
dma_obj-
base.size,+ dma_obj->vaddr,
dma_obj-
dma_addr);+ else if (dma_obj->cached && !dma_obj-
coherent)dma_free_noncoherent(gem_obj->dev-
dev,dma_obj->base.size,
dma_obj-
vaddr,dma_obj-
dma_addr,DMA_TO_DEVICE);
@@ -532,7 +544,7 @@ int drm_gem_dma_mmap(struct
drm_gem_dma_object
*dma_obj, struct vm_area_struct *
vma->vm_pgoff -= drm_vma_node_start(&obj-
vma_node);vm_flags_mod(vma, VM_DONTEXPAND, VM_PFNMAP);
- if (dma_obj->map_noncoherent) {
+ if (dma_obj->cached) {
vma->vm_page_prot = vm_get_page_prot(vma-
vm_flags);ret = dma_mmap_pages(dma_obj->base.dev-
dev,diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index 5ec75e9ba499..a3df2f99a757 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -919,7 +919,10 @@ ingenic_drm_gem_create_object(struct
drm_device
*drm, size_t size)
if (!obj)
return ERR_PTR(-ENOMEM);
- obj->map_noncoherent = priv->soc_info-map_noncoherent;+ if (priv->soc_info->map_noncoherent) {
+ obj->cached = true;
+ obj->coherent = false;
+ }
return &obj->base;
}
diff --git a/drivers/gpu/drm/rcar-du/Kconfig
b/drivers/gpu/drm/rcar-
du/Kconfig
index 53c356aed5d5..dddc70c08bdc 100644
--- a/drivers/gpu/drm/rcar-du/Kconfig
+++ b/drivers/gpu/drm/rcar-du/Kconfig
@@ -2,8 +2,6 @@
config DRM_RCAR_DU
tristate "DRM Support for R-Car Display Unit"
depends on DRM && OF
- depends on ARM || ARM64
- depends on ARCH_RENESAS || COMPILE_TEST
select DRM_KMS_HELPER
select DRM_GEM_DMA_HELPER
select VIDEOMODE_HELPERS
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.c
b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
index adfb36b0e815..1142d51473e6 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_kms.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
@@ -386,7 +386,9 @@ struct drm_gem_object
*rcar_du_gem_prime_import_sg_table(struct drm_device *dev,
gem_obj->funcs = &rcar_du_gem_funcs;
drm_gem_private_object_init(dev, gem_obj, attach-dmabuf-- dma_obj->map_noncoherent = false;
size);
+
+ dma_obj->cached = false;
+ dma_obj->coherent = false;
ret = drm_gem_create_mmap_offset(gem_obj);
if (ret) {
diff --git a/include/drm/drm_gem_dma_helper.h
b/include/drm/drm_gem_dma_helper.h
index 8a043235dad8..585ce3d4d1eb 100644
--- a/include/drm/drm_gem_dma_helper.h
+++ b/include/drm/drm_gem_dma_helper.h
@@ -16,7 +16,9 @@ struct drm_mode_create_dumb;
* more than one entry but they are guaranteed to
have
contiguous
* DMA addresses.
* @vaddr: kernel virtual address of the backing memory
- * @map_noncoherent: if true, the GEM object is backed by
non-
coherent memory
+ * @cached: if true, the GEM object is backed by cached
memory
+ * @coherent: This option only meaningful when a GEM object
is
cached.
+ * If true, Sync the GEM object for DMA access is
not
required.
*/
struct drm_gem_dma_object {
struct drm_gem_object base;
@@ -26,7 +28,8 @@ struct drm_gem_dma_object {
/* For objects with DMA memory allocated by GEM DMA
*/
void *vaddr;
- bool map_noncoherent;
+ bool cached;
+ bool coherent;
};
#define to_drm_gem_dma_obj(gem_obj) \