Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

From: Sui Jingfeng
Date: Thu Jun 22 2023 - 15:26:30 EST

Next message: Zhangjin Wu: "Re: [PATCH v4 10/10] selftests/nolibc: add mmap and munmap test cases"
Previous message: Björn Roy Baron via B4 Relay: "[PATCH] rust: alloc: Add realloc and alloc_zeroed to the GlobalAlloc impl"
In reply to: Lucas Stach: "Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device"
Next in thread: Robin Murphy: "Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi

On 2023/6/21 23:58, Lucas Stach wrote:

Am Mittwoch, dem 21.06.2023 um 23:30 +0800 schrieb Sui Jingfeng:

Hi,

On 2023/6/21 18:00, Lucas Stach wrote:

dma_sync_sgtable_for_cpu(dev->dev, etnaviv_obj->sgt,
etnaviv_op_to_dma_dir(op));
etnaviv_obj->last_cpu_prep_op = op;
@@ -408,8 +421,9 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
{
struct drm_device *dev = obj->dev;
struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
+ struct etnaviv_drm_private *priv = dev->dev_private;
- if (etnaviv_obj->flags & ETNA_BO_CACHED) {
+ if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
/* fini without a prep is almost certainly a userspace error */
WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
index 3524b5811682..754126992264 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
@@ -112,11 +112,16 @@ static const struct etnaviv_gem_ops etnaviv_gem_prime_ops = {
struct drm_gem_object *etnaviv_gem_prime_import_sg_table(struct drm_device *dev,
struct dma_buf_attachment *attach, struct sg_table *sgt)
{
+ struct etnaviv_drm_private *priv = dev->dev_private;
struct etnaviv_gem_object *etnaviv_obj;
size_t size = PAGE_ALIGN(attach->dmabuf->size);
+ u32 cache_flags = ETNA_BO_WC;
int ret, npages;
- ret = etnaviv_gem_new_private(dev, size, ETNA_BO_WC,
+ if (priv->dma_coherent)
+ cache_flags = ETNA_BO_CACHED;
+

Drop this change. Instead etnaviv_gem_new_impl() should do the upgrade
from WC to CACHED as necessary by adding something like this:

I understand you are a profession person in vivante GPU driver domain.

I respect you reviews and instruction.

But, I'm really reluctant to agree with this, is there any space to
negotiate?

/*
* Upgrade WC to CACHED when the device is hardware coherent and the
* platform doesn't allow mixing cached and writecombined mappings to
* the same memory area.
*/
if ((flags & ETNA_BO_CACHE_MASK) == ETNA_BO_WC &&
dev_is_dma_coherent(dev) && !drm_arch_can_wc_memory())
flags = (flags & ~ETNA_BO_CACHE_MASK) & ETNA_BO_CACHED;

This is policy, not a mechanism.

Using what cache property is a user-space program's choice.

While you are override the WC with CACHED mapping. This is not correct
in the concept!

Please explain why you think that this isn't correct.

This is NOT always correct because:

when etnaviv_gem_prime_import_sg_table() is called,

drm/etnaviv is importing buffer from other drivers (for example drm/loongson or drm/ingenic).

drm/etnaviv is the importer, and drm/loongson or drm/ingenic is the exporter.

While drm/etnaviv using the WC mapping by default, which may cause *cache aliasing*,

Because the imported buffer originally belong to the KMS driver side.

The KMS driver may choose cached mapping by default.

For drm/ingenic(jz4770), the BO will be cached, but their hardware can't guarantee coherency.

they have to flush the cache manually when do the frame buffer update(damage update or dirty update).

Because cached mapping is more fast than the WC mapping.

For drm/loongson, shared buffer have to pinned into GTT,

By default, we using the cached mapping for the buffers in the system RAM;

But we are lucky, our the hardware bless us,

the hardware maintain the cache coherency for us.

While drm/etnaviv choose the WC mapping for the imported buffer blindly,

when doing the import,

*drm/etnaviv know nothing about the original buffer's caching property.*

This is the problem.

Currently, I guess drm/etnaviv only works for drm/imx,

because drm/imx choose WC mapping as the cache property by default for the dumb buffer.

The cache property mapping match, so it works.

But their no communications between the KMS driver and RO driver.

I think, drm/etanviv will also wrong on the ARM64 platforms where cache coherency is maintained by the hardware.

If the KMS driver using cached mapping, while drm/etnaviv using WC mapping,

There still will have problems.

I will go to find the hardware, and do the testing for you.

If using WC
mappings cause a potential loss of coherency on your platform, then we
can not allow the userspace driver to use WC mappings.

I have explained with my broken English,

The point is that loss of coherency may because KMS-RO framework have defect,

maybe software side bug, please don't make the conclusion too fast.

Xorg DDX driver (EXA base DDX for an example) fallback to the CPU do the software rendering.

It is even faster than the hardware accelerated implement,

especially in the case where the CPU is fast and the GPU is weak.

LS7A1000(GC1000) + LS3A5000@2.5Ghz X4 core is such a case.

For Discrete on-board VRAM we will using the WC mapping, it will not be shared with other driver,

The CPU use it themself, this is definitionally OK.

In practice, it works like a charm.

For buffers in the system RAM and GTT,

We will using cached mapping for faster CPU software rendering.

hard-coded by the driver.

We are benefited so much from the cached mapping.

While when using write-combine caching mapping for a buffer on LoongArch and Loongson Mips CPU,

It is OK for CPU write.

But when you want read from such buffer by the CPU, it is just boil down to uncached.

For our platform, uncached read serve as a kind of SYNC.

It will flush the data in the CPU's write buffer.

This cause performance drop.

But we still can use the write-combine caching mapping in the circumstances where do not need the CPU read.

So allow write-combine is providing one more choice to the user-space,

It is just that use it at your own risk.

As I would like to keep the option of WC mappings,

Yeah, I know what's reason.

Because on the platform you love (imx6q, imx8q, for an example),

The hardware don't maintain the cache coherency.

So if fallback to software rendering, you will need to flush the cache frequency.

the performance will drop very much, right?

--
Jingfeng

Next message: Zhangjin Wu: "Re: [PATCH v4 10/10] selftests/nolibc: add mmap and munmap test cases"
Previous message: Björn Roy Baron via B4 Relay: "[PATCH] rust: alloc: Add realloc and alloc_zeroed to the GlobalAlloc impl"
In reply to: Lucas Stach: "Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device"
Next in thread: Robin Murphy: "Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]