Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc

From: Sui Jingfeng
Date: Mon Oct 09 2023 - 23:03:09 EST



On 2023/10/10 08:15, WANG Xuerui wrote:

On 10/9/23 22:32, Sui Jingfeng wrote:
Hi,


On 2023/10/9 12:28, Icenowy Zheng wrote:
Currently the code disables WUC only disables it for ioremap_wc(), which
is only used when mapping writecombine pages like ioremap() (mapped to
the kernel space). For VRAM mapped in TTM/GEM, it's mapped with a
crafted pgprot with pgprot_writecombine() function, which isn't
corrently disabled now.

Disable WUC for pgprot_writecombine() (fallback to SUC) too.

This improves AMDGPU driver stability on Loongson 3A5000 machines.

Signed-off-by: Icenowy Zheng <uwu@xxxxxxxxxx>
---
Changes since v1:
- Removed _WC macros
- Mention ioremap_wc in commit message

  arch/loongarch/include/asm/io.h           |  5 ++---
  arch/loongarch/include/asm/pgtable-bits.h |  4 +++-
  arch/loongarch/kernel/setup.c             | 10 +++++-----
  3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/loongarch/include/asm/io.h b/arch/loongarch/include/asm/io.h
index 0dcb36b32cb25..290aad87a8847 100644
--- a/arch/loongarch/include/asm/io.h
+++ b/arch/loongarch/include/asm/io.h
@@ -52,10 +52,9 @@ static inline void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size,
   * @offset:    bus address of the memory
   * @size:      size of the resource to map
   */
-extern pgprot_t pgprot_wc;
-
  #define ioremap_wc(offset, size)    \
-    ioremap_prot((offset), (size), pgprot_val(pgprot_wc))
+    ioremap_prot((offset), (size), pgprot_val( \
+        wc_enabled ? PAGE_KERNEL_WUC : PAGE_KERNEL_SUC))
    #define ioremap_cache(offset, size)    \
      ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL))
diff --git a/arch/loongarch/include/asm/pgtable-bits.h b/arch/loongarch/include/asm/pgtable-bits.h
index 35348d4c4209a..a3d827701736d 100644
--- a/arch/loongarch/include/asm/pgtable-bits.h
+++ b/arch/loongarch/include/asm/pgtable-bits.h
@@ -92,6 +92,8 @@
    #ifndef __ASSEMBLY__
  +extern bool wc_enabled;
+
  #define _PAGE_IOREMAP        pgprot_val(PAGE_KERNEL_SUC)
    #define pgprot_noncached pgprot_noncached
@@ -111,7 +113,7 @@ static inline pgprot_t pgprot_writecombine(pgprot_t _prot)
  {
      unsigned long prot = pgprot_val(_prot);
  -    prot = (prot & ~_CACHE_MASK) | _CACHE_WUC;
+    prot = (prot & ~_CACHE_MASK) | (wc_enabled ? _CACHE_WUC : _CACHE_SUC);
        return __pgprot(prot);
  }
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index 7783f0a3d742c..465c1dbb6f4b4 100644
--- a/arch/loongarch/kernel/setup.c
+++ b/arch/loongarch/kernel/setup.c
@@ -161,19 +161,19 @@ static void __init smbios_parse(void)
  }
    #ifdef CONFIG_ARCH_WRITECOMBINE
-pgprot_t pgprot_wc = PAGE_KERNEL_WUC;
+bool wc_enabled = true;
  #else
-pgprot_t pgprot_wc = PAGE_KERNEL_SUC;
+bool wc_enabled;
  #endif
  -EXPORT_SYMBOL(pgprot_wc);
+EXPORT_SYMBOL(wc_enabled);
    static int __init setup_writecombine(char *p)
  {
      if (!strcmp(p, "on"))
-        pgprot_wc = PAGE_KERNEL_WUC;
+        wc_enabled = true;
      else if (!strcmp(p, "off"))
-        pgprot_wc = PAGE_KERNEL_SUC;
+        wc_enabled = false;
      else
          pr_warn("Unknown writecombine setting \"%s\".\n", p);


Good catch!

But this will make the write combine(WC) mappings completely unusable on LoongArch.
This is nearly equivalent to say that LoongArch don't support write combine at all.
But the write combine(WC) mappings works fine for software based drm drivers,
such as drm/loongson and drm/ast etc. Even include drm/radeon and drm/amdgpu with
pure software rendering setup (by putting Option "Accel" "off" into 10-amdgpu.conf
or 10-radeon.conf) After merge this patch, the performance drop dramatically for
2D software rendering based display controller drivers.

Well, this patch itself is a good catch, as it is a fix for the commit <16c52e503043>
("LoongArch: Make WriteCombine configurable for ioremap()"). But I'm afraid that
both of this commit and the <16c52e503043> commit are not a *real* fix write combine
related issue on LoongArch. It just negative sidestep of the real problem.
Sure, but given the public information I have access to, I don't think it's possible to "really" fix the root cause with software only. Do you have any insight on this, given from your perspective and language, such a solution seems to exist?

On LoongArch, cached mapping and uncached mappings are DMA-coherent and guaranteed by
the hardware. While WC mappings is *NOT* DMA-coherent when 3D GPU is involved. Therefore,
On downstream kernel, We disable write combine(WC) mappings at the drm drivers side.

- For buffers at VRAM(device memory), we replace the WC mappings with uncached mappings.
- For buffers reside in RAM, we replace the WC mappings with cached mappings.

By this way, we were able to minimum the side effects, and meet the usable requirements
for all of the GPU drivers.

For DMA non-coherent buffers, we should try to implement arch-specific dma_map_ops,
invalidate the CPU cache and flush the CPU write buffer before the device do DMA. Instead
of pretend to be DMA coherent for all buffers, a kernel cmdline is not a system level
solution for all of GPU drivers and OS release.