Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc

From: Icenowy Zheng
Date: Mon Oct 09 2023 - 20:51:30 EST


在 2023-10-09星期一的 22:32 +0800,Sui Jingfeng写道:
> Hi,
>
>
> On 2023/10/9 12:28, Icenowy Zheng wrote:
> > Currently the code disables WUC only disables it for ioremap_wc(),
> > which
> > is only used when mapping writecombine pages like ioremap() (mapped
> > to
> > the kernel space). For VRAM mapped in TTM/GEM, it's mapped with a
> > crafted pgprot with pgprot_writecombine() function, which isn't
> > corrently disabled now.
> >
> > Disable WUC for pgprot_writecombine() (fallback to SUC) too.
> >
> > This improves AMDGPU driver stability on Loongson 3A5000 machines.
> >
> > Signed-off-by: Icenowy Zheng <uwu@xxxxxxxxxx>
> > ---
> > Changes since v1:
> > - Removed _WC macros
> > - Mention ioremap_wc in commit message
> >
> >   arch/loongarch/include/asm/io.h           |  5 ++---
> >   arch/loongarch/include/asm/pgtable-bits.h |  4 +++-
> >   arch/loongarch/kernel/setup.c             | 10 +++++-----
> >   3 files changed, 10 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/loongarch/include/asm/io.h
> > b/arch/loongarch/include/asm/io.h
> > index 0dcb36b32cb25..290aad87a8847 100644
> > --- a/arch/loongarch/include/asm/io.h
> > +++ b/arch/loongarch/include/asm/io.h
> > @@ -52,10 +52,9 @@ static inline void __iomem
> > *ioremap_prot(phys_addr_t offset, unsigned long size,
> >    * @offset:    bus address of the memory
> >    * @size:      size of the resource to map
> >    */
> > -extern pgprot_t pgprot_wc;
> > -
> >   #define ioremap_wc(offset, size)      \
> > -       ioremap_prot((offset), (size), pgprot_val(pgprot_wc))
> > +       ioremap_prot((offset), (size), pgprot_val( \
> > +               wc_enabled ? PAGE_KERNEL_WUC : PAGE_KERNEL_SUC))
> >  
> >   #define ioremap_cache(offset, size)   \
> >         ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL))
> > diff --git a/arch/loongarch/include/asm/pgtable-bits.h
> > b/arch/loongarch/include/asm/pgtable-bits.h
> > index 35348d4c4209a..a3d827701736d 100644
> > --- a/arch/loongarch/include/asm/pgtable-bits.h
> > +++ b/arch/loongarch/include/asm/pgtable-bits.h
> > @@ -92,6 +92,8 @@
> >  
> >   #ifndef __ASSEMBLY__
> >  
> > +extern bool wc_enabled;
> > +
> >   #define _PAGE_IOREMAP         pgprot_val(PAGE_KERNEL_SUC)
> >  
> >   #define pgprot_noncached pgprot_noncached
> > @@ -111,7 +113,7 @@ static inline pgprot_t
> > pgprot_writecombine(pgprot_t _prot)
> >   {
> >         unsigned long prot = pgprot_val(_prot);
> >  
> > -       prot = (prot & ~_CACHE_MASK) | _CACHE_WUC;
> > +       prot = (prot & ~_CACHE_MASK) | (wc_enabled ? _CACHE_WUC :
> > _CACHE_SUC);
> >  
> >         return __pgprot(prot);
> >   }
> > diff --git a/arch/loongarch/kernel/setup.c
> > b/arch/loongarch/kernel/setup.c
> > index 7783f0a3d742c..465c1dbb6f4b4 100644
> > --- a/arch/loongarch/kernel/setup.c
> > +++ b/arch/loongarch/kernel/setup.c
> > @@ -161,19 +161,19 @@ static void __init smbios_parse(void)
> >   }
> >  
> >   #ifdef CONFIG_ARCH_WRITECOMBINE
> > -pgprot_t pgprot_wc = PAGE_KERNEL_WUC;
> > +bool wc_enabled = true;
> >   #else
> > -pgprot_t pgprot_wc = PAGE_KERNEL_SUC;
> > +bool wc_enabled;
> >   #endif
> >  
> > -EXPORT_SYMBOL(pgprot_wc);
> > +EXPORT_SYMBOL(wc_enabled);
> >  
> >   static int __init setup_writecombine(char *p)
> >   {
> >         if (!strcmp(p, "on"))
> > -               pgprot_wc = PAGE_KERNEL_WUC;
> > +               wc_enabled = true;
> >         else if (!strcmp(p, "off"))
> > -               pgprot_wc = PAGE_KERNEL_SUC;
> > +               wc_enabled = false;
> >         else
> >                 pr_warn("Unknown writecombine setting \"%s\".\n",
> > p);
> >  
>
>
> Good catch!
>
> But this will make the write combine(WC) mappings completely unusable
> on LoongArch.

Not completely unusable -- we always have the writecombine=on kernel
parameter that overrides it (or CONFIG_ARCH_WRITECOMBINE build-time
option).

People knowing what they're doing can utilize them to gain back the
performace of WUC, however for the default setup I prefer functionality
correctness to (unstable) performance.

> This is nearly equivalent to say that LoongArch don't support write
> combine at all.
> But the write combine(WC) mappings works fine for software based drm
> drivers,
> such as drm/loongson and drm/ast etc. Even include drm/radeon and
> drm/amdgpu with
> pure software rendering setup (by putting Option "Accel" "off" into
> 10-amdgpu.conf
> or 10-radeon.conf) After merge this patch, the performance drop
> dramatically for
> 2D software rendering based display controller drivers.
>
> Well, this patch itself is a good catch, as it is a fix for the
> commit <16c52e503043>
> ("LoongArch: Make WriteCombine configurable for ioremap()"). But I'm
> afraid that
> both of this commit and the <16c52e503043> commit are not a *real*
> fix write combine

Yes, this is a workaround rather than a fix, however I'm a software
developer and this is the best I can do.

If you can, please report this to your hardware developers and try to
get it fixed for 3A7000/7A3000 (or maybe grabbing some chicken bit that
will fix this for existing chips). I will be quite appreciated.

Thanks,
Icenowy

> related issue on LoongArch. It just negative sidestep of the real
> problem.
>