Re: [PATCH 4/6] riscv: mm: pass noncoherent or not to riscv_noncoherent_supported()

From: Jisheng Zhang
Date: Wed May 31 2023 - 23:52:17 EST


On Wed, May 31, 2023 at 05:28:56PM +0100, Conor Dooley wrote:
> On Wed, May 31, 2023 at 11:28:22PM +0800, Jisheng Zhang wrote:
> > On Wed, May 31, 2023 at 11:24:19PM +0800, Jisheng Zhang wrote:
> > > On Mon, May 29, 2023 at 12:13:10PM +0100, Conor Dooley wrote:
> > > > On Sat, May 27, 2023 at 12:59:56AM +0800, Jisheng Zhang wrote:
> > > > > > We will soon take different actions depending on whether the HW is
> > > > > > noncoherent, i.e. whether ZICBOM/ERRATA_THEAD_CMO is present or not.
> > > > >
> > > > > Signed-off-by: Jisheng Zhang <jszhang@xxxxxxxxxx>
> > > > > ---
> > > > > arch/riscv/errata/thead/errata.c | 19 +++++++++++--------
> > > > > arch/riscv/include/asm/cacheflush.h | 4 ++--
> > > > > arch/riscv/kernel/setup.c | 6 +++++-
> > > > > arch/riscv/mm/dma-noncoherent.c | 10 ++++++----
> > > > > 4 files changed, 24 insertions(+), 15 deletions(-)
> > > > >
> > > > > diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
> > > > > index be84b14f0118..c192b80a5166 100644
> > > > > --- a/arch/riscv/errata/thead/errata.c
> > > > > +++ b/arch/riscv/errata/thead/errata.c
> > > > > @@ -36,21 +36,24 @@ static bool errata_probe_pbmt(unsigned int stage,
> > > > > static bool errata_probe_cmo(unsigned int stage,
> > > > > unsigned long arch_id, unsigned long impid)
> > > > > {
> > > > > - if (!IS_ENABLED(CONFIG_ERRATA_THEAD_CMO))
> > > > > - return false;
> > > > > -
> > > > > - if (arch_id != 0 || impid != 0)
> > > > > - return false;
> > > > > + bool cmo;
> > > > >
> > > > > if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> > > > > return false;
> > > > >
> > > > > + if (IS_ENABLED(CONFIG_ERRATA_THEAD_CMO) &&
> > > > > + (arch_id == 0 && impid == 0))
> > > > > + cmo = true;
> > > > > + else
> > > > > + cmo = false;
> > > > > +
> > > > > if (stage == RISCV_ALTERNATIVES_BOOT) {
> > > > > - riscv_cbom_block_size = L1_CACHE_BYTES;
> > > > > - riscv_noncoherent_supported();
> > > > > + if (cmo)
> > > > > + riscv_cbom_block_size = L1_CACHE_BYTES;
> > > > > + riscv_noncoherent_supported(cmo);
> > > > > }
> > > > >
> > > > > - return true;
> > > > > + return cmo;
> > > >
> > > > I don't really understand the changes that you are making to this
> > > > > function, so that it tries really hard to call
> > > > riscv_noncoherent_supported(). Why do we need to always call the function
> > > > in the erratum's probe function, if the erratum is not detected, given
> > >
> > > In one unified kernel Image, to support both coherent and noncoherent
> > > platforms (currently, either T-HEAD CMO or ZICBOM), we need to let the
> > > kmalloc meet both cases, specifically, ARCH_DMA_MINALIGN aligned.
> >
> > seems adding three words can make it better:
> >
> > kmalloc meet both cases at the beginning, specifically ...
> >
> > > Once we know the underlying HW is coherent, i.e. neither T-HEAD CMO nor
> > > ZICBOM, we need to notify kmalloc that it is safe to reduce the alignment
> > > to 1. The notification is done in patch 5:
> > >
> > > + } else {
> > > + dma_cache_alignment = 1;
> > >
> > >
> > > > that riscv_noncoherent_supported() is called immediately after
> > > > apply_boot_alternatives() in setup_arch()?
>
> This bit here is the key part of my confusion. You try really hard in
> the errata stuff to call riscv_noncoherent_supported(), which I do
> understand is because of the other branch that you add to the function
> later in the series.
>
> What I do not understand is why we are not able to rely on the call to
> it in setup_arch() to trigger it when we do not have T-HEAD CMOs or
> Zicbom.
> You've explained why you want to make sure it always gets called during
> boot, but my question is about why it looks like it is being called more
> than once.
>
> Actually, now that I think of it, what happens on a T-HEAD system where
> there are no T-HEAD CMOs, but there is Zicbom? In theory, this could
> exist.
> Bear with me here a moment in case I am completely wrong, snippet is
> from setup_arch()
> apply_boot_alternatives();
> On my example system, this will trigger, eventually sending us into
> errata_probe_cmo(), where we will call riscv_noncoherent_supported()
> with false, setting dma_cache_alignment to 1.
>
> 	if (IS_ENABLED(CONFIG_RISCV_ISA_ZICBOM) &&
> 	    riscv_isa_extension_available(NULL, ZICBOM))
> 		cmo = true;
>
> On this system, this will be true.
>
> 	else
> 		cmo = false;
> 	riscv_noncoherent_supported(cmo);
>
> now riscv_noncoherent_supported() is called with true, and we have
> dma_cache_alignment = 1 still. Is that not problematic? Or the inverse,
> where the T-HEAD system has its custom CMOs and there is no Zicbom, it
> gets called twice with different args too.
>

Thank you, Conor. You pointed out a bug in my series. It looks like we
need to defer the dma_cache_alignment modification until T-HEAD CMO
probing is done, but we also need to think carefully about the case
where the T-HEAD related CONFIG option is disabled. I will take care of
this case in v2 once Catalin's series is merged.
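
To make the ordering problem concrete, here is a rough userspace model,
not kernel code. Every model_* name and the value 64 are made up for
illustration, and the "v1" helper only assumes that the else branch from
patch 5 sets the alignment to 1 on each call with false. The deferred
variant is one possible shape for the fix, not necessarily what v2 will
look like, and it does not cover the disabled-CONFIG case.

```c
#include <stdbool.h>

/* Hypothetical stand-in for ARCH_DMA_MINALIGN; the real value is arch-specific. */
#define MODEL_DMA_MINALIGN 64

static unsigned int model_dma_cache_alignment = MODEL_DMA_MINALIGN;
static bool model_saw_noncoherent;

static void model_reset(void)
{
	model_dma_cache_alignment = MODEL_DMA_MINALIGN;
	model_saw_noncoherent = false;
}

/*
 * Assumed v1 behaviour: each call decides on its own, so a single call
 * with false drops the alignment to 1 and nothing ever restores it.
 */
static void model_noncoherent_supported_v1(bool cmo)
{
	if (!cmo)
		model_dma_cache_alignment = 1;
}

/* Deferred variant: each probe only records what it found. */
static void model_note_probe(bool cmo)
{
	model_saw_noncoherent |= cmo;
}

/* The alignment is reduced once, after all probes have reported. */
static void model_finalize(void)
{
	if (!model_saw_noncoherent)
		model_dma_cache_alignment = 1;
}

/* First call models the T-HEAD errata probe, second the Zicbom check. */
static unsigned int model_run_v1(bool thead_cmo, bool zicbom)
{
	model_reset();
	model_noncoherent_supported_v1(thead_cmo);
	model_noncoherent_supported_v1(zicbom);
	return model_dma_cache_alignment;
}

static unsigned int model_run_deferred(bool thead_cmo, bool zicbom)
{
	model_reset();
	model_note_probe(thead_cmo);
	model_note_probe(zicbom);
	model_finalize();
	return model_dma_cache_alignment;
}
```

With this model, model_run_v1(false, true) and model_run_v1(true, false)
both end with an alignment of 1 even though the HW is noncoherent, which
is the problem you described. The deferred variant keeps the larger
alignment in both cases and only drops to 1 when neither probe reports
noncoherent HW.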