Re: [PATCH 4/6] riscv: mm: pass noncoherent or not to riscv_noncoherent_supported()

From: Conor Dooley
Date: Wed May 31 2023 - 12:29:30 EST


On Wed, May 31, 2023 at 11:28:22PM +0800, Jisheng Zhang wrote:
> On Wed, May 31, 2023 at 11:24:19PM +0800, Jisheng Zhang wrote:
> > On Mon, May 29, 2023 at 12:13:10PM +0100, Conor Dooley wrote:
> > > On Sat, May 27, 2023 at 12:59:56AM +0800, Jisheng Zhang wrote:
> > > > We will soon take different actions by checking the HW is noncoherent
> > > > or not, I.E ZICBOM/ERRATA_THEAD_CMO or not.
> > > >
> > > > Signed-off-by: Jisheng Zhang <jszhang@xxxxxxxxxx>
> > > > ---
> > > > arch/riscv/errata/thead/errata.c | 19 +++++++++++--------
> > > > arch/riscv/include/asm/cacheflush.h | 4 ++--
> > > > arch/riscv/kernel/setup.c | 6 +++++-
> > > > arch/riscv/mm/dma-noncoherent.c | 10 ++++++----
> > > > 4 files changed, 24 insertions(+), 15 deletions(-)
> > > >
> > > > diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
> > > > index be84b14f0118..c192b80a5166 100644
> > > > --- a/arch/riscv/errata/thead/errata.c
> > > > +++ b/arch/riscv/errata/thead/errata.c
> > > > @@ -36,21 +36,24 @@ static bool errata_probe_pbmt(unsigned int stage,
> > > > static bool errata_probe_cmo(unsigned int stage,
> > > > unsigned long arch_id, unsigned long impid)
> > > > {
> > > > - if (!IS_ENABLED(CONFIG_ERRATA_THEAD_CMO))
> > > > - return false;
> > > > -
> > > > - if (arch_id != 0 || impid != 0)
> > > > - return false;
> > > > + bool cmo;
> > > >
> > > > if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> > > > return false;
> > > >
> > > > + if (IS_ENABLED(CONFIG_ERRATA_THEAD_CMO) &&
> > > > + (arch_id == 0 && impid == 0))
> > > > + cmo = true;
> > > > + else
> > > > + cmo = false;
> > > > +
> > > > if (stage == RISCV_ALTERNATIVES_BOOT) {
> > > > - riscv_cbom_block_size = L1_CACHE_BYTES;
> > > > - riscv_noncoherent_supported();
> > > > + if (cmo)
> > > > + riscv_cbom_block_size = L1_CACHE_BYTES;
> > > > + riscv_noncoherent_supported(cmo);
> > > > }
> > > >
> > > > - return true;
> > > > + return cmo;
> > >
> > > I don't really understand the changes that you are making to this
> > > function, so that it tries really hard to call
> > > riscv_noncoherent_supported(). Why do we need to always call the function
> > > in the erratum's probe function, if the erratum is not detected, given
> >
> > In one unified kernel Image, to support both coherent and noncoherent
> > platforms(currently, either T-HEAD CMO or ZICBOM), we need to let the
> > kmalloc meet both cases, specifically, ARCH_DMA_MINALIGN aligned.
>
> seems adding three words can make it better:
>
> kmalloc meet both cases at the beginning, specifically ...
>
> > Once we know the underlying HW is coherent, I.E neither T-HEAD CMO nor
> > ZICBOM, we need to notice kmalloc we are safe to reduce the alignment
> > to 1. The notice action is done in patch 5:
> >
> > + } else {
> > + dma_cache_alignment = 1;
> >
> >
> > > that riscv_noncoherent_supported() is called immediately after
> > > apply_boot_alternatives() in setup_arch()?

This bit here is the key part of my confusion. You try really hard in
the errata stuff to call riscv_noncoherent_supported(), which I do
understand is because of the other branch that you add to the function
later in the series.

What I do not understand is why we are not able to rely on the call to
it in setup_arch() to trigger it when we do not have T-HEAD CMOs or
Zicbom.
You've explained why you want to make sure it always gets called during
boot, but my question is about why it looks like it is being called more
than once.

Actually, now that I think of it, what happens on a T-HEAD system where
there are no T-HEAD CMOs, but there is Zicbom? In theory, this could
exist.
Bear with me here a moment in case I am completely wrong. The snippet is
from setup_arch():

	apply_boot_alternatives();

On my example system, this will trigger, eventually sending us into
errata_probe_cmo(), where we will call riscv_noncoherent_supported()
with false, setting dma_cache_alignment to 1.

	if (IS_ENABLED(CONFIG_RISCV_ISA_ZICBOM) &&
	    riscv_isa_extension_available(NULL, ZICBOM))
		cmo = true;

On this system, this will be true.

	else
		cmo = false;
	riscv_noncoherent_supported(cmo);

Now riscv_noncoherent_supported() is called with true, and we still have
dma_cache_alignment = 1. Is that not problematic? Or take the inverse,
where the T-HEAD system has its custom CMOs and there is no Zicbom: it
gets called twice with different args too.

There's clearly something fundamental that I am missing here. It seems
like it should be immediately obvious why this either cannot happen or
is not a problem, but I can't see it.

Sorry,
Conor.
