Re: [PATCH v8] riscv: mm: Add support for Svinval extension

From: Mayuresh Chitale
Date: Tue Jul 30 2024 - 04:44:51 EST


On Wed, Jul 24, 2024 at 8:20 PM Palmer Dabbelt <palmer@xxxxxxxxxxx> wrote:
>
> On Tue, 02 Jul 2024 03:26:37 PDT (-0700), mchitale@xxxxxxxxxxxxxxxx wrote:
> > The Svinval extension splits SFENCE.VMA instruction into finer-grained
> > invalidation and ordering operations and is mandatory for RVA23S64 profile.
> > When Svinval is enabled the local_flush_tlb_range_threshold_asid function
> > should use the following sequence to optimize the tlb flushes instead of
>
> Do you have any performance numbers for the optimization? As per here
> <https://lore.kernel.org/all/mhng-f799bd2b-7f22-4c03-bdb2-903fa3b5d508@palmer-ri-x1c9a/>.

No, currently there are no numbers available for comparison but the
rationale for the optimization is described in the spec. The extension
is mandatory for the RVA23S64 profile but any platform that doesn't
support this extension will not be impacted as the code executes only
if the svinval extension is enabled at the boot up.
>
> > a simple sfence.vma:
> >
> > sfence.w.inval
> > svinval.vma
> > .
> > .
> > svinval.vma
> > sfence.inval.ir
> >
> > The maximum number of consecutive svinval.vma instructions that
> > can be executed in local_flush_tlb_range_threshold_asid function
> > is limited to 64. This is required to avoid soft lockups and the
> > approach is similar to that used in arm64.
> >
> > Signed-off-by: Mayuresh Chitale <mchitale@xxxxxxxxxxxxxxxx>
> > Reviewed-by: Andrew Jones <ajones@xxxxxxxxxxxxxxxx>
> > ---
> > Changes in v8:
> > - Fix line wrap
> > - Add RB tag
> >
> > Changes in v7:
> > - Use existing svinval macros in the insn-def.h
> > - Rename local_sinval_vma_asid to local_sinval_vma
> >
> > Changes in v6:
> > - Rebase on latest torvalds/master
> >
> > Changes in v5:
> > - Reduce tlb flush threshold to 64
> > - Improve implementation of local_flush_tlb* functions
> >
> > Changes in v4:
> > - Rebase and refactor as per latest changes on torvalds/master
> > - Drop patch 1 in the series
> >
> > Changes in v3:
> > - Fix incorrect vma used for sinval instructions
> > - Use unified static key mechanism for svinval
> > - Rebased on torvalds/master
> >
> > Changes in v2:
> > - Rebased on 5.18-rc3
> > - update riscv_fill_hwcap to probe Svinval extension
> >
> > arch/riscv/mm/tlbflush.c | 32 ++++++++++++++++++++++++++++++++
> > 1 file changed, 32 insertions(+)
> >
> > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> > index 9b6e86ce3867..782147a63f3b 100644
> > --- a/arch/riscv/mm/tlbflush.c
> > +++ b/arch/riscv/mm/tlbflush.c
> > @@ -6,6 +6,27 @@
> > #include <linux/hugetlb.h>
> > #include <asm/sbi.h>
> > #include <asm/mmu_context.h>
> > +#include <asm/cpufeature.h>
> > +
> > +#define has_svinval() riscv_has_extension_unlikely(RISCV_ISA_EXT_SVINVAL)
> > +
> > +static inline void local_sfence_inval_ir(void)
> > +{
> > + asm volatile(SFENCE_INVAL_IR() ::: "memory");
> > +}
> > +
> > +static inline void local_sfence_w_inval(void)
> > +{
> > + asm volatile(SFENCE_W_INVAL() ::: "memory");
> > +}
> > +
> > +static inline void local_sinval_vma(unsigned long vma, unsigned long asid)
> > +{
> > + if (asid != FLUSH_TLB_NO_ASID)
> > + asm volatile(SINVAL_VMA(%0, %1) : : "r" (vma), "r" (asid) : "memory");
> > + else
> > + asm volatile(SINVAL_VMA(%0, zero) : : "r" (vma) : "memory");
> > +}
> >
> > /*
> > * Flush entire TLB if number of entries to be flushed is greater
> > @@ -26,6 +47,16 @@ static void local_flush_tlb_range_threshold_asid(unsigned long start,
> > return;
> > }
> >
> > + if (has_svinval()) {
> > + local_sfence_w_inval();
> > + for (i = 0; i < nr_ptes_in_range; ++i) {
> > + local_sinval_vma(start, asid);
> > + start += stride;
> > + }
> > + local_sfence_inval_ir();
> > + return;
> > + }
> > +
> > for (i = 0; i < nr_ptes_in_range; ++i) {
> > local_flush_tlb_page_asid(start, asid);
> > start += stride;