Re: [External] [PATCH RFC/RFT v2 4/4] riscv: Stop emitting preventive sfence.vma for new userspace mappings with Svvptc

From: yunhui cui
Date: Thu May 30 2024 - 05:36:12 EST


Hi Alex,

On Thu, Feb 1, 2024 at 12:04 AM Alexandre Ghiti <alexghiti@rivosinccom> wrote:
>
> The preventive sfence.vma were emitted because new mappings must be made
> visible to the page table walker but Svvptc guarantees that xRET act as
> a fence, so no need to sfence.vma for the uarchs that implement this
> extension.
>
> This allows to drastically reduce the number of sfence.vma emitted:
>
> * Ubuntu boot to login:
> Before: ~630k sfence.vma
> After: ~200k sfence.vma
>
> * ltp - mmapstress01
> Before: ~45k
> After: ~6.3k
>
> * lmbench - lat_pagefault
> Before: ~665k
> After: 832 (!)
>
> * lmbench - lat_mmap
> Before: ~546k
> After: 718 (!)
>
> Signed-off-by: Alexandre Ghiti <alexghiti@xxxxxxxxxxxx>
> ---
> arch/riscv/include/asm/pgtable.h | 16 +++++++++++++++-
> arch/riscv/mm/pgtable.c | 13 +++++++++++++
> 2 files changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 0c94260b5d0c..50986e4c4601 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -473,6 +473,9 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
> struct vm_area_struct *vma, unsigned long address,
> pte_t *ptep, unsigned int nr)
> {
> + asm_volatile_goto(ALTERNATIVE("nop", "j %l[svvptc]", 0, RISCV_ISA_EXT_SVVPTC, 1)
> + : : : : svvptc);
> +
> /*
> * The kernel assumes that TLBs don't cache invalid entries, but
> * in RISC-V, SFENCE.VMA specifies an ordering constraint, not a
> @@ -482,12 +485,23 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
> */
> while (nr--)
> local_flush_tlb_page(address + nr * PAGE_SIZE);
> +
> +svvptc:
> + /*
> + * Svvptc guarantees that xRET act as a fence, so when the uarch does
> + * not cache invalid entries, we don't have to do anything.
> + */
> + ;
> }

>From the perspective of RISC-V arch, the logic of this patch is
reasonable. The code of mm comm submodule may be missing
update_mmu_cache_range(), for example: there is no flush TLB in
remap_pte_range() after updating pte.
I will send a patch to mm/ to fix this problem next.


Thanks,
Yunhui