Re: [PATCH v3 11/13] arm64: mm: More flags for __flush_tlb_range()

From: Jonathan Cameron

Date: Tue Mar 03 2026 - 05:16:36 EST


On Mon, 2 Mar 2026 13:55:58 +0000
Ryan Roberts <ryan.roberts@xxxxxxx> wrote:

> Refactor function variants with "_nosync", "_local" and "_nonotify" into
> a single __always_inline implementation that takes flags and rely on
> constant folding to select the parts that are actually needed at any
> given callsite, based on the provided flags.
>
> Flags all live in the tlbf_t (TLB flags) type; TLBF_NONE (0) continues
> to provide the strongest semantics (i.e. evict from walk cache,
> broadcast, synchronise and notify). Each flag reduces the strength in
> some way; TLBF_NONOTIFY, TLBF_NOSYNC and TLBF_NOBROADCAST are added to
> complement the existing TLBF_NOWALKCACHE.
>
> There are no users that require TLBF_NOBROADCAST without
> TLBF_NOWALKCACHE so implement that as BUILD_BUG() to avoid needing to
> introduce dead code for vae1 invalidations.
>
> The result is a clearer, simpler, more powerful API.
Hi Ryan,

There is one subtle change to rounding that should at least be called out.

It might even be worth pulling it into a precursor patch, where you can
explain why the original code was rounding to a larger value than was
ever needed.

Jonathan


>
> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>


> static inline void __flush_tlb_range(struct vm_area_struct *vma,
> @@ -586,24 +615,9 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
> unsigned long stride, int tlb_level,
> tlbf_t flags)
> {
> - __flush_tlb_range_nosync(vma->vm_mm, start, end, stride,
> - tlb_level, flags);
> - __tlbi_sync_s1ish();
> -}
> -
> -static inline void local_flush_tlb_contpte(struct vm_area_struct *vma,
> - unsigned long addr)
> -{
> - unsigned long asid;
> -
> - addr = round_down(addr, CONT_PTE_SIZE);
See below.
> -
> - dsb(nshst);
> - asid = ASID(vma->vm_mm);
> - __flush_s1_tlb_range_op(vale1, addr, CONT_PTES, PAGE_SIZE, asid, 3);
> - mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, addr,
> - addr + CONT_PTE_SIZE);
> - dsb(nsh);
> + start = round_down(start, stride);
See below.
> + end = round_up(end, stride);
> + __do_flush_tlb_range(vma, start, end, stride, tlb_level, flags);
> }

>
> static inline bool __pte_flags_need_flush(ptdesc_t oldval, ptdesc_t newval)
> diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> index 681f22fac52a1..3f1a3e86353de 100644
> --- a/arch/arm64/mm/contpte.c
> +++ b/arch/arm64/mm/contpte.c
...

> @@ -641,7 +641,10 @@ int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
> __ptep_set_access_flags(vma, addr, ptep, entry, 0);
>
> if (dirty)
> - local_flush_tlb_contpte(vma, start_addr);
> + __flush_tlb_range(vma, start_addr,
> + start_addr + CONT_PTE_SIZE,
> + PAGE_SIZE, 3,

This rounds the start address down to a different granule.
local_flush_tlb_contpte() did
	addr = round_down(addr, CONT_PTE_SIZE);

With this call we have
	start = round_down(start, stride);
where stride is PAGE_SIZE.

I'm too lazy to figure out if that matters.
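
For what it's worth, a minimal userspace sketch of the difference, assuming
4K pages and 16 PTEs per contiguous block (so CONT_PTE_SIZE would be 64K).
The demo_* names are made up for illustration and are not kernel APIs; the
helpers only mimic round_down()/round_up() for power-of-two alignments.

```c
#include <assert.h>

/* Assumed values for a 4K-page configuration; other granules differ. */
#define DEMO_PAGE_SIZE		0x1000UL
#define DEMO_CONT_PTES		16UL
#define DEMO_CONT_PTE_SIZE	(DEMO_CONT_PTES * DEMO_PAGE_SIZE)

/* Stand-in for round_down(); y must be a power of two. */
static inline unsigned long demo_round_down(unsigned long x, unsigned long y)
{
	return x & ~(y - 1);
}

/* Stand-in for round_up(); y must be a power of two. */
static inline unsigned long demo_round_up(unsigned long x, unsigned long y)
{
	return (x + y - 1) & ~(y - 1);
}

/*
 * Returns 1 if rounding down by the contpte block size (old helper) and
 * rounding down by a PAGE_SIZE stride (new call) pick the same start.
 */
static inline int demo_same_start(unsigned long addr)
{
	return demo_round_down(addr, DEMO_CONT_PTE_SIZE) ==
	       demo_round_down(addr, DEMO_PAGE_SIZE);
}

/* Sanity check of the two cases discussed above. */
static inline void demo_check(void)
{
	/* Already block-aligned: both roundings agree. */
	assert(demo_same_start(0x12340000UL));
	/* Page-aligned but mid-block: the old rounding reaches further back. */
	assert(!demo_same_start(0x12345000UL));
}
```

So the two only agree when the address passed in is already aligned to the
contpte block. Whether the change is harmless comes down to whether
start_addr is guaranteed to be CONT_PTE_SIZE aligned at this callsite.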


> + TLBF_NOWALKCACHE | TLBF_NOBROADCAST);
> } else {
> __contpte_try_unfold(vma->vm_mm, addr, ptep, orig_pte);
> __ptep_set_access_flags(vma, addr, ptep, entry, dirty);