Re: [PATCH v1 08/16] arm64/mm: Hoist barriers out of ___set_ptes() loop
From: Anshuman Khandual
Date: Fri Feb 07 2025 - 00:36:01 EST
On 2/5/25 20:39, Ryan Roberts wrote:
> ___set_ptes() previously called __set_pte() for each PTE in the range,
> which would conditionally issue a DSB and ISB to make the new PTE value
> immediately visible to the table walker if the new PTE was valid and for
> kernel space.
>
> We can do better than this; let's hoist those barriers out of the loop
> so that they are only issued once at the end of the loop. We then reduce
> the cost by the number of PTEs in the range.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
> ---
> arch/arm64/include/asm/pgtable.h | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 3b55d9a15f05..1d428e9c0e5a 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -317,10 +317,8 @@ static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
> WRITE_ONCE(*ptep, pte);
> }
>
> -static inline void __set_pte(pte_t *ptep, pte_t pte)
> +static inline void __set_pte_complete(pte_t pte)
> {
> - __set_pte_nosync(ptep, pte);
> -
> /*
> * Only if the new pte is valid and kernel, otherwise TLB maintenance
> * or update_mmu_cache() have the necessary barriers.
> @@ -331,6 +329,12 @@ static inline void __set_pte(pte_t *ptep, pte_t pte)
> }
> }
>
> +static inline void __set_pte(pte_t *ptep, pte_t pte)
> +{
> + __set_pte_nosync(ptep, pte);
> + __set_pte_complete(pte);
> +}
> +
> static inline pte_t __ptep_get(pte_t *ptep)
> {
> return READ_ONCE(*ptep);
> @@ -647,12 +651,14 @@ static inline void ___set_ptes(struct mm_struct *mm, pte_t *ptep, pte_t pte,
>
> for (;;) {
> __check_safe_pte_update(mm, ptep, pte);
> - __set_pte(ptep, pte);
> + __set_pte_nosync(ptep, pte);
> if (--nr == 0)
> break;
> ptep++;
> pte = pte_advance_pfn(pte, stride);
> }
> +
> + __set_pte_complete(pte);
Given that the loop now writes a block of page table entries without a corresponding
dsb/isb after each one, could something else get scheduled on this CPU before
__set_pte_complete() is called, leaving the entire block of entries without the desired
mapping effect? IOW, how is __set_pte_complete() guaranteed to execute once the loop
above completes? Otherwise this change LGTM.
> }
>
> static inline void __set_ptes(struct mm_struct *mm,