[PATCH v2 2/4] arm64: mm: Batch dsb and isb when populating pgtables

From: Ryan Roberts
Date: Thu Apr 04 2024 - 10:33:48 EST


After removing unnecessary TLBIs, the next bottleneck when creating the
page tables for the linear map is the DSB and ISB barriers, which were
previously issued per-pte in __set_pte(). Since we are writing multiple
ptes in a given pte table, we can elide the per-pte barriers and issue
a single DSB and ISB after we have finished writing to the table.
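
The effect on barrier count can be sketched with a small host-side model.
This is an illustration only, not kernel code: every model_* name is
hypothetical, model_sync() merely counts where the real code would issue
dsb(ishst); isb(), and the model assumes every entry is a valid kernel
pte (the real __set_pte() only issues the barriers in that case).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pte_t; the real type is arch-specific. */
typedef uint64_t model_pte_t;

/* Number of barrier pairs the model has "issued". */
static unsigned long model_barriers;

/* Stand-in for dsb(ishst); isb(). It only counts; it orders nothing. */
static void model_sync(void)
{
	model_barriers++;
}

/* Analogue of __set_pte_nosync(): a plain store with no barrier. */
static void model_set_pte_nosync(volatile model_pte_t *ptep, model_pte_t pte)
{
	*ptep = pte;
}

/* Old scheme: one barrier pair after every entry written. */
static unsigned long model_fill_per_pte(model_pte_t *table, size_t n,
					model_pte_t first)
{
	assert(table != NULL);
	model_barriers = 0;
	for (size_t i = 0; i < n; i++) {
		model_set_pte_nosync(&table[i], first + i);
		model_sync();
	}
	return model_barriers;
}

/* New scheme: write all entries, then one barrier pair at the end. */
static unsigned long model_fill_batched(model_pte_t *table, size_t n,
					model_pte_t first)
{
	assert(table != NULL);
	model_barriers = 0;
	for (size_t i = 0; i < n; i++)
		model_set_pte_nosync(&table[i], first + i);
	model_sync();
	return model_barriers;
}
```

For a table of 512 entries (an assumed size, matching a 4K-page pte
table), the model issues 512 barrier pairs in the per-pte scheme and 1 in
the batched scheme, while the stored entries are identical.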

Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:

               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               | VM, 16G     | VM, 64G     | VM, 256G    | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |   ms    (%)
---------------|-------------|-------------|-------------|-------------
before         |   77   (0%) |  431   (0%) | 1727   (0%) | 3796   (0%)
after          |   13 (-84%) |  162 (-62%) |  655 (-62%) | 1656 (-56%)

Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
Tested-by: Itaru Kitayama <itaru.kitayama@xxxxxxxxxxx>
Tested-by: Eric Chanudet <echanude@xxxxxxxxxx>
---
 arch/arm64/include/asm/pgtable.h |  7 ++++++-
 arch/arm64/mm/mmu.c              | 13 ++++++++++++-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index afdd56d26ad7..105a95a8845c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -271,9 +271,14 @@ static inline pte_t pte_mkdevmap(pte_t pte)
 	return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
 }

-static inline void __set_pte(pte_t *ptep, pte_t pte)
+static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
 {
 	WRITE_ONCE(*ptep, pte);
+}
+
+static inline void __set_pte(pte_t *ptep, pte_t pte)
+{
+	__set_pte_nosync(ptep, pte);

 	/*
 	 * Only if the new pte is valid and kernel, otherwise TLB maintenance
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index fd91b5bdb514..dc86dceb0efe 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -178,7 +178,11 @@ static pte_t *init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
 	do {
 		pte_t old_pte = __ptep_get(ptep);

-		__set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
+		/*
+		 * Required barriers to make this visible to the table walker
+		 * are deferred to the end of alloc_init_cont_pte().
+		 */
+		__set_pte_nosync(ptep, pfn_pte(__phys_to_pfn(phys), prot));

 		/*
 		 * After the PTE entry has been populated once, we
@@ -234,6 +238,13 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 	} while (addr = next, addr != end);

 	pte_clear_fixmap();
+
+	/*
+	 * Ensure all previous pgtable writes are visible to the table walker.
+	 * See init_pte().
+	 */
+	dsb(ishst);
+	isb();
 }

 static pmd_t *init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
--
2.25.1