Re: [PATCH V3 1/2] arm64/mm: Enable batched TLB flush in unmap_hotplug_range()

From: Anshuman Khandual

Date: Tue Mar 03 2026 - 00:55:41 EST




On 02/03/26 8:58 PM, David Hildenbrand (Arm) wrote:
> On 2/24/26 07:24, Anshuman Khandual wrote:
>> During a memory hot remove operation, both linear and vmemmap mappings for
>> the memory range being removed, get unmapped via unmap_hotplug_range() but
>> mapped pages get freed only for vmemmap mapping. This is just a sequential
>> operation where each table entry gets cleared, followed by a leaf specific
>> TLB flush, and then followed by memory free operation when applicable.
>>
>> This approach was simple and uniform both for vmemmap and linear mappings.
>> But linear mapping might contain CONT marked block memory where it becomes
>> necessary to first clear out all entire in the range before a TLB flush.
>> This is as per the architecture requirement. Hence batch all TLB flushes
>> during the table tear down walk and finally do it in unmap_hotplug_range().
>>
>> Prior to this fix, it was hypothetically possible for a speculative access
>> to a higher address in the contiguous block to fill the TLB with shattered
>> entries for the entire contiguous range after a lower address had already
>> been cleared and invalidated. Due to the table entries being shattered, the
>> subsequent TLB invalidation for the higher address would not then clear the
>> TLB entries for the lower address, meaning stale TLB entries could persist.
>>
>> Besides it also helps in improving the performance via TLBI range operation
>> along with reduced synchronization instructions. The time spent executing
>> unmap_hotplug_range() improved 97% measured over a 2GB memory hot removal
>> in KVM guest.
>>
>> This scheme is not applicable during vmemmap mapping tear down where memory
>> needs to be freed and hence a TLB flush is required after clearing out page
>> table entry.
>>
>> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
>> Cc: Will Deacon <will@xxxxxxxxxx>
>> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>> Closes: https://lore.kernel.org/all/aWZYXhrT6D2M-7-N@willie-the-truck/
>> Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove")
>> Cc: stable@xxxxxxxxxxxxxxx
>> Reviewed-by: Ryan Roberts <ryan.roberts@xxxxxxx>
>> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@xxxxxxx>
>> ---
>> arch/arm64/mm/mmu.c | 81 +++++++++++++++++++++++++++++++++++++--------
>> 1 file changed, 67 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index a6a00accf4f9..dfb61d218579 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1458,10 +1458,32 @@ static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
>>
>> WARN_ON(!pte_present(pte));
>> __pte_clear(&init_mm, addr, ptep);
>> - flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>> - if (free_mapped)
>> + if (free_mapped) {
>> + /*
>> + * If page is part of an existing contiguous
>> + * memory block, individual TLB invalidation
>> + * here would not be appropriate. Instead it
>> + * will require clearing all entries for the
>> + * memory block and subsequently a TLB flush
>> + * for the entire range.
>> + */
>
> I'm not sure about repeating these longish comments a couple of times :)
>
> For example, I think you can drop the ones regarding the TLB flush
> ("TLB flush is batched in unmap_hotplug_range ...") completely.
> unmap_hotplug_range(), the only caller, is pretty clear about that. And
> anybody reading that code should be able to spot the "!free_mapped" case
> easily.
>
> Alternatively, just say
>
> /* unmap_hotplug_range() flushes TLB for !free_mapped */

Will replace as suggested.

>
> "TLB flush is essential for freeing memory." is rather obvious when
> freeing memory, so I would drop that as well.
>
> Regarding pte_cont(), can we shorten that to
>
> /* CONT blocks in the vmemmap are not supported. */
>
> Anybody who wants to figure *why* can lookup your patch where you add
> that comment+check.

Sure will drop the comment about CONT mapping TLB flush and mention that
such mappings are not supported for vmemmap.

>
>
> I did not check whether people suggested to add these comments in
> previous versions. But to me they don't add a lot of real value that
> couldn't be had from the code already (or common sense: freeing requires
> prior TLB flush).

Dropped all these comments - "TLB flush is essential for freeing memory".

>
>> + WARN_ON(pte_cont(pte));
>> +
>> + /*
>> + * TLB flush is essential for freeing memory.
>> + */
>> + flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>> free_hotplug_page_range(pte_page(pte),
>> PAGE_SIZE, altmap);
>> + }
>
> [...]
>
>> WARN_ON(!pud_table(pud));
>> @@ -1553,6 +1597,7 @@ static void unmap_hotplug_p4d_range(pgd_t *pgdp, unsigned long addr,
>> static void unmap_hotplug_range(unsigned long addr, unsigned long end,
>> bool free_mapped, struct vmem_altmap *altmap)
>> {
>> + unsigned long start = addr;
>> unsigned long next;
>> pgd_t *pgdp, pgd;
>>
>> @@ -1574,6 +1619,14 @@ static void unmap_hotplug_range(unsigned long addr, unsigned long end,
>> WARN_ON(!pgd_present(pgd));
>> unmap_hotplug_p4d_range(pgdp, addr, next, free_mapped, altmap);
>> } while (addr = next, addr < end);
>> +
>> + /*
>> + * Batched TLB flush only for linear mapping which
>> + * might contain CONT blocks, and does not require
>> + * freeing up memory as well.
>> + */
>
> Also, here, I don't think the comment really adds value.
>
> * !free_mapped -> linear mapping, no freeing of memory
> * CONT blocks -> irrelevant, you can batch in either case

Dropped the comment.

>
>> + if (!free_mapped)
>> + flush_tlb_kernel_range(start, end);
>> }