Re: [PATCH v4 6/8] hugetlb: batch PMD split for bulk vmemmap dedup
From: Joao Martins
Date: Tue Sep 19 2023 - 04:19:41 EST
On 19/09/2023 07:27, Muchun Song wrote:
> On 2023/9/19 07:01, Mike Kravetz wrote:
>> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
>>
>> In an effort to minimize amount of TLB flushes, batch all PMD splits
>> belonging to a range of pages in order to perform only 1 (global) TLB
>> flush.
>>
>> Add a flags field to the walker and pass whether it's a bulk allocation
>> or just a single page to decide to remap. First value
>> (VMEMMAP_SPLIT_NO_TLB_FLUSH) designates the request to not do the TLB
>> flush when we split the PMD.
>>
>> Rebased and updated by Mike Kravetz
>>
>> Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
>> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>> ---
>> mm/hugetlb_vmemmap.c | 79 +++++++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 75 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>> index 147ed15bcae4..e8bc2f7567db 100644
>> --- a/mm/hugetlb_vmemmap.c
>> +++ b/mm/hugetlb_vmemmap.c
>> @@ -27,6 +27,7 @@
>> * @reuse_addr: the virtual address of the @reuse_page page.
>> * @vmemmap_pages: the list head of the vmemmap pages that can be freed
>> * or is mapped from.
>> + * @flags: used to modify behavior in bulk operations
>> */
>> struct vmemmap_remap_walk {
>> void (*remap_pte)(pte_t *pte, unsigned long addr,
>> @@ -35,9 +36,11 @@ struct vmemmap_remap_walk {
>> struct page *reuse_page;
>> unsigned long reuse_addr;
>> struct list_head *vmemmap_pages;
>> +#define VMEMMAP_SPLIT_NO_TLB_FLUSH BIT(0)
>
> Please add a brief comment following this macro to explain what's the
> behavior.
>
/* Skip the TLB flush when we split the PMD */
And will also do it in the next patch with:
/* Skip the TLB flush when we remap the PTE */
>> + unsigned long flags;
>> };
>> -static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
>> +static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start, bool flush)
>> {
>> pmd_t __pmd;
>> int i;
>> @@ -80,7 +83,8 @@ static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long
>> start)
>> /* Make pte visible before pmd. See comment in pmd_install(). */
>> smp_wmb();
>> pmd_populate_kernel(&init_mm, pmd, pgtable);
>> - flush_tlb_kernel_range(start, start + PMD_SIZE);
>> + if (flush)
>> + flush_tlb_kernel_range(start, start + PMD_SIZE);
>> } else {
>> pte_free_kernel(&init_mm, pgtable);
>> }
>> @@ -127,11 +131,20 @@ static int vmemmap_pmd_range(pud_t *pud, unsigned long
>> addr,
>> do {
>> int ret;
>> - ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK);
>> + ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK,
>> + walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH);
>
> !(walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH)?
>
Yeah -- Gah, I must be very distracted.
Thanks