Re: [PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP
From: Ryan Roberts
Date: Wed Jan 31 2024 - 05:32:10 EST
On 31/01/2024 10:21, David Hildenbrand wrote:
>
>>> +
>>> +#ifndef clear_full_ptes
>>> +/**
>>> + * clear_full_ptes - Clear PTEs that map consecutive pages of the same folio.
>>
>> I know it's implied from "pages of the same folio" (and even more so for the
>> above variant due to the mention of access/dirty bits), but I wonder if it's
>> useful to explicitly state that "all PTEs being cleared are present at the
>> time of the call"?
>
> "Clear PTEs" -> "Clear present PTEs" ?
>
> That should make it clearer.
Works for me.
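
So the doc block would open with something like the below (just a sketch; the
extra Context: sentence is my suggestion for spelling out the "all present"
requirement explicitly):

/**
 * clear_full_ptes - Clear present PTEs that map consecutive pages of the
 *		     same folio.
 *
 * Context: The caller holds the page table lock. All PTEs being cleared are
 * present at the time of the call.
 */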
>
> [...]
>
>>> if (!delay_rmap) {
>>> - folio_remove_rmap_pte(folio, page, vma);
>>> + folio_remove_rmap_ptes(folio, page, nr, vma);
>>> +
>>> + /* Only sanity-check the first page in a batch. */
>>> if (unlikely(page_mapcount(page) < 0))
>>> print_bad_pte(vma, addr, ptent, page);
>>
>> Is there a case for either removing this altogether or moving it into
>> folio_remove_rmap_ptes()? It seems odd to only check some pages.
>>
>
> I really wanted to avoid another nasty loop here.
>
> In my thinking, for 4k folios, or when zapping subpages of large folios, we
> still perform the exact same checks. Only when batching do we skip them. So if
> there is some problem, there are still ways to get it triggered. And these
> problems are rarely seen in practice.
>
> folio_remove_rmap_ptes() feels like the better place -- especially because the
> delayed-rmap handling is effectively unchecked. But in there, we cannot call
> print_bad_pte().
>
> [background: if we had a total mapcount -- IOW, a cheap folio_mapcount() --
> I'd check here that the total mapcount does not underflow, instead of checking
> per subpage]
All good points... perhaps extend the comment to describe how this could be
solved in the future with a cheap total_mapcount()? Something like the sketch
below. Or in the commit log, if you prefer?
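
To illustrate the wording I have in mind (a sketch only, reusing the hunk
above; the exact phrasing is entirely up to you):

	if (!delay_rmap) {
		folio_remove_rmap_ptes(folio, page, nr, vma);

		/*
		 * Only sanity-check the first page in a batch. Once we have
		 * a cheap folio_mapcount() (i.e., a maintained total
		 * mapcount), we could instead check here that the folio's
		 * total mapcount does not underflow.
		 */
		if (unlikely(page_mapcount(page) < 0))
			print_bad_pte(vma, addr, ptent, page);
	}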
>
>>
>>> }
>>> - if (unlikely(__tlb_remove_page(tlb, page, delay_rmap))) {
>>> + if (unlikely(__tlb_remove_folio_pages(tlb, page, nr, delay_rmap))) {
>>> *force_flush = true;
>>> *force_break = true;
>>> }
>>> }
>>> -static inline void zap_present_pte(struct mmu_gather *tlb,
>>> +/*
>>> + * Zap or skip one present PTE, trying to batch-process subsequent PTEs that map
>>
>> Zap or skip *at least* one... ?
>
> Ack
>
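
FWIW, the reworded comment would then read something like (sketch, with the
rest of the block as in v1):

/*
 * Zap or skip at least one present PTE, trying to batch-process subsequent
 * PTEs that map consecutive pages of the same folio.
 */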