Re: [PATCH 7.2 v2 02/12] mm/khugepaged: add folio dirty check after try_to_unmap_flush()
From: Zi Yan
Date: Tue Apr 14 2026 - 11:56:32 EST
On 14 Apr 2026, at 6:38, David Hildenbrand (Arm) wrote:
> On 4/13/26 21:20, Zi Yan wrote:
>> This check ensures the correctness of collapse read-only THPs for FSes
>> after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
>> PMD THP pagecache.
>>
>> READ_ONLY_THP_FOR_FS only supports read-only fd and uses mapping->nr_thps
>> and inode->i_writecount to prevent any write to read-only to-be-collapsed
>> folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
>> aforementioned mechanism will go away too. To ensure khugepaged functions
>> as expected after the changes, roll back if any folio is dirty after
>> try_to_unmap_flush(), since a dirty folio means this read-only folio
>> received writes via mmap (which can happen between try_to_unmap() and
>> try_to_unmap_flush() via cached TLB entries), and khugepaged does not
>> support collapsing writable pagecache folios.
>>
>> Signed-off-by: Zi Yan <ziy@xxxxxxxxxx>
>> ---
>> mm/khugepaged.c | 18 ++++++++++++++++++
>> 1 file changed, 18 insertions(+)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index d2f0acd2dac2..ec609e53082e 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -2121,6 +2121,24 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>> */
>> try_to_unmap_flush();
>>
>> + /*
>> + * At this point, all folios are locked, unmapped, and all cached
>> + * mappings in TLBs are flushed. No one else is able to write to these
>> + * folios, since
>> + * 1. writes via FS ops require folio locks (see write_begin_get_folio());
>> + * 2. writes via mmap require taking a fault and locking folio locks.
>> + *
>
> maybe simplify to "folios, since that would require taking the folio lock first."
Sure.
>
>> + * khugepaged only works for read-only fd, make sure all folios are
>> + * clean, since writes via mmap can happen between try_to_unmap() and
>> + * try_to_unmap_flush() via cached TLB entries.
>
>
> IIRC, after successful try_to_unmap() the PTE dirty bit would be synced to the
> folio. That's what you care about, not about any stale TLB entries.
Right. I missed the PTE dirty bit to folio dirtiness part.
>
> The important part is that the folio is no longer mapped by the time
> we check for dirtiness.
>
> So can't we simply test for dirty folios after the refcount check (where
> we made sure the folio is no longer mapped)?
>
I think so.
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index b2ac28ddd480..920e16067134 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2089,6 +2089,14 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> goto out_unlock;
> }
>
> + /* ... */
> + if (!is_shmem && folio_test_dirty(folio)) {
> + result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
> + xas_unlock_irq(&xas);
> + folio_putback_lru(folio);
> + goto out_unlock;
> + }
> +
> /*
> * Accumulate the folios that are being collapsed.
>
>
> I guess we don't have to recheck folio_test_writeback() ?
Right, writeback needs to take the folio lock and we are holding it, so
others can only dirty the folio but cannot initiate writeback.
Best Regards,
Yan, Zi