Re: [RFC PATCH] mm: only set fault address' access bit in do_anonymous_page

From: Dev Jain

Date: Tue Feb 10 2026 - 23:19:03 EST



On 11/02/26 6:19 am, Wenchao Hao wrote:
> On Tue, Feb 10, 2026 at 5:07 PM David Hildenbrand (Arm)
> <david@xxxxxxxxxx> wrote:
>> On 2/10/26 05:34, Wenchao Hao wrote:
>>> When do_anonymous_page() creates mappings for huge pages, it currently sets
>>> the access bit for all mapped PTEs (Page Table Entries) by default.
>>>
>>> This causes an issue where the Referenced field in /proc/pid/smaps cannot
>>> distinguish whether a page was actually accessed.
>> What is the use case that cares about that?
>>
> We have enabled 64KB large folios on Android devices, which may introduce
> some memory waste. I want to figure out the proportion of memory waste
> caused by large folios. Reading the "Referenced" field from /proc/pid/smaps
> is a relatively low-cost method.
>
> Additionally, considering future hot/cold page identification, we aim to
> detect 64KB large folios where some pages are actually unaccessed and split
> them into normal pages to avoid memory waste.
>
> However, the current large folio implementation sets the access bit for all
> page table entries (PTEs) of the large folio in the do_anonymous_page
> function, making it hard to distinguish whether pre-allocated pages were
> truly accessed.
>
>> What we have right now is the exact same behavior as if you would get a
>> PMD THP that has a single access+dirty bit at fault time.
>>
>> Also, architectures that support transparent PTE coalescing will not be
>> able to coalesce until all PTE bits are equal.
>>
>> This level of imprecision is to be expected with large folios that only
>> have a single access+dirty bit.
>>
> Thanks a lot for the response.
>
> I saw this description in the ARM manual, “D8.5.5 Use of the Contiguous bit
> with hardware updates to the translation tables”:
>
>
>> If hardware updates a translation table entry, and if the Contiguous bit in
>> that entry is 1, then the members in a group of contiguous translation table
>> entries can have different AF, AP[2], and S2AP[1] values.
> Does this mean that after hardware aggregates multiple PTEs, it can still
> independently set the AF and other flag bits corresponding to specific
> sub-PTE?

Yes. Hardware can update the access and dirty bits of each PTE in a
contiguous group independently. It is the job of software to aggregate them.
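
To illustrate what "aggregate" means here, below is a minimal user-space
sketch, not the kernel code. The bit positions, the toy_ names and the
helper itself are made up for illustration; only the idea (OR together the
access/dirty bits of every entry in the group when reading one entry) is
the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout for illustration; NOT the real arm64 encoding. */
#define TOY_PTE_YOUNG (UINT64_C(1) << 0) /* stands in for the access flag */
#define TOY_PTE_DIRTY (UINT64_C(1) << 1) /* stands in for the dirty state */
#define TOY_CONT_PTES 16                 /* entries per contiguous group  */

typedef uint64_t toy_pte_t;

/*
 * Read entry idx of a contiguous group. Hardware may have set young/dirty
 * in any entry of the group, so gather those bits from all entries and
 * report them on the entry being read.
 */
static toy_pte_t toy_group_get(const toy_pte_t *group, size_t idx)
{
    assert(idx < TOY_CONT_PTES);

    toy_pte_t pte = group[idx];

    for (size_t i = 0; i < TOY_CONT_PTES; i++)
        pte |= group[i] & (TOY_PTE_YOUNG | TOY_PTE_DIRTY);

    return pte;
}
```

With this kind of read path, a single young bit anywhere in the group makes
every entry of the group look referenced to the caller.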

>
> If so, can software also set different AF bits for a group of 16 PTEs
> without affecting the transparent PTE coalescing function?

Yes. See set_ptes() -> __contpte_try_fold(): look at the
pte_mkold(pte_mkclean()) calls. We ignore the access/dirty bits while
constructing the next expected PTE.
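
As a rough user-space sketch of that check (again not the kernel code: the
bit layout and every demo_ name are invented for illustration, and the
alignment checks the real code needs are left out), the comparison strips
the access/dirty bits from each entry before matching it against the
expected value for that slot:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout for illustration; NOT the real arm64 encoding. */
#define DEMO_PTE_YOUNG (UINT64_C(1) << 0)
#define DEMO_PTE_DIRTY (UINT64_C(1) << 1)
#define DEMO_PTE_WRITE (UINT64_C(1) << 2) /* an example permission bit */
#define DEMO_PFN_SHIFT 12
#define DEMO_CONT_PTES 16

typedef uint64_t demo_pte_t;

/* Toy analogue of pte_mkold(pte_mkclean(pte)): drop the a/d bits. */
static demo_pte_t demo_strip_ad(demo_pte_t pte)
{
    return pte & ~(DEMO_PTE_YOUNG | DEMO_PTE_DIRTY);
}

/*
 * A group can be folded if, ignoring a/d bits, every entry has the same
 * attributes and the page frame numbers are consecutive.
 */
static bool demo_can_fold(const demo_pte_t *group)
{
    demo_pte_t expected = demo_strip_ad(group[0]);

    for (size_t i = 0; i < DEMO_CONT_PTES; i++) {
        if (demo_strip_ad(group[i]) != expected)
            return false;
        /* Next expected entry: same attributes, next page frame. */
        expected += UINT64_C(1) << DEMO_PFN_SHIFT;
    }
    return true;
}
```

In this sketch, entries that differ only in young/dirty still fold, while a
difference in any other bit (for example the write permission) prevents it.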

>
> The reason I have this confusion is that there is such a description in
> “D8.7.1 The Contiguous bit:”
>
>> Software is required to ensure that all of the adjacent translation table
>> entries for the contiguous region point to a contiguous OA range with
>> consistent attributes and permissions.
> It does not specify whether attributes and permissions include the AF bit.
>
>> --
>> Cheers,
>>
>> David