Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range

From: David Hildenbrand (Arm)

Date: Thu Apr 09 2026 - 05:42:02 EST


On 4/9/26 11:37, David Hildenbrand (Arm) wrote:
> On 4/9/26 11:18, Lorenzo Stoakes wrote:
>> On Wed, Apr 08, 2026 at 02:57:10PM +0200, David Hildenbrand (Arm) wrote:
>>>
>>> I'm wondering whether we could figure the pgoff out, somehow, so we
>>> wouldn't have to store it elsewhere.
>>>
>>> What we need is essentially what __folio_set_anon() would have done for
>>> the original folio we replaced.
>>>
>>> folio->index = linear_page_index(vma, address);
>>>
>>> Could we obtain that from the anon_vma assigned to our rmap_item?
>>>
>>> pgoff_t pgoff;
>>>
>>> pgoff = (rmap_item->address - anon_vma->vma->vm_start) >> PAGE_SHIFT;
>>> pgoff += anon_vma->vma->vm_pgoff;
>>
>> anon_vma doesn't have a vma field :) it has anon_vma->rb_root which maps to all
>> 'related' VMAs.
>
> Right, anon_vma_chain has. Dammit.
>
>>
>> And we're already looking at what might be covered by the anon_vma by
>> invoking anon_vma_interval_tree_foreach() on anon_vma->rb_root in [0,
>> ULONG_MAX).
>>
>>>
>>> It would be the same adjustment everywhere we look in child processes,
>>> because the moment they would mremap() would be where we would have
>>> unshared.
>>>
>>> Just a thought after reading avc_start_pgoff ...
>>
>> One interesting thing here is in the anon_vma_interval_tree_foreach() loop
>> we check:
>>
>> if (addr < vma->vm_start || addr >= vma->vm_end)
>> continue;
>>
>> Which is the same as saying 'hey we are ignoring remaps'.
>>
>> But... if _we_ got remapped previously (the unsharing is only temporary),
>> then we'd _still_ have an anon_vma with an old index != addr >> PAGE_SHIFT,
>> and would still not be able to figure out the correct pgoff after sharing.
>>
>> I wonder if we could just store the pgoff in the rmap_item though?
>
> That's what I said elsewhere and what I was trying to avoid here.
>
> It's 64bytes, and adding a new item will increase it to 96 bytes IIUC.

As we're using a dedicate kmem cache it might "only" add 8 bytes, not
sure. Still an undesired increase given that we need that for each entry
in the stable/unstable tree.

--
Cheers,

David