Re: [RFC, PATCH 00/12] userfaultfd: working set tracking for VM guest memory
From: David Hildenbrand (Arm)
Date: Fri Apr 17 2026 - 07:49:22 EST
On 4/16/26 22:25, Kiryl Shutsemau wrote:
> On Thu, Apr 16, 2026 at 08:32:19PM +0200, David Hildenbrand (Arm) wrote:
>> On 4/16/26 15:49, Kiryl Shutsemau wrote:
>>>
>>> Here is an updated version:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/log/?h=uffd/rfc-v2
>>>
>>> will post after -rc1 is tagged.
>>>
>>> I like it more. It got substantially cleaner.
>>
>> I don't have time to look into the details just yet, but my thinking was
>> that
>>
>> a) It would avoid the zap+refault
>
> Yep.
>
>> b) We could reuse the uffd-wp PTE bit + marker to indicate/remember the
>> protection, making it co-exist with NUMA hinting naturally.
>>
>> b) obviously means that we cannot use uffd-wp and uffd-rwp at the same
>> time in the same uffd area. I guess that should be acceptable for the
>> use cases you have in mind?
>
> I took a different path: I still use PROT_NONE PTEs, so it cannot
> co-exist with NUMA balancing [fully], but WP + RWP should be fine. I
> need to add a test for this.
>
> I didn't give up on NUMA balancing completely. task_numa_fault() is
> called on RWP fault. So it should help scheduler decisions somewhat.
>
> I think an RWP user might want to use WP too.
>
> Do you see this trade-off as reasonable?
One reason why the PTE bit was added for the WP case was to distinguish
it from other write faults.

I assume that, without a dedicated PTE bit, your design will always
suffer from false-positive notifications.

Leaving NUMA balancing aside, even a simple
mprotect(PROT_NONE)+mprotect(PROT_READ) sequence would already make it
problematic to distinguish both cases. Zap+refault for shmem would
likely have similar problems (we'd need a marker).

I don't think a design that allows for false positives is what we really
want, especially as it would diverge from what we already have for WP.

Yes, using the PTE bit (that we already have) implies that, for now, we
could not allow the combination of WP + RWP.
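
To make the false-positive concern concrete, here is a toy userspace
model (not kernel code). It assumes that, without a dedicated bit, the
fault path can only see that a PTE is PROT_NONE. The struct, the helpers
and all names are made up for illustration and do not reflect the actual
patches.

```c
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not a kernel API. */
struct toy_pte {
	bool prot_none;	/* any access to the page traps */
	bool uffd_bit;	/* dedicated "armed by userfaultfd" bit */
};

/* PTE as left behind by (modelled) RWP arming. */
struct toy_pte toy_arm_rwp(void)
{
	struct toy_pte pte = { .prot_none = true, .uffd_bit = true };
	return pte;
}

/* PTE as left behind by (modelled) NUMA hinting: same protection, no bit. */
struct toy_pte toy_numa_hint(void)
{
	struct toy_pte pte = { .prot_none = true, .uffd_bit = false };
	return pte;
}

/* Without a dedicated bit, PROT_NONE is the only thing we can test. */
bool notify_without_bit(struct toy_pte pte)
{
	return pte.prot_none;
}

/* With a dedicated bit, only PTEs armed through uffd notify. */
bool notify_with_bit(struct toy_pte pte)
{
	return pte.uffd_bit;
}
```

In this model, notify_without_bit(toy_numa_hint()) returns true, which
is the false positive, while notify_with_bit(toy_numa_hint()) returns
false. Both helpers return true for toy_arm_rwp().
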
--
Cheers,
David