Re: [RFC PATCH v3 00/10] Add support for shared PTEs across processes

From: Anthony Yznaga
Date: Mon Oct 07 2024 - 15:24:57 EST



On 10/7/24 2:01 AM, Kirill A. Shutemov wrote:
> On Tue, Sep 03, 2024 at 04:22:31PM -0700, Anthony Yznaga wrote:
> > This patch series implements a mechanism that allows userspace
> > processes to opt into sharing PTEs. It adds a new in-memory
> > filesystem - msharefs. A file created on msharefs represents a
> > shared region where all processes mapping that region will map
> > objects within it with shared PTEs. When the file is created,
> > a new host mm struct is created to hold the shared page tables
> > and vmas for objects later mapped into the shared region. This
> > host mm struct is associated with the file and not with a task.
>
> Taskless mm_struct can be problematic. Like, we don't have access to its
> counters because it is not represented in /proc. For instance, there's no
> way to check its smaps.

Definitely needs exposure in /proc. One of the things I'm looking into is the feasibility of showing the mappings in maps/smaps/etc.
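
For context, here is a minimal userspace sketch of the per-task accounting that a taskless mm currently has no equivalent for. It only assumes the existing /proc/<pid>/smaps text format with its "Rss:" lines; the helper names sum_rss_kb() and self_rss_kb() are made up for illustration and are not part of the patch series.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Sum the "Rss:" fields (in kB) found in smaps-formatted text.
 * Hypothetical helper for illustration only.
 * Returns -1 if the stream is NULL.
 */
long sum_rss_kb(FILE *fp)
{
    char line[512];
    long total = 0;
    long kb;

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        /* Per-mapping lines look like "Rss:                 132 kB". */
        if (sscanf(line, "Rss: %ld kB", &kb) == 1)
            total += kb;
    }
    return total;
}

/*
 * Total Rss of the calling process as reported by its own smaps,
 * or -1 if /proc/self/smaps cannot be opened.
 */
long self_rss_kb(void)
{
    FILE *fp = fopen("/proc/self/smaps", "r");
    long total;

    if (fp == NULL)
        return -1;
    total = sum_rss_kb(fp);
    fclose(fp);
    return total;
}
```

Calling self_rss_kb() works because the mm belongs to a task with a /proc entry. An mm that is tied only to an msharefs file has no such path to open, which is the gap being discussed.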



> Also, I *think* it is immune to oom-killer because oom-killer looks for a
> victim task, not mm.
> I hope it is not an intended feature :P

oom-killer would have to kill all sharers of an mshare region before the mshare region itself could be freed, but I'm not sure that oom-killer would be the one to free the region. An mshare region is essentially a shared memory object not unlike a tmpfs or hugetlb file. I think some higher level intelligence would have to be involved to release it if appropriate when under oom conditions.



> > When a process mmap's the shared region, a vm flag VM_SHARED_PT
> > is added to the vma. On page fault the vma is checked for the
> > presence of the VM_SHARED_PT flag.
>
> I think it is the wrong approach.
>
> Instead of spraying VM_SHARED_PT checks across core-mm, we need to add
> generic hooks that can be used by mshare and hugetlb. And remove
> is_vm_hugetlb_page() checks from core-mm along the way.
>
> BTW, is_vm_hugetlb_page() callsites seem to be the indicator to check if
> mshare has to do something differently there. I feel you miss a lot of
> such cases.

Good point about is_vm_hugetlb_page(). I'll review the callsites (there are only ~60 of them :-).
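
To make the two styles under discussion concrete, here is a toy userspace model in C. It is not kernel code and does not reflect the patch series or any agreed interface: every type, flag, and function name below (toy_vma, toy_vma_ops, TOY_VM_SHARED_PT, and so on) is hypothetical. It only contrasts per-flag checks in core code with a single dispatch through per-vma hooks.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: all names here are hypothetical, not kernel APIs. */
enum fault_path { FAULT_GENERIC, FAULT_HUGETLB, FAULT_MSHARE };

struct toy_vma;

/* Hypothetical hook table a special mapping type could provide. */
struct toy_vma_ops {
    enum fault_path (*fault)(const struct toy_vma *vma);
};

#define TOY_VM_HUGETLB   0x1u
#define TOY_VM_SHARED_PT 0x2u

struct toy_vma {
    unsigned int flags;
    const struct toy_vma_ops *ops;  /* NULL means "use the generic path" */
};

/* Style 1: core code tests one flag per special case at each callsite. */
enum fault_path handle_fault_flags(const struct toy_vma *vma)
{
    if (vma->flags & TOY_VM_HUGETLB)
        return FAULT_HUGETLB;
    if (vma->flags & TOY_VM_SHARED_PT)
        return FAULT_MSHARE;
    return FAULT_GENERIC;
}

static enum fault_path hugetlb_fault(const struct toy_vma *vma)
{
    (void)vma;
    return FAULT_HUGETLB;
}

static enum fault_path mshare_fault(const struct toy_vma *vma)
{
    (void)vma;
    return FAULT_MSHARE;
}

const struct toy_vma_ops toy_hugetlb_ops = { .fault = hugetlb_fault };
const struct toy_vma_ops toy_mshare_ops  = { .fault = mshare_fault };

/* Style 2: core code has a single dispatch point and no type-specific checks. */
enum fault_path handle_fault_hooks(const struct toy_vma *vma)
{
    if (vma->ops != NULL && vma->ops->fault != NULL)
        return vma->ops->fault(vma);
    return FAULT_GENERIC;
}
```

In the first style, each new mapping type adds another branch at every affected callsite. In the second, a new type supplies its own ops table and the core dispatch stays unchanged, which is roughly the appeal of generic hooks shared by mshare and hugetlb.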


Thanks,

Anthony