Re: [RFC PATCH v3 00/10] Add support for shared PTEs across processes

From: Kirill A. Shutemov
Date: Mon Oct 07 2024 - 05:01:58 EST


On Tue, Sep 03, 2024 at 04:22:31PM -0700, Anthony Yznaga wrote:
> This patch series implements a mechanism that allows userspace
> processes to opt into sharing PTEs. It adds a new in-memory
> filesystem - msharefs. A file created on msharefs represents a
> shared region where all processes mapping that region will map
> objects within it with shared PTEs. When the file is created,
> a new host mm struct is created to hold the shared page tables
> and vmas for objects later mapped into the shared region. This
> host mm struct is associated with the file and not with a task.

A taskless mm_struct can be problematic. We don't have access to its
counters because it is not represented in /proc: for instance, there's no
way to check its smaps.

Also, I *think* it is immune to the oom-killer because the oom-killer
looks for a victim task, not an mm.
I hope it is not an intended feature :P
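
To illustrate the concern, here is a toy userspace model, not kernel code.
All names (toy_mm, toy_task, toy_select_victim) are made up, and the
scoring is simplified to "largest RSS wins". The only point is that victim
selection walks tasks, so memory charged to an mm that no task points at
is never considered:

```c
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's mm_struct/task_struct. */
struct toy_mm {
	unsigned long rss_pages;
};

struct toy_task {
	const char *comm;
	struct toy_mm *mm;	/* NULL for a task without an address space */
};

/*
 * Pick the task whose mm has the largest RSS.  The walk is over tasks,
 * so an mm that is not attached to any task can never be selected,
 * no matter how much memory it holds.
 */
static const struct toy_task *
toy_select_victim(const struct toy_task *tasks, size_t n)
{
	const struct toy_task *victim = NULL;
	unsigned long worst = 0;

	for (size_t i = 0; i < n; i++) {
		if (!tasks[i].mm)
			continue;
		if (tasks[i].mm->rss_pages > worst) {
			worst = tasks[i].mm->rss_pages;
			victim = &tasks[i];
		}
	}
	return victim;
}
```

With two tasks and a much larger host mm that belongs to neither of them,
the model picks the bigger task and the host mm stays untouched.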

> When a process mmap's the shared region, a vm flag VM_SHARED_PT
> is added to the vma. On page fault the vma is checked for the
> presence of the VM_SHARED_PT flag.

I think it is the wrong approach.

Instead of spraying VM_SHARED_PT checks across core-mm, we need to add
generic hooks that can be used by both mshare and hugetlb, and remove the
is_vm_hugetlb_page() checks from core-mm along the way.
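
Roughly the shape I have in mind, as a toy userspace sketch. Every name
here (toy_vma, toy_fault_ops, toy_handle_mm_fault, ...) is hypothetical
and none of this is a proposal for the actual interface. The core path
tests one optional hook instead of knowing about each special kind of vma:

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_vma;

/* Hypothetical per-vma hook table; not an existing kernel structure. */
struct toy_fault_ops {
	/* Return true if the hook fully handled the fault. */
	bool (*handle_fault)(struct toy_vma *vma, unsigned long addr);
};

struct toy_vma {
	unsigned long start, end;
	const struct toy_fault_ops *fault_ops;	/* NULL for ordinary vmas */
	int hook_faults;			/* faults taken by the hook */
};

static int generic_faults;	/* faults taken by the generic path */

/* What an mshare-like user would plug in. */
static bool toy_mshare_fault(struct toy_vma *vma, unsigned long addr)
{
	(void)addr;
	vma->hook_faults++;
	return true;
}

static const struct toy_fault_ops toy_mshare_ops = {
	.handle_fault = toy_mshare_fault,
};

/*
 * Core fault path: no per-feature flag checks, only a single test for
 * an optional hook.  Returns 0 on success, -1 if addr is outside vma.
 */
static int toy_handle_mm_fault(struct toy_vma *vma, unsigned long addr)
{
	if (addr < vma->start || addr >= vma->end)
		return -1;

	if (vma->fault_ops && vma->fault_ops->handle_fault &&
	    vma->fault_ops->handle_fault(vma, addr))
		return 0;

	generic_faults++;
	return 0;
}
```

A hugetlb-like user would supply its own ops table the same way, which is
what would let the explicit checks in the core path go away.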

BTW, the is_vm_hugetlb_page() call sites seem to be a good indicator of
where mshare may have to do something differently. I feel you have missed
a lot of such cases.

--
Kiryl Shutsemau / Kirill A. Shutemov