On Mon, Apr 11, 2022 at 10:05:44AM -0600, Khalid Aziz wrote:
> Page tables in kernel consume some of the memory and as long as number
> of mappings being maintained is small enough, this space consumed by
> page tables is not objectionable. When very few memory pages are
> shared between processes, the number of page table entries (PTEs) to
> maintain is mostly constrained by the number of pages of memory on the
> system. As the number of shared pages and the number of times pages
> are shared goes up, amount of memory consumed by page tables starts to
> become significant.

All of this is true. However, I've found a lot of people don't see this
as compelling. I've had more success explaining this from a different
direction:
--- 8< ---
Linux supports processes which share all of their address space (threads)
and processes that share none of their address space (tasks). We propose
a useful intermediate model where two or more cooperating processes
can choose to share portions of their address space with each other.
The shared portion is referred to by a file descriptor which processes
can choose to attach to their own address space.
Modifications to the shared region affect all processes sharing
that region, just as changes by one thread affect all threads in a
multithreaded program. This implies a certain level of trust between
the different processes (i.e., malicious processes should not be allowed
access to the mshared region).
--- 8< ---
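
To make the visibility semantics concrete, here is a minimal sketch using
only existing calls (mmap() with MAP_SHARED | MAP_ANONYMOUS, then fork()).
It is an analogy, not the proposed interface: the helper name
child_store_seen_by_parent() is made up for illustration, and unlike the
proposal, parent and child here still each maintain their own page tables
for the region.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: the child stores `value` into a shared mapping and
 * the parent reads it back. Returns what the parent observes, or -1 on
 * failure.
 */
int child_store_seen_by_parent(int value)
{
	int *shared, seen, status;
	pid_t pid;

	/* One int, shared (not copy-on-write) across fork(). */
	shared = mmap(NULL, sizeof(*shared), PROT_READ | PROT_WRITE,
		      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (shared == MAP_FAILED)
		return -1;
	*shared = 0;

	pid = fork();
	if (pid < 0) {
		munmap(shared, sizeof(*shared));
		return -1;
	}
	if (pid == 0) {
		*shared = value;	/* child writes through its own mapping */
		_exit(0);
	}

	/* Wait for the child so its store has definitely happened. */
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
		munmap(shared, sizeof(*shared));
		return -1;
	}

	seen = *shared;		/* parent sees the child's modification */
	munmap(shared, sizeof(*shared));
	return seen;
}
```

With MAP_PRIVATE instead of MAP_SHARED the parent would read back 0,
which is the "share nothing" end of the spectrum described above.
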
Another argument that MM developers find compelling is that we can reduce
some of the complexity in hugetlbfs where it has the ability to share
page tables between processes.
One objection that was raised is that the mechanism for starting the
shared region is a bit clunky. Did you investigate the proposed approach
of creating an empty address space, attaching to it and using an fd-based
mmap to modify its contents?
> int mshare_unlink(char *name)
>
> A shared address range created by mshare() can be destroyed using
> mshare_unlink() which removes the shared named object. Once all
> processes have unmapped the shared object, the shared address range
> references are de-allocated and destroyed.
>
> mshare_unlink() returns 0 on success or -1 on error.

Can you explain why this is a syscall instead of being a library
function which does:

	int err, dirfd = open("/sys/fs/mshare", O_RDONLY | O_DIRECTORY);

	if (dirfd < 0)
		return -1;
	err = unlinkat(dirfd, name, 0);
	close(dirfd);
	return err;
Does msharefs support creating directories, so that we can use file
permissions to limit who can see the sharable files? Or is it strictly
a single-level-deep hierarchy?