On Mon, Jul 31, 2023 at 06:30:22PM +0200, David Hildenbrand wrote:
> Assume we do do the page table sharing at mmap time, if the flags are right.
> Let's focus on the most common:
>
> mmap(memfd, PROT_READ | PROT_WRITE, MAP_SHARED)
>
> And doing the same in each and every process.

That may be the most common in your usage, but for a database, you're
looking at two usage scenarios. Postgres calls mmap() on the database
file itself so that all processes share the kernel page cache.
Some Commercial Databases call mmap() on a hugetlbfs file so that all
processes share the same userspace buffer cache. Other Commercial
Databases call shmget() / shmat() with SHM_HUGETLB for the exact
same reason.
This is why I proposed mshare(). Anyone can use it for anything.
We have such a diverse set of users who want to do stuff with shared
page tables that we should not be tying it to memfd or any other
filesystem. Not to mention that it's more flexible; you can map
individual 4kB files into it and still get page table sharing.