Re: [PATCH 00/14] Small step toward KSM for file back page.

From: Matthew Wilcox
Date: Wed Oct 07 2020 - 13:06:04 EST


On Wed, Oct 07, 2020 at 10:48:35AM -0400, Jerome Glisse wrote:
> On Wed, Oct 07, 2020 at 04:20:13AM +0100, Matthew Wilcox wrote:
> > On Tue, Oct 06, 2020 at 09:05:49PM -0400, jglisse@xxxxxxxxxx wrote:
> > > The present patchset just add mapping argument to the various vfs call-
> > > backs. It does not make use of that new parameter to avoid regression.
> > > I am posting this whole things as small contain patchset as it is rather
> > > big and i would like to make progress step by step.
> >
> > Well, that's the problem. This patch set is gigantic and unreviewable.
> > And it has no benefits. The idea you present here was discussed at
> > LSFMM in Utah and I recall absolutely nobody being in favour of it.
> > You claim many wonderful features will be unlocked by this, but I think
> > they can all be achieved without doing any of this very disruptive work.
>
> You have any ideas on how to achieve them without such change ? I will
> be more than happy for a simpler solution but i fail to see how you can
> work around the need for a pointer inside struct page. Given struct
> page can not grow it means you need to be able to overload one of the
> existing field, at least i do not see any otherway.

The one I've spent the most time thinking about is sharing pages between
reflinked files. My approach is to pull DAX entries into the main page
cache and have them reference the PFN directly. It's not a struct page,
but we can find a struct page from it if we need it. The struct page
would belong to a mapping that isn't part of the file.

For other things (NUMA distribution), we can point to something which
isn't a struct page and can be distiguished from a real struct page by a
bit somewhere (I have ideas for at least three bits in struct page that
could be used for this). Then use a pointer in that data structure to
point to the real page. Or do NUMA distribution at the inode level.
Have a way to get from (inode, node) to an address_space which contains
just regular pages.

Using main memory to cache DAX could be done today without any data
structure changes. It just needs the DAX entries pulled up into the
main pagecache. See earlier item.

Exclusive write access ... you could put a magic value in the pagecache
for pages which are exclusively for someone else's use and handle those
specially. I don't entirely understand this use case.

I don't have time to work on all of these. If there's one that
particularly interests you, let's dive deep into it and figure out how
you can do it without committing this kind of violence to struct page.