Re: [GIT PULL] Memory folios for v5.15

From: David Howells
Date: Tue Aug 24 2021 - 11:54:38 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> Yeah, honestly, I would have preferred to see this done the exact
> reverse way: make the rule be that "struct page" is always a head
> page, and anything that isn't a head page would be called something
> else.
> ...
> That said, I see why Willy did it the way he did - it was easier to do
> it incrementally the way he did. But I do think it ends up with an end
> result that is kind of topsy turvy where the common "this is the core
> allocation" being called that odd "folio" thing, and then the simpler
> "page" name is for things that almost nobody should even care about.

From a filesystem pov, it may be better done Willy's way. A lot of code
assumes that "struct page" corresponds to a PAGE_SIZE patch of RAM and is
equivalent to a hardware page, so using something other than struct page seems
a better idea. It's easier to avoid the assumption if the thing is called
something different.

We're dealing with variable-sized clusters of things that, in the future,
could be, say, a combination of typical 4K pages and higher order pages
(depending on what the arch supports), so I think "page" is the wrong name to
use.

There are some pieces, kmap being a prime example, that might be tricky to
make handle a transparently variable-sized multipage object, so careful
auditing will likely be required if we do stick with "struct page".

Further, there's the problem that there are a *lot* of places where
filesystems access struct page members directly, rather than going through
helper functions - and all of these need to be fixed. This is much easier to
manage if we can get the compiler to do the catching. Hiding them all within
struct page would require a humongous single patch.
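
As a rough illustration of getting the compiler to do the catching, here is a
minimal userspace sketch. Everything in it is a hypothetical, stripped-down
stand-in (the struct layouts, SKETCH_PAGE_SIZE and the *_sketch helpers are
assumptions for illustration, not the kernel's definitions). The point is only
that once the aggregate is a distinct type, an unconverted direct member access
or a stray struct page pointer gets diagnosed at build time rather than
silently doing the wrong thing:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a single hardware page descriptor. */
struct page {
	unsigned long flags;
	unsigned long index;	/* file offset in page-sized units */
};

/* Hypothetical aggregate: a head page plus the order of the allocation. */
struct folio {
	struct page head;
	unsigned int order;	/* covers (1 << order) hardware pages */
};

#define SKETCH_PAGE_SIZE 4096UL	/* assumed base page size for this sketch */

/* Helper: size in bytes of the whole aggregate. */
static inline size_t folio_size_sketch(const struct folio *folio)
{
	assert(folio != NULL);
	return SKETCH_PAGE_SIZE << folio->order;
}

/* Helper: file index, so callers never touch the members directly. */
static inline unsigned long folio_index_sketch(const struct folio *folio)
{
	assert(folio != NULL);
	return folio->head.index;
}

/*
 * A "converted" filesystem function.  Writing folio->index here would fail
 * to compile (struct folio has no such member in this sketch), and handing
 * this function a struct page * is an incompatible-pointer diagnostic, so
 * unconverted call sites show up at build time.
 */
static size_t fs_bytes_to_write(const struct folio *folio)
{
	return folio_size_sketch(folio);
}
```

With an order-2 folio, fs_bytes_to_write() reports four base pages' worth of
bytes without the caller ever assuming that one descriptor means PAGE_SIZE.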

One question does spring to mind, though: do filesystems even need to know
about hardware pages at all? They need to be able to access source data or a
destination buffer, but that can be stitched together from disparate chunks
that have nothing to do with pages (e.g. iov_iter); they need access to the
pagecache, and may need somewhere to cache pieces of information, and they
need to be able to pass chunks of pagecache, data or bufferage to crypto
(scatterlists) and I/O routines (bio, skbuff) - but can we hide "paginess"
from filesystems?
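
To make the "stitched together from disparate chunks" idea concrete, here is a
small userspace sketch in the spirit of an iterator over segments. It is not
the kernel's iov_iter; struct chunk, struct chunk_iter and chunk_iter_copy()
are hypothetical names invented for this illustration. The consumer just asks
for bytes and never learns where the chunk boundaries fall, which is the kind
of hiding being asked about:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical segment: any base address and length, no page alignment. */
struct chunk {
	const void *base;
	size_t len;
};

/* Hypothetical iterator: position within an array of chunks. */
struct chunk_iter {
	const struct chunk *chunks;
	size_t nr;	/* number of chunks */
	size_t idx;	/* current chunk */
	size_t off;	/* offset within the current chunk */
};

/* Copy up to 'want' bytes into dst, advancing across chunk boundaries. */
static size_t chunk_iter_copy(struct chunk_iter *it, void *dst, size_t want)
{
	size_t done = 0;

	assert(it != NULL && dst != NULL);
	while (done < want && it->idx < it->nr) {
		const struct chunk *c = &it->chunks[it->idx];
		size_t avail = c->len - it->off;
		size_t n = (want - done < avail) ? want - done : avail;

		if (n > 0)
			memcpy((char *)dst + done,
			       (const char *)c->base + it->off, n);
		done += n;
		it->off += n;
		if (it->off == c->len) {	/* chunk exhausted, move on */
			it->idx++;
			it->off = 0;
		}
	}
	return done;	/* may be short if the chunks run out */
}
```

A caller can read five bytes that straddle two chunks and then drain the
rest, without caring how the data was laid out underneath.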

The main point where this matters, at the moment, is, I think, mmap - but
could more of that be handled transparently by the VM?

> Because, as you say, head pages are the norm. And "folio" may be a
> clever term, but it's not very natural. Certainly not at all as
> intuitive or common as "page" as a name in the industry.

That's mostly because no one uses the term... yet. I've got used to it in
building on top of Willy's patches and have no problem with it - apart from
the fact that I would expect something more like a plural or a collective noun
("sheaf" or "ream" maybe?) - but at least the name is similar in length to
"page".

And it's handy for grepping ;-)

> I'd have personally preferred to call the head page just a "page", and
> other pages "subpage" or something like that. I think that would be
> much more intuitive than "folio/page".

As previously stated, I think we need to leave "struct page" as meaning
"hardware page" and build some other concept on top for aggregation/buffering.

David