Re: [PATCH v1 0/5] mm, kpageflags: support folio and fix output for compound pages
From: Matthew Wilcox
Date: Fri Oct 13 2023 - 11:04:04 EST
On Thu, Oct 12, 2023 at 05:30:34PM +0200, David Hildenbrand wrote:
> On 12.10.23 17:02, Naoya Horiguchi wrote:
> > On Thu, Oct 12, 2023 at 10:33:04AM +0200, David Hildenbrand wrote:
> > > On 10.10.23 16:27, Naoya Horiguchi wrote:
> > > > Hi everyone,
> > > >
> > > > This patchset addresses 2 issues in /proc/kpageflags.
> > > >
> > > > 1. We can't easily tell a folio from a thp, because currently both are
> > > > judged as thp, and
> > > > 2. we see some garbage data in records of compound tail pages because
> > > > we use tail pages to store some internal data.
> > > >
> > > > These issues require userspace programs to do additional work to understand
> > > > the page status, which makes the situation more complicated.
> > > >
> > > > This patchset tries to solve these by defining KPF_FOLIO for issue 1., and
> > > > by hiding part of page flag info on tail pages of compound pages for issue 2.
> > > >
> > > > I think that technically some compound pages like thp/hugetlb/slab could be
> > > > considered as folio, but in this version KPF_FOLIO is set only on folios
> > >
> > > At least thp+hugetlb are most certainly folios. Regarding slab, I suspect we
> > > no longer call them folios (cannot be mapped to user space). But I'm not sure
> > > about the type hierarchy.
> >
> > I'm not sure about the exact definition of "folio", and I think it's better
> > to make KPF_FOLIO set based on the definition.
>
> Me neither. But in any case a THP *is* a folio. So you'd have to set that
> flag in any case.
>
> And any order-0 page (i.e., anon, pagecache) is also a folio. What you seem
> to imply with folio is "large folio". So KPF_FOLIO is really wrong as far as
> I can tell.
Our type hierarchy is degenerate ... in both the neutral and negative
sense of the word. A folio is simply not-a-tail-page. So, as you said,
all head pages and all order-0 pages are folios.
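Taken literally, that definition can be applied to /proc/kpageflags output: every PFN without KPF_COMPOUND_TAIL starts a folio. A minimal sketch, assuming the input is an array of flag words for consecutive PFNs; count_folios() is a hypothetical helper, and it makes no attempt to tell folios apart from other users of struct page such as slab or free memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit numbers from include/uapi/linux/kernel-page-flags.h. */
#define KPF_COMPOUND_HEAD 15
#define KPF_COMPOUND_TAIL 16

/* "A folio is simply not-a-tail-page": order-0 pages and head pages
 * both start a folio; tail pages belong to the preceding head. */
static int starts_folio(uint64_t flags)
{
	return !(flags & (1ULL << KPF_COMPOUND_TAIL));
}

/* Hypothetical helper: count folios, and separately large folios,
 * in an array of kpageflags words for consecutive PFNs. */
static size_t count_folios(const uint64_t *flags, size_t n, size_t *nr_large)
{
	size_t folios = 0, large = 0;

	for (size_t i = 0; i < n; i++) {
		if (!starts_folio(flags[i]))
			continue;	/* tail page: part of the previous folio */
		folios++;
		if (flags[i] & (1ULL << KPF_COMPOUND_HEAD))
			large++;	/* head page of a large folio */
	}
	if (nr_large)
		*nr_large = large;
	return folios;
}
```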
But we're still struggling against the legacy of our "struct page is
everything" mistake, and trying to fix that too. The general term I've
chosen for this is "memdesc", but we aren't very far down the route of
disentangling the various types from either page or folio. I'd imagined
that we'd convert everything to folio, then get into splitting them out,
but at least for ptdesc and slab we've gone for the direct conversion
approach.
At some point we probably want to disentangle anon folios from file
folios, but that's a fair ways down the list, after turning folios into a
separate allocation from struct page. At least on my list ... if someone
wants to do that as a matter of urgency, I'm sure they can be accommodated.
It's not an easy task, for sure. Our needs are better expressed as
(in Java terms) Interfaces rather than subclasses. Or Traits/Generics
if you've started learning Rust.
We definitely have the concept of "mappable to userspace" which applies
to anon, file, netmem, some device driver allocations, some vmalloc
allocations, but not slab, page tables, or free memory. Those memdescs
need refcount, mapcount, dirty flag, lock flag, maybe mapping?
Then we have "managed by the LRU" which applies to anon & file only.
Those memdescs need refcount, lru, and a pile of flags.
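To make "Interfaces rather than subclasses" concrete in C terms, here is a hypothetical sketch in which each capability is a small struct that a memdesc embeds, and generic code is written against the capability rather than the concrete type. None of these types (struct mappable, struct lru_link, struct anonmem, struct filemem, struct netmem, struct slab_desc) exist in the kernel; the fields only mirror the ones listed above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability: "mappable to userspace". */
struct mappable {
	int refcount;
	int mapcount;
	bool dirty;
	bool locked;
};

/* Hypothetical capability: "managed by the LRU". */
struct lru_link {
	struct lru_link *prev, *next;
	unsigned long flags;
};

/* anon and file memory carry both capabilities ... */
struct anonmem { struct mappable m; struct lru_link lru; void *anon_vma; };
struct filemem { struct mappable m; struct lru_link lru; void *mapping; unsigned long index; };
/* ... netmem is mappable but not on the LRU ... */
struct netmem  { struct mappable m; };
/* ... and slab is neither. */
struct slab_desc { unsigned int objects; };

/* Generic code takes the capability, so it works for any memdesc embedding it. */
static void mappable_map(struct mappable *m)
{
	m->refcount++;
	m->mapcount++;
}

static bool mappable_is_mapped(const struct mappable *m)
{
	return m->mapcount > 0;
}

/* Circular doubly linked list, in the style of list_head. */
static void lru_init(struct lru_link *l)
{
	l->prev = l->next = l;
	l->flags = 0;
}

static void lru_add_tail(struct lru_link *head, struct lru_link *l)
{
	l->prev = head->prev;
	l->next = head;
	head->prev->next = l;
	head->prev = l;
}
```

Note that refcount is needed by both capabilities, which is one reason a single inheritance chain fits poorly.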
There's definitely scope for reordering and shrinking the various
memdescs. Once they're fully separated from struct page. What we _call_
them is a separate struggle. Try to imagine how shrink_folio_list()
works if filemem & anonmem have different types ...
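One way to see the shrink_folio_list() problem: if anon and file memory become distinct types, an LRU list can only link the shared part, so reclaim must recover the concrete type from a tag before it can act. The sketch below is hypothetical throughout: shrink_list(), the MEMDESC_* tag, and the reclaim rules (anon is reclaimable only once it has a swap slot, file only when clean) are simplifications for illustration, not how shrink_folio_list() actually decides.

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical type tag stored alongside the list linkage. */
enum memdesc_kind { MEMDESC_ANON, MEMDESC_FILE };

struct lru_link { struct lru_link *next; enum memdesc_kind kind; };

struct anonmem { unsigned long swap_slot; struct lru_link lru; };
struct filemem { int dirty; struct lru_link lru; };

/* Simplified per-type reclaim decisions (illustrative only). */
static int reclaim_anon(const struct anonmem *a) { return a->swap_slot != 0; }
static int reclaim_file(const struct filemem *f) { return !f->dirty; }

/* Walk a mixed singly linked list; the list only knows lru_link, so the
 * tag selects which container_of() recovers the real descriptor. */
static unsigned int shrink_list(struct lru_link *head)
{
	unsigned int reclaimed = 0;

	for (struct lru_link *l = head; l; l = l->next) {
		switch (l->kind) {
		case MEMDESC_ANON:
			reclaimed += reclaim_anon(container_of(l, struct anonmem, lru));
			break;
		case MEMDESC_FILE:
			reclaimed += reclaim_file(container_of(l, struct filemem, lru));
			break;
		}
	}
	return reclaimed;
}
```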
> > > It does sound inconsistent. What exactly do you want to tell user space with
> > > the new flag?
> >
> > The current most problematic behavior is to report folio as thp (order-2
> > pagecache page is definitely a folio but not a thp), and this is what the
> > new flag is intended to tell.
>
> We are currently considering calling these sub-PMD sized THPs "small-sized
> THP". [1] Arguably, we're starting with the anon part, where we can't avoid
> exposing them to the user in sysfs.
>
> So I wouldn't immediately say that these things are not THPs. They are not
> PMD-sized THP. A slab/hugetlb is certainly not a thp but a folio. Note that
> slabs can also be order-0 folios, while hugetlb pages can't.
I think this is a mistake. Users expect THPs to be PMD sized. We already
have the term "large folio" in use for file-backed memory; why do we
need to invent a new term for anon large folios?
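The order-2 pagecache example and the "users expect THPs to be PMD sized" point can be made concrete by measuring a folio from kpageflags: count the head page plus the run of KPF_COMPOUND_TAIL pages after it. A minimal sketch, assuming x86-64 with 4 KiB base pages (so a PMD covers 512 pages); folio_nr_pages() and classify_folio() are hypothetical helpers. Under these assumptions an order-2 folio (4 pages) is a large folio but not PMD-sized.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit numbers from include/uapi/linux/kernel-page-flags.h. */
#define KPF_COMPOUND_HEAD 15
#define KPF_COMPOUND_TAIL 16

/* Assumption: x86-64 with 4 KiB pages, so a PMD maps 2 MiB = 512 pages. */
#define ASSUMED_PMD_NR_PAGES 512

enum folio_class { FOLIO_TAIL_PAGE, FOLIO_ORDER0, FOLIO_LARGE, FOLIO_PMD_SIZED };

/* Pages in the folio starting at flags[i]; 0 if flags[i] is a tail page. */
static size_t folio_nr_pages(const uint64_t *flags, size_t n, size_t i)
{
	const uint64_t tail = 1ULL << KPF_COMPOUND_TAIL;

	if (flags[i] & tail)
		return 0;
	if (!(flags[i] & (1ULL << KPF_COMPOUND_HEAD)))
		return 1;			/* order-0 folio */
	size_t nr = 1;
	while (i + nr < n && (flags[i + nr] & tail))
		nr++;				/* stops at the next non-tail PFN */
	return nr;
}

static enum folio_class classify_folio(const uint64_t *flags, size_t n, size_t i)
{
	size_t nr = folio_nr_pages(flags, n, i);

	if (nr == 0)
		return FOLIO_TAIL_PAGE;
	if (nr == 1)
		return FOLIO_ORDER0;
	if (nr == ASSUMED_PMD_NR_PAGES)
		return FOLIO_PMD_SIZED;
	return FOLIO_LARGE;
}
```

A 2 MiB hugetlb page also has 511 tails, so a real tool would check KPF_HUGE before treating FOLIO_PMD_SIZED as a THP.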
> Looking at other interfaces, we do expose:
>
> include/uapi/linux/kernel-page-flags.h:#define KPF_COMPOUND_HEAD 15
> include/uapi/linux/kernel-page-flags.h:#define KPF_COMPOUND_TAIL 16
>
> So maybe we should just continue talking about compound pages or do we have
> to use both terms here in this interface?
I don't know how easy it's going to be to distinguish between a head
and tail page in the Glorious Future once pages and folios are separated.
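For reference, the head/tail distinction is what userspace can read today through the existing bits. A minimal sketch of reading one record, assuming a stream already opened on /proc/kpageflags (typically readable only by root); read_kpageflags() and kpf_test() are hypothetical helper names, while the bit numbers are the ones in include/uapi/linux/kernel-page-flags.h.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit numbers from include/uapi/linux/kernel-page-flags.h. */
#define KPF_COMPOUND_HEAD 15
#define KPF_COMPOUND_TAIL 16
#define KPF_HUGE          17
#define KPF_THP           22

/* /proc/kpageflags is an array of 64-bit flag words indexed by PFN.
 * Read the word for @pfn. Returns 0 on success, -1 on seek or short read.
 * The (long) cast assumes offsets fit in a long, true on 64-bit Linux. */
static int read_kpageflags(FILE *f, uint64_t pfn, uint64_t *flags)
{
	if (fseek(f, (long)(pfn * sizeof(*flags)), SEEK_SET) != 0)
		return -1;
	return fread(flags, sizeof(*flags), 1, f) == 1 ? 0 : -1;
}

/* Return 1 if bit @kpf is set in @flags, else 0. */
static int kpf_test(uint64_t flags, int kpf)
{
	return (int)((flags >> kpf) & 1);
}
```

On the real file, calling setvbuf(f, NULL, _IONBF, 0) right after fopen("/proc/kpageflags", "rb") should keep each fread() a single 8-byte read at an 8-byte-aligned offset, which the kernel requires for this file.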