Re: Folio discussion recap
From: Matthew Wilcox
Date: Sun Sep 19 2021 - 22:39:38 EST
On Fri, Sep 10, 2021 at 04:16:28PM -0400, Kent Overstreet wrote:
> Q: Oh yeah, but what again are folios for, exactly?
>
> Folios are for cached filesystem data which (importantly) may be mapped to
> userspace.
>
> So when MM people see a new data structure come up with new references to page
> size - there's a very good reason for that, which is that we need to be
> allocating in multiples of the hardware page size if we're going to be able to
> map it to userspace and have PTEs point to it.
>
> So going forward, if the MM people want struct page to refer to multiple hardware
> pages - this shouldn't prevent that, and folios will refer to multiples of the
> _hardware_ page size, not struct page pagesize.
>
> Also - all the filesystem code that's being converted tends to talk and think in
> units of pages. So going forward, it would be a nice cleanup to get rid of as
> many of those references as possible and just talk in terms of bytes (e.g. I
> have generally been trying to get rid of references to PAGE_SIZE in bcachefs
> wherever reasonable, for other reasons) - those cleanups are probably for
> another patch series, and in the interests of getting this patch series merged
> with the fewest introduced bugs possible we probably want the current helpers.
I'd like to thank those who reached out off-list. Some of you know I've
had trouble with depression in the past, and I'd like to reassure you
that that's not a problem at the moment. I had a good holiday, and I
was able to keep from thinking about folios most of the time.

I'd also like to thank those who engaged in the discussion while I was
gone. A lot of good points have been made. I don't think the normal
style of replying to each email individually makes a lot of sense at
this point, so I'll make some general comments instead. I'll respond
to the process issues on the other thread.

I agree with the feeling a lot of people have expressed, that struct page
is massively overloaded and we would do much better with stronger typing.
I like it when the compiler catches bugs for me. Disentangling struct
page is something I've been working on for a while, and folios are a
step in that direction (in that they remove the two types of tail page
from the universe of possibilities).
I don't believe it is realistic to disentangle file pages and anon
pages from each other. Thanks to swap and shmem, both file pages and
anon pages need to be able to be moved in and out of the swap cache.
The swap cache shares a lot of code with the page cache, so changing
how the swap cache works is also tricky.

What I do believe is possible is something Kent hinted at: treating anon
pages more like file pages. I also believe that shmem should be able to
write pages to swap without moving the pages into the swap cache first.
But these two things are just beliefs. I haven't tried to verify them
and they may come to nothing.

I also want to split out slab_page and page_table_page from struct page.
I don't intend to convert either of those to folios.

I do want to make struct page dynamically allocated (and have for
a while). There are some complicating factors ...

There are two primary places where we need to map from a physical
address to a "memory descriptor". The one that most people care about
is get_user_pages(). We have a page table entry and need to increment
the refcount on the head page, possibly mark the head page dirty, but
also return the subpage of any compound page we find. The one that far
fewer people care about is memory-failure.c; we also need to find the
head page to determine what kind of memory has been affected, but we
need to mark the subpage as HWPoison.
Both of these need to be careful to not confuse tail and non-tail pages.
So yes, we need to use folios for anything that's mappable to userspace.
That's not just anon & file pages but also network pools, graphics card
memory and vmalloc memory. Eventually, I think struct page actually goes
down to a union of a few words of padding, along with ->compound_head,
because that's all we're guaranteed is actually there; everything else
is only there in head pages.
There are a lot of places that should use folios which the current
patchset doesn't convert. I prioritised filesystems because we've got
~60 filesystems to convert, and working on the filesystems can proceed
in parallel with working on the rest of the MM. Also, if I converted
the entire MM at once, there would be complaints that a 600 patch series
was unreviewable. So here we are: there's a bunch of compatibility code
that indicates areas which still need to be converted.

I'm sure I've missed things, but I've been working on this email all
day and wanted to send it out before going to sleep.