Re: Folio discussion recap

From: Dave Chinner
Date: Fri Sep 17 2021 - 21:11:11 EST

Next message: Kefeng Wang: "Re: [PATCH v2 0/2] riscv: improve unaligned memory accesses"
Previous message: Lai Jiangshan: "[PATCH V2 10/10] KVM: X86: Don't check unsync if the original spte is writible"
In reply to: Josef Bacik: "Re: Folio discussion recap"
Next in thread: Kent Overstreet: "Re: Folio discussion recap"

On Fri, Sep 17, 2021 at 12:31:36PM -0400, Johannes Weiner wrote:
> My question for fs folks is simply this: as long as you can pass a
> folio to kmap and mmap and it knows what to do with it, is there any
> filesystem relevant requirement that the folio map to 1 or more
> literal "struct page", and that folio_page(), folio_nr_pages() etc be
> part of the public API?

In the short term, yes, we need those things in the public API.
In the long term, not so much.

We need something in the public API that tells us the offset and
size of the folio. Lots of page cache code currently does stuff like
calculate the size or iteration counts based on the difference of
page->index values (i.e. number of pages) and iterate page by page.
A direct conversion of such algorithms increments by
folio_nr_pages() instead of 1. So stuff like this is definitely
necessary as public APIs in the initial conversion.
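
To illustrate the kind of direct conversion described above, here is a
minimal userspace sketch. It is not kernel code: struct folio_model,
model_folio_nr_pages() and count_folio_steps() are hypothetical stand-ins
invented for this example, and the model assumes the folios are sorted by
index with no holes between them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in, NOT the kernel's struct folio. */
struct folio_model {
	unsigned long index;    /* page cache index of the first page */
	unsigned long nr_pages; /* number of pages spanned, at least 1 */
};

/* Hypothetical analogue of folio_nr_pages() for the model above. */
unsigned long model_folio_nr_pages(const struct folio_model *f)
{
	return f->nr_pages;
}

/*
 * Walk the page index range [start, end) across an array of folios and
 * return how many loop steps were taken. A page-by-page walk would do
 * "index++" and take (end - start) steps; the converted loop advances
 * to the end of the current folio instead.
 */
unsigned long count_folio_steps(const struct folio_model *folios, size_t nr,
				unsigned long start, unsigned long end)
{
	unsigned long index = start;
	unsigned long steps = 0;
	size_t i = 0;

	while (index < end && i < nr) {
		const struct folio_model *f = &folios[i];

		/* Skip folios that end at or before the current index. */
		if (f->index + model_folio_nr_pages(f) <= index) {
			i++;
			continue;
		}

		/* Model assumption: the range has no holes. */
		assert(f->index <= index);

		/* Advance past the whole folio rather than by one page. */
		index = f->index + model_folio_nr_pages(f);
		steps++;
		i++;
	}
	return steps;
}
```

With folios of 1, 4, 1 and 2 pages covering indexes 0 to 7, the walk over
the whole range takes four steps where a page-by-page loop would take
eight.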

Let's face it, folio_nr_pages() is a huge improvement on directly
exposing THP/compound page interfaces to filesystems and leaving
them to work it out for themselves. So even in the short term, these
API members represent a major step forward in mm API cleanliness.

As for long term, everything in the page cache API needs to
transition to byte offsets and byte counts instead of units of
PAGE_SIZE and page->index. That's a more complex transition, but
AFAIA that's part of the future work Willy intends to do with
folios and the folio API. Once we get away from accounting and
tracking everything as units of struct page, all the public facing
APIs that use those units can go away.
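
As a rough sketch of what byte-based accounting looks like, the userspace
model below describes a folio by byte position and byte size and does
range arithmetic without ever counting pages. All names here
(folio_model, model_folio_pos(), model_folio_size(), bytes_in_folio(),
MODEL_PAGE_SHIFT) are hypothetical and exist only for this example, and
4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical userspace stand-in, NOT the kernel's struct folio. */
struct folio_model {
	uint64_t index;    /* page cache index of the first page */
	uint64_t nr_pages; /* number of pages spanned */
};

/* Byte offset in the file at which the folio starts. */
uint64_t model_folio_pos(const struct folio_model *f)
{
	return f->index << MODEL_PAGE_SHIFT;
}

/* Size of the folio in bytes. */
uint64_t model_folio_size(const struct folio_model *f)
{
	return f->nr_pages << MODEL_PAGE_SHIFT;
}

/*
 * Return how many bytes of the file range [pos, pos + len) fall inside
 * the folio. Callers only ever see byte offsets and byte counts.
 */
uint64_t bytes_in_folio(const struct folio_model *f, uint64_t pos,
			uint64_t len)
{
	uint64_t fstart = model_folio_pos(f);
	uint64_t fend = fstart + model_folio_size(f);
	uint64_t start, end;

	/* Precondition: pos + len must not overflow. */
	assert(len <= UINT64_MAX - pos);

	start = pos > fstart ? pos : fstart;
	end = pos + len < fend ? pos + len : fend;

	return end > start ? end - start : 0;
}
```

The page shift appears only inside the two helpers, which is the point of
the model: once callers deal purely in byte ranges, the page-unit details
can stay hidden behind the API.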

It's fairly slow to do this, because we have so much code that is
doing stuff like converting file offsets between byte counts and
page counts and vice versa. And it's not necessary to do an initial
conversion to folios, either. But once everything in the page cache
indexing API moves to byte ranges, the need to count pages, use page
counts as ranges, iterate by page index, etc. all goes away and
hence those APIs can also go away.

As for converting between folios and pages, we'll need those sorts
of APIs for the foreseeable future because low level storage layers
and hardware use pages for their scatter gather arrays and at some
point we've got to expose those pages from behind the folio API.
Even if we replace struct page with some other hardware page
descriptor, we're still going to need such translation APIs at some
point in the stack....
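
The translation step described above can be sketched in userspace as
splitting a byte range of a folio into per-page segments of the sort a
scatter gather array holds. This is only a model under stated
assumptions: struct sg_entry_model, folio_to_sg_model() and
MODEL_PAGE_SIZE are hypothetical names, pages are assumed to be 4 KiB,
and the folio's pages are assumed to have consecutive page numbers.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical per-page segment, loosely modelled on a scatter gather entry. */
struct sg_entry_model {
	unsigned long page_nr; /* which page the segment lives in */
	unsigned int offset;   /* byte offset within that page */
	unsigned int len;      /* segment length in bytes */
};

/*
 * Split the byte range [offset, offset + len) of a folio whose first
 * page is first_page into segments that never cross a page boundary.
 * Returns the number of entries written, or 0 if the range is empty or
 * would need more than max entries.
 */
size_t folio_to_sg_model(unsigned long first_page, size_t offset, size_t len,
			 struct sg_entry_model *sg, size_t max)
{
	size_t n = 0;

	assert(sg != NULL);

	while (len > 0) {
		size_t in_page = offset % MODEL_PAGE_SIZE;
		size_t chunk = MODEL_PAGE_SIZE - in_page;

		if (chunk > len)
			chunk = len;
		if (n == max)
			return 0; /* caller's array is too small */

		sg[n].page_nr = first_page + offset / MODEL_PAGE_SIZE;
		sg[n].offset = (unsigned int)in_page;
		sg[n].len = (unsigned int)chunk;
		n++;

		offset += chunk;
		len -= chunk;
	}
	return n;
}
```

For a folio starting at page 100, a range of 8000 bytes beginning at byte
1000 comes out as three segments: 3096 bytes at offset 1000 of page 100,
all 4096 bytes of page 101, and 808 bytes at the start of page 102.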

> Or can we keep this translation layer private
> to MM code? And will page_folio() be required for anything beyond the
> transitional period away from pages?

No idea, but as per above I think it's a largely irrelevant concern
for the foreseeable future because pages will be here for a long time
yet.

> Can we move things not used outside of MM into mm/internal.h, mark the
> transitional bits of the public API as such, and move on?

Sure, but that's up to you to do as a patch set on top of Willy's
folio trees if you think it improves the status quo. Write the
patches and present them for review just like everyone else does,
and they can be discussed on their merits in that context rather
than being presented as a reason for blocking current progress on
folios.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
