Re: [RFC][PATCH] netfs, afs, ceph: Use folios

From: Matthew Wilcox
Date: Wed Aug 11 2021 - 09:56:21 EST


On Wed, Aug 11, 2021 at 02:07:51PM +0100, David Howells wrote:
> Convert the netfs helper library and the afs filesystem to use folios.
>
> NOTE: This patch will also need to alter the ceph filesystem, but as that's
> not been done that yet, ceph will fail to build.
>
> The patch makes two alterations to the mm headers:
>
> (1) Fix a bug in readahead_folio() where a NULL return from
> __readahead_folio() will cause folio_put() to oops.

I'll fold that in.

> (2) Add folio_change_private() to change the private data on the folio
> without adjusting the page refcount or changing the flag. This
> assumes folio_attach_private() was already called.

Makes sense.

> (*) Should I be using page_mapping() or page_file_mapping()?

Depends if you can have a swapfile on your filesystem. I'd like to
get rid of this and only use the directIO path for swap, but that's a
far-distant project.

> (*) Can page_endio() be split into two separate functions, one for read
> and one for write? If seems a waste of time to conditionally switch
> between two different branches.

So you'd like a folio_end_write() and folio_end_read()?

> (*) Is there a better way to implement afs_kill_pages() and
> afs_redirty_pages()? I was previously using find_get_pages_contig()
> into a pagevec, but that doesn't look like it'll work with folios, so
> I'm now calling filemap_get_folio() a lot more - not that it matters
> so much, as these are failure paths.

I always disliked the _contig variants. Block filesystems tend to
follow the pattern

for-each-page-in-range
if page-is-contig-with-prev
append-to-bio
else
start-new-bio

while network filesystems tend to use the pattern

for-range
get-a-batch-of-contig-pages
submit-an-io-using-these-pages

it'd be nice to follow the same pattern for both. Would reduce the
amount of duplicated infrastructure.

> Also, should these be moved into generic code?

I'd have to figure out what they do to answer this question.

> (*) Can ->page_mkwrite() see which subpage of a folio got hit?

It already does -- you're passed a page, not a folio. Are you trying
to optimise by only marking part of a folio as dirty? If so, that's a
bad idea because we're going to want to, eg, map 64KB chunks of a folio
with a single TLB entry on ARM, so you'll only get one notification for
that page.

> (*) __filemap_get_folio() should be used instead of
> grab_cache_page_write_begin()? What should be done if xa_is_value()
> returns true on the value returned by that?

If you don't pass FGP_ENTRY, it won't return you an xa_is_value() ...