On Tue, Oct 05, 2021 at 02:52:01PM +0100, Matthew Wilcox wrote:
On Mon, Aug 23, 2021 at 05:26:41PM -0400, Johannes Weiner wrote:
On one hand, the ambition appears to substitute folio for everything
that could be a base page or a compound page even inside core MM
code. Since there are very few places in the MM code that expressly
deal with tail pages in the first place, this amounts to a conversion
of most MM code - including the LRU management, reclaim, rmap,
migrate, swap, page fault code etc. - away from "the page".
However, this far exceeds the goal of a better mm-fs interface. And
the value proposition of a full MM-internal conversion, including
e.g. the less exposed anon page handling, is much more nebulous. It's
been proposed to leave anon pages out, but IMO to keep that direction
maintainable, the folio would have to be translated to a page quite
early when entering MM code, rather than propagating it inward, in
order to avoid huge, massively overlapping page and folio APIs.
Here's an example where our current confusion between "any page"
and "head page" at least produces confusing behaviour, if not an
outright bug, in isolate_migratepages_block():
	page = pfn_to_page(low_pfn);
	...
	if (PageCompound(page) && !cc->alloc_contig) {
		const unsigned int order = compound_order(page);

		if (likely(order < MAX_ORDER))
			low_pfn += (1UL << order) - 1;
		goto isolate_fail;
	}
compound_order() does not expect a tail page; it returns 0 unless it's
a head page. I think what we actually want to do here is:
	if (!cc->alloc_contig) {
		struct page *head = compound_head(page);

		if (PageHead(head)) {
			const unsigned int order = compound_order(head);

			low_pfn |= (1UL << order) - 1;
			goto isolate_fail;
		}
	}
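
To make the difference between "+=" and "|=" concrete, here is a small
userspace sketch of just the pfn arithmetic. It is not kernel code: the
helper names skip_by_add() and skip_by_or() are made up for illustration,
and it assumes the compound page is naturally aligned in pfn space (its
first pfn is a multiple of 1 << order).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical userspace helpers; these are not kernel functions. */

/* Advance the way the first snippet does: add (block size - 1) to the pfn. */
static unsigned long skip_by_add(unsigned long pfn, unsigned int order)
{
	assert(order < sizeof(unsigned long) * CHAR_BIT);
	return pfn + ((1UL << order) - 1);
}

/*
 * Advance the way the second snippet does: set the low 'order' bits.
 * Assuming natural alignment, this yields the last pfn of the block
 * no matter where inside the block 'pfn' points.
 */
static unsigned long skip_by_or(unsigned long pfn, unsigned int order)
{
	assert(order < sizeof(unsigned long) * CHAR_BIT);
	return pfn | ((1UL << order) - 1);
}
```

Take an order-3 block covering pfns 512..519. Starting from the head
(512), both helpers give 519. Starting from a tail (515), skip_by_or()
still gives 519, while skip_by_add() with order 3 would give 522, past
the end of the block. With order 0, which is what a tail page yields in
the first snippet, skip_by_add() stays at 515 and nothing is skipped.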
Not earth-shattering; not even necessarily a bug. But it's an example
of how the way the code reads differs from how the code is executed,
and that's potentially dangerous. Having a different type for tail
and not-tail pages prevents the muddy thinking that can lead to
tail pages being passed to compound_order().
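
As a rough illustration of the "different type" argument, here is a toy
userspace model. It is only a sketch: struct toy_page, struct toy_head,
toy_to_head() and toy_head_order() are invented names, not the kernel's
struct page, struct folio or their helpers. The point it tries to show
is that once the order lookup only accepts the head type, a caller
cannot hand it a tail page without an explicit conversion.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; these are not the kernel's definitions. */
struct toy_page {
	struct toy_page *head;	/* NULL on a head or base page, else the head */
	unsigned int order;	/* meaningful only on a head or base page */
};

/* Distinct wrapper type: holding one means "this is never a tail page". */
struct toy_head {
	struct toy_page *page;
};

/* The single place where an ambiguous page is resolved to its head. */
static struct toy_head toy_to_head(struct toy_page *page)
{
	struct toy_head h;

	assert(page != NULL);
	h.page = page->head ? page->head : page;
	return h;
}

/* Takes the head type, so passing a raw toy_page is a compile error. */
static unsigned int toy_head_order(struct toy_head h)
{
	return h.page->order;
}
```

With this shape, toy_head_order(&pages[3]) does not compile; the caller
has to write toy_head_order(toy_to_head(&pages[3])), which makes the
head lookup visible at the call site.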
Thanks for digging this up. I agree the second version is much better.
My question is still whether the extensive folio whitelisting of
everybody else is the best way to bring those codepaths to light.
The above isn't totally random. That code is a pfn walker which
translates from the basepage address space to an ambiguous struct page
object. There are more of those, but we can easily identify them: all
uses of pfn_to_page() and virt_to_page() indicate that the code needs
an audit for how exactly they're using the returned page.