Re: [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present

From: Linus Torvalds
Date: Wed Oct 09 2019 - 13:17:13 EST

On Wed, Oct 9, 2019 at 10:03 AM Thomas HellstrÃm (VMware)
<thomas_os@xxxxxxxxxxxx> wrote:
> Nope, it handles the hugepages by ignoring them, since they should be
> read-only, but if pmd_entry() was called with something else than a
> hugepage, then it requests the fallback, but never a split.

But PAGE_WALK_FALLBACK _is_ a split.

Oh, except you did this

- split_huge_pmd(walk->vma, pmd, addr);
+ if (!ops->pmd_entry)
+ split_huge_pmd(walk->vma, pmd, addr);

so it avoids the split.

No, that's unacceptable. And makes no sense anyway. If it doesn't
split the pmd, then it shouldn't walk the pte's - because they don't
exist. And if it's not a hugepmd, then the split is a no-op, so the
test makes no sense.

I hadn't noticed that part of the patch. That simply can't be right. I
don't think you've tested this, because you never actually have
hugepages, do you?

You didn't notice or realize that split_huge_pmd() just does that

if (is_swap_pmd(*____pmd) || pmd_trans_huge(*____pmd) \
|| pmd_devmap(*____pmd)) \

thing and doesn't do anythign at all if it's not huge.

So no. That code makes no sense at all, and I didn't realize how
senseless it was, becasue I stupidly missed that "make the split
conditional" - which is insane and wrong - and I thought that you
wanted PAGE_WALK_FALLBACK to split a pmd and fall back to per-pte
entries, which is what the name implies.

But that's not what you wanted at all.

Just get rid of PAGE_WALK_FALLBACK entirely then, and make the rule be
that a zero return value just means "split and do ptes". Which is what
you want (see above why "split" simply is wrong, and isn't an issue
for you anyway.

That won't change any existing cases, since even if they do have a
zero return value, they don't have a pte_entry() callback, so they
won't do that "split and do ptes" anyway.