Re: [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present
From: Linus Torvalds
Date: Wed Oct 09 2019 - 15:21:00 EST
On Wed, Oct 9, 2019 at 11:52 AM Thomas Hellstrom <thellstrom@xxxxxxxxxx> wrote:
>
> Hmm, so we have the following cases we need to handle when returning
> from the pmd_entry() handler.
No, we really don't.
> 1) Huge pmd was handled - Returns 0 and continues.
No.
That case simply DOES NOT EXIST.
The only case that exists is
"pmd was seen, we return 0 and then look at wherer pte level is relevant".
Note that this has nothing to do with huge or not.
> 2) A pmd is otherwise unstable, typically someone just zapped a huge
> pmd. Returns PAGE_WALK_FALLBACK, gets caught in the pmd_trans_unstable()
> test and retries.
No. PAGE_WALK_FALLBACK doesn't exist, is completely broken in your
patch, and is immaterial.
It falls under the previous heading: a pmd was seen, returns zero, and
we go on with life.
If you don't have a pte callback - like EVERY SINGLE CURRENT USER -
that "goes on with life" is just "go to the next pmd entry".
And if you do have a pte callback - like your new case, that "go on
with life" is to look at the pte cases.
> 3) A pte directory - Returns PAGE_WALK_FALLBACK, falls through, avoids
> the split and continues to the next level. Yeah that split avoidance
> test is indeed made unnecessary by the preceding pmd_trans_unstable() test.
Again, no. This case does not exist. It's the same case as above: it
returns 0 and goes on to the pte level.
> - split_huge_pmd(walk->vma, pmd, addr);
> + if (!ops->pmd_entry)
> + split_huge_pmd(walk->vma, pmd, addr);
>
> But as the commit message says, PAGE_WALK_FALLBACK is necessary to have
> a virtual address range being handled once and only once.
No. Your logic is garbage. The above code is completely broken.
YOU CAN NOT AVOID TRHE SPLIT AND THEN GO ON AT THE PTE LEVEL.
Don't you get it? There *is* no PTE level if you didn't split.
And your "being handled once and only once" is garbage too. If you ask
for both a pmd callback and a pte callback, you get both. It's that
simple.
There are zero users that actually do it now, and you don't want to do
it either, so all your arguments are just pointless.
> So we need the PAGE_WALK_FALLBACK.
No we don't. You make no sense. Your case doesn't want it, no existing
cases want it, nobody wants it.
When you actually have a case that wants it, let's look at it then.
Right now, you introduced fundamentally buggy code because your
thinking is fuzzy and broken.
So what you should do is to just always return 0 in your pmd_entry().
Boom, done. The only reason for the pmd_entry existing at all is to
get the warning. Then, if you don't want to split it, you make that
warning just return an error (or a positive value) instead and say
"ok, that was bad, we don't handle it at all".
And in some _future_ life, if anybody wants to actually say "yeah,
let's not split it", make it have some "yeah I handled it" case.
In fact, I would suggest that positive return values be exactly that
"I did it" case, and that they just add up instead of breaking out.
Only an actual error would break out, and 0 would then (continue to)
mean "continue with next level".
But right now, no such user even exists.
Linus