Re: WARN_ON in move_normal_pmd
From: Linus Torvalds
Date: Sun Mar 26 2023 - 18:48:30 EST
On Sat, Mar 25, 2023 at 7:27 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> So for that very reason, we still have to handle the bad case where the
> source PMD was not deleted right?
Well, so our rules are that if nothing is mapped in a particular page
table directory (any level), then it must be empty.
And that "must" is actually a hard requirement, because our exit path
won't even spend time tearing down page tables that don't have any
mappings in them.
So if you were to have non-empty pmd entries that don't have a vma
associated with them, there would be a memory leak, and we really
would want to warn about that case.
End result: it should be sufficient to do something like "if you don't
have a mapping below you within this PMD, you can expand the movement
down to a full PMD".
And same with the above case.
Of course, the more I think about this, the more I wonder "is this
even worth it". Because we have
(a) mremap() that can't trigger the problematic case currently
(because not overlapping), and *probably* almost never would trigger
the optimization of widening the move in practice.
(b) setup_arg_pages() will probably almost never triggers the
problematic case in practice, since you'd have to shift the pages by
*just* the right amount
so in the end, maybe the "real fix" is to just say "none of this
matters, let's just remove the warning".
An alternative "real fix" might even be to just say "just don't shift
the stack by exactly a PMD". It's unlikely to happen anyway, it's not
worth optimizing for, so just make sure it doesn't happen.
IOW, another alternative could be something like this:
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -783,7 +783,14 @@ int setup_arg_pages(struct linux_binprm *bprm,
unlikely(vma->vm_end - vma->vm_start >= stack_top - mmap_min_addr))
return -ENOMEM;
+ /*
+ * Shift the stack up, but avoid shifting by
+ * exactly a PMD size, which causes issues
+ * when mixing page-sized and pmd-sized moves.
+ */
stack_shift = vma->vm_end - stack_top;
+ if (stack_shift && !(stack_shift & ~PMD_MASK))
+ stack_shift -= PAGE_SIZE;
bprm->p -= stack_shift;
mm->arg_start = bprm->p;
which is *really* quite ugly, and only handles the stack-grows-down
case, and I'm not proud of it, and the above is also entirely
untested.
I will delete that patch from my system after sending out this email,
and disavow any knowledge of that horrendously ugly hack. But if
somebody else takes ownership of it and I won't be blamed for it, I
would probably accept it as a solution.
Shudder. That's nasty.
Linus