Re: [PATCH] mm: fix possible cause of a page_mapped BUG

From: Hugh Dickins
Date: Thu Apr 07 2011 - 10:17:35 EST

On Wed, 6 Apr 2011, Linus Torvalds wrote:
> On Wed, Apr 6, 2011 at 8:43 AM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> >
> > I was about to send you my own UNTESTED patch: let me append it anyway,
> > I think it is more correct than yours (it's the offset of vm_end we need
> > to worry about, and there's the funny old_len,new_len stuff).
>
> Umm. That's what my patch did too. The
>
>    pgoff = (addr - vma->vm_start) >> PAGE_SHIFT;
>
> is the "offset of the pgoff" from the original mapping, then we do
>
>    pgoff += vma->vm_pgoff;
>
> to get the pgoff of the new mapping, and then we do
>
>    if (pgoff + (new_len >> PAGE_SHIFT) < pgoff)
>
> to check that the new mapping is ok.

Right, I was forgetting the semantics for mremap when
addr + old_len < vma->vm_end. It has to move out the
old section and extend it elsewhere; it does not affect
the page just before vma->vm_end at all. So mine was
indeed a more complicated way of doing yours.
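
A user-space sketch of that wrap check, for illustration only. The names
pgoff_would_wrap() and SKETCH_PAGE_SHIFT are made up here, 4K pages are
assumed, and the code only mirrors the arithmetic quoted above; it is not
the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for illustration: 4K pages. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical helper, not a kernel function: returns true if growing
 * the mapping to new_len bytes, starting at addr inside a vma that
 * begins at vm_start with page offset vm_pgoff, would wrap the page
 * offset around the top of unsigned long.
 */
static bool pgoff_would_wrap(unsigned long vm_start, unsigned long vm_pgoff,
			     unsigned long addr, unsigned long new_len)
{
	unsigned long pgoff;

	assert(addr >= vm_start);	/* addr must lie inside the vma */

	pgoff = (addr - vm_start) >> SKETCH_PAGE_SHIFT;
	pgoff += vm_pgoff;
	/* Unsigned arithmetic wraps, so a sum below pgoff means overflow. */
	return pgoff + (new_len >> SKETCH_PAGE_SHIFT) < pgoff;
}
```

With a vm_pgoff two pages short of the maximum, growing to one page is
still fine but growing to two pages wraps, which is the case the patch
rejects with Einval.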

> I think yours is equivalent, just a different (and odd - that
> linear_page_index() thing will do lots of unnecessary shifts and
> hugepage crap) way of writing it.

I was trying to use the common function provided, but it's
actually the wrong one: that's a function for getting the value
found in page->index (in units of PAGE_CACHE_SIZE), whereas here
we want the value found in vm_pgoff (in units of PAGE_SIZE).

Of course PAGE_CACHE_SIZE has equalled PAGE_SIZE everywhere but in
some patches by Christoph Lameter a few years back, so there isn't
an effective difference; but I was wrong to use that function.
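
To make the unit distinction concrete, here is a user-space sketch. The
helper names and the cache_shift parameter are hypothetical, 4K pages are
assumed, and the conversion to page cache units is assumed to be a right
shift by the difference between the two shifts.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4K pages */

/* Offset in PAGE_SIZE units: the kind of value held in vm_pgoff. */
static unsigned long sketch_pgoff(unsigned long vm_start,
				  unsigned long vm_pgoff,
				  unsigned long addr)
{
	assert(addr >= vm_start);	/* addr must lie inside the vma */
	return ((addr - vm_start) >> SKETCH_PAGE_SHIFT) + vm_pgoff;
}

/*
 * The same offset in page cache units: the kind of value held in
 * page->index. cache_shift is a hypothetical page cache shift.
 */
static unsigned long sketch_page_index(unsigned long vm_start,
				       unsigned long vm_pgoff,
				       unsigned long addr,
				       unsigned int cache_shift)
{
	assert(cache_shift >= SKETCH_PAGE_SHIFT);
	return sketch_pgoff(vm_start, vm_pgoff, addr)
		>> (cache_shift - SKETCH_PAGE_SHIFT);
}
```

When cache_shift equals SKETCH_PAGE_SHIFT the two helpers agree, which is
the "no effective difference" case; with a larger cache_shift they diverge.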

> > See what you think - sorry, I'm going out now.
>
> I think _yours_ is conceptually buggy, because I think that test for
> "vma->vm_file" is wrong.

Just being cautious: we cannot hit the BUG in prio_tree.c when we're
dealing with an anonymous mapping, and I didn't want to think about
anonymous at the time.

> Yes, new anonymous mappings set vm_pgoff to the virtual address, but
> that's not true for mremap() moving them around, afaik.
> Admittedly it's really hard to get to the overflow case, because the
> address is shifted down, so even if you start out with an anonymous
> mmap at a high address (to get a big vm_off), and then move it down
> and expand it (to get a big size), I doubt you can possibly overflow.
> But I still don't think that the test for vm_file is semantically
> sensible, even if it might not _matter_.

The strangest case is when a 64-bit kernel execs a 32-bit executable,
preparing the stack with a very high virtual address which goes into
vm_pgoff (shifted by PAGE_SHIFT), then moves that stack down into the
32-bit address space but leaving it with the original high vm_pgoff.
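
A user-space sketch of that stack case. The struct, the helper names and
the addresses used with them are illustrative assumptions (4K pages,
made-up high and low stack addresses); none of this is the exec code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4K pages */

/* Hypothetical stand-in for the few vma fields that matter here. */
struct sketch_vma {
	uint64_t vm_start;
	uint64_t vm_end;
	uint64_t vm_pgoff;
};

/* A new anonymous mapping takes its vm_pgoff from its virtual address. */
static struct sketch_vma sketch_anon_vma(uint64_t start, uint64_t len)
{
	struct sketch_vma vma = {
		.vm_start = start,
		.vm_end   = start + len,
		.vm_pgoff = start >> SKETCH_PAGE_SHIFT,
	};
	return vma;
}

/* Move the vma down by shift bytes; vm_pgoff is deliberately left alone. */
static void sketch_shift_down(struct sketch_vma *vma, uint64_t shift)
{
	assert(shift <= vma->vm_start);
	vma->vm_start -= shift;
	vma->vm_end   -= shift;
}
```

Building a stack vma at a high 64-bit address and shifting it below 4GB
leaves vm_pgoff much larger than vm_start >> SKETCH_PAGE_SHIFT, so the
page offset no longer tracks the virtual address.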

I think you are now excluding some wild anonymous cases which were
allowed before, and gave no trouble - it looks like a wrap won't
upset vma_address(). But they're not cases which anyone is likely
to hit, and it's safer to keep the anon rules in synch with the
file rules.

> But whatever. I suspect both our patches are practically doing the
> same thing, and it would be interesting to hear if it actually fixes
> the issue. Maybe there is some other way to mess up vm_pgoff that I
> can't think of right now.

Here's yours inline below:

Acked-by: Hugh Dickins <hughd@xxxxxxxxxx>

mm/mremap.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 1de98d492ddc..a7c1f9f9b941 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -277,9 +277,16 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
 	if (old_len > vma->vm_end - addr)
 		goto Efault;
 
-	if (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)) {
-		if (new_len > old_len)
+	/* Need to be careful about a growing mapping */
+	if (new_len > old_len) {
+		unsigned long pgoff;
+
+		if (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP))
 			goto Efault;
+		pgoff = (addr - vma->vm_start) >> PAGE_SHIFT;
+		pgoff += vma->vm_pgoff;
+		if (pgoff + (new_len >> PAGE_SHIFT) < pgoff)
+			goto Einval;
 	}
 
 	if (vma->vm_flags & VM_LOCKED) {