Re: [PATCH] mremap: enforce rmap src/dst vma ordering in case ofvma_merge succeeding in copy_vma

From: Andrea Arcangeli
Date: Fri Nov 04 2011 - 21:34:06 EST


On Sat, Nov 05, 2011 at 08:21:03AM +0800, Nai Xia wrote:
> copy_vma() ---> rmap_walk() scan dst VMA --> move_page_tables() moves src to dst
> ---> rmap_walk() scan src VMA. :D

Hmm yes. I think I got in the wrong track because I focused too much
on that line you started talking about, the *vmap = new_vma, you said
I had to reorder stuff there too, and that didn't make sense.

The reason it doesn't make sense is that it can't be ok to reorder
stuff when *vmap = new_vma (i.e. new_vma = old_vma). So if I didn't
need to reorder in that case I thought I could extrapolate it was
always ok.

But the opposite is true: that case can't be solved.

Can it really happen that vma_merge will pack (prev_vma, new_range,
old_vma) together in a single vma? (i.e. prev_vma extended to
old_vma->vm_end)

Even if there's no prev_vma in the picture (but that's the extreme
case) it cannot be safe: i.e. a (new_range, old_vma) or (old_vma,
new_range).

1 single "vma" for src and dst virtual ranges, means 1 single
vma->vm_pgoff. But we've two virtual addresses and two ptes. So the
same page->index can't work for both if the vma->vm_pgoff is the
same.

So regardless of the ordering here we're dealing with something more
fundamental.

If rmap_walk runs immediately after vma_merge completes and releases
the anon_vma_lock, it won't find any pte in the vma anymore. No matter
the order.

I thought at this before and I didn't mention it but at the light of
the above issue I start to think this is the only possible correct
solution to the problem. We should just never call vma_merge before
move_page_tables. And do the merge by hand later after mremap is
complete.

The only safe way to do it is to have _two_ different vmas, with two
different ->vm_pgoff. Then it will work. And by always creating a new
vma we'll always have it queued at the end, and it'll be safe for the
same reasons fork is safe.

Always allocate a new vma, and then after the whole vma copy is
complete, look if we can merge and free some vma. After the fact, so
it means we can't use vma_merge anymore. vma_merge assumes the
new_range is "virtual" and no vma is mapped there I think. Anyway
that's an implementation issue. In some unlikely case we'll allocate 1
more vma than before, and we'll free it once mremap is finished, but
that's small problem compared to solving this once and for all.

And that will fix it without ordering games and it'll fix the *vmap=
new_vma case too. That case really tripped on me as I was assuming
*that* was correct.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/