Re: [PATCH 2/3] mm,migration: Prevent rmap_walk_[anon|ksm] seeingthe wrong VMA information

From: Mel Gorman
Date: Wed Apr 28 2010 - 05:16:37 EST


On Wed, Apr 28, 2010 at 01:10:07AM +0200, Andrea Arcangeli wrote:
> On Tue, Apr 27, 2010 at 10:30:51PM +0100, Mel Gorman wrote:
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index f90ea92..61d6f1d 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -578,6 +578,9 @@ again: remove_next = 1 + (end > next->vm_end);
> > }
> > }
> >
> > + if (vma->anon_vma)
> > + spin_lock(&vma->anon_vma->lock);
> > +
> > if (root) {
> > flush_dcache_mmap_lock(mapping);
> > vma_prio_tree_remove(vma, root);
> > @@ -620,6 +623,9 @@ again: remove_next = 1 + (end > next->vm_end);
> > if (mapping)
> > spin_unlock(&mapping->i_mmap_lock);
> >
> > + if (vma->anon_vma)
> > + spin_unlock(&vma->anon_vma->lock);
> > +
> > if (remove_next) {
> > if (file) {
> > fput(file);
>
> The old code did:
>
> /*
> * When changing only vma->vm_end, we don't really need
> * anon_vma lock.
> */
> if (vma->anon_vma && (insert || importer || start != vma->vm_start))
> anon_vma = vma->anon_vma;
> if (anon_vma) {
> spin_lock(&anon_vma->lock);
>
> why did it become unconditional? (and no idea why it was removed)
>

It became unconditional because I wasn't sure of the optimisation versus the
new anon_vma changes (doesn't matter, should have been safe). At the time
the patch was introduced, the bug looked like a race in VMA's in the list
having their details modified. I thought vma_address was returning -EFAULT
when it shouldn't and while this may still be possible, it wasn't the prime
cause of the bug.

The more important race was in execve between when a VMA got moved and the
page tables copied. The anon_vma locks are fine for the VMA move but the
page table copy happens later. What the patch did was alter the timing of
the race. rmap_walk() was finding the VMA of the new stack being set up by
exec, failing to lock it and backing off. By the time it would restart and
get back to that VMA, it was already moved making the bug simply harder to
reproduce because the race window was so small.

So, the VMA list does not appear to be messed up but there still needs
to be protection against modification of VMA details that are already on
the list. For that, the seq counter would have been enough and
lighter-weight than acquiring the anon_vma->lock every time in
vma_adjust().

I'll drop this patch again as the execve race looks the most important.

> But I'm not sure about this part.... this is really only a question, I
> may well be wrong, I just don't get it.
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/