Re: [PATCH 1/2] mm: Take all anon_vma locks in anon_vma_lock

From: Peter Zijlstra
Date: Mon May 03 2010 - 12:55:33 EST


On Mon, 2010-05-03 at 12:18 -0400, Rik van Riel wrote:
> From: Rik van Riel <riel@xxxxxxxxxx>
>
> Both the page migration code and the transparent hugepage patches expect
> 100% reliable rmap lookups and use page_lock_anon_vma(page) to prevent
> races with mmap, munmap, expand_stack, etc.
>
> Specifically, try_to_unmap indirectly calls vma_address, which computes a
> page's virtual address from vma->vm_start and vma->vm_pgoff; those two
> fields can be updated non-atomically while a stack is expanded downwards.
> VMA splitting and merging present similar issues.
>
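
For reference, the calculation in question is the vma_address() helper in
mm/rmap.c; from memory it looks roughly like the below in 2.6.34, so treat
this as a sketch rather than verbatim source:

	static inline unsigned long
	vma_address(struct page *page, struct vm_area_struct *vma)
	{
		pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
		unsigned long address;

		/*
		 * Both vm_start and vm_pgoff are read here.  If a downward
		 * stack expansion updates the two fields non-atomically,
		 * the computed address can miss the page or fall outside
		 * the VMA entirely.
		 */
		address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
		if (unlikely(address < vma->vm_start || address >= vma->vm_end))
			return -EFAULT;	/* page should be within @vma */
		return address;
	}
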
> With the new anon_vma code, one VMA can be attached to multiple anon_vmas;
> however, mmap, munmap, expand_stack and others took only one anon_vma->lock.
> This patch changes things so we take the locks of all the anon_vmas attached
> to a VMA in the code that try_to_unmap would otherwise race against: mmap,
> munmap, expand_stack, etc.
>
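
So anon_vma_lock(vma) becomes a walk over the VMA's chain, something like
the sketch below; I'm leaving out the lockdep nesting annotation and
whatever guarantees a consistent lock order between concurrent walkers:

	static void anon_vma_lock(struct vm_area_struct *vma)
	{
		struct anon_vma_chain *avc;

		/*
		 * Take the lock of every anon_vma this VMA is attached to.
		 * Deadlock avoidance between two tasks walking overlapping
		 * chains needs a stable lock order (or outer serialization),
		 * which this sketch does not show.
		 */
		list_for_each_entry(avc, &vma->anon_vma_chain, same_vma)
			spin_lock(&avc->anon_vma->lock);
	}
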
> Unfortunately, this leads to a lock ordering conflict with the page_table_lock,
> which protected the "same_vma" list in the anon_vma_chain. Replacing that
> lock with a new lock (mm->anon_vma_chain_lock), which is taken higher up in
> the mm locking hierarchy, solves that issue. This changes the locking rules
> for the "same_vma" list to be either mm->mmap_sem for write, or mm->mmap_sem
> for read plus the new mm->anon_vma_chain lock. This limits the place where
> the new lock is taken to 2 locations - anon_vma_prepare and expand_downwards.
>
> Document the locking rules for the same_vma list in the anon_vma_chain and
> remove the anon_vma_lock call from expand_upwards, which does not need it.
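
If I read that right, anon_vma_prepare ends up looking roughly like this;
a condensed guess at the final form, with the mergeable-anon_vma fallback
and the error/race unwinding elided:

	int anon_vma_prepare(struct vm_area_struct *vma)
	{
		struct mm_struct *mm = vma->vm_mm;
		struct anon_vma *anon_vma = anon_vma_alloc();
		struct anon_vma_chain *avc = anon_vma_chain_alloc();

		if (!anon_vma || !avc)
			return -ENOMEM;	/* the real code frees the survivor */

		avc->anon_vma = anon_vma;
		avc->vma = vma;

		/*
		 * same_vma is now protected by mmap_sem held for write, or
		 * mmap_sem held for read plus this new spinlock, instead of
		 * the page_table_lock.
		 */
		spin_lock(&mm->anon_vma_chain_lock);
		spin_lock(&anon_vma->lock);
		if (likely(!vma->anon_vma)) {
			vma->anon_vma = anon_vma;
			list_add(&avc->same_vma, &vma->anon_vma_chain);
			list_add_tail(&avc->same_anon_vma, &anon_vma->head);
		}
		spin_unlock(&anon_vma->lock);
		spin_unlock(&mm->anon_vma_chain_lock);
		return 0;
	}
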

> diff --git a/mm/mmap.c b/mm/mmap.c
> index 456ec6f..81850fc 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c

> @@ -1705,12 +1707,11 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> return -EFAULT;
>
> /*
> - * We must make sure the anon_vma is allocated
> - * so that the anon_vma locking is not a noop.
> + * Unlike expand_downwards, we do not need to take the anon_vma lock,
> + * because we leave vma->vm_start and vma->vm_pgoff untouched.
> + * This means rmap lookups of pages inside this VMA stay valid
> + * throughout the stack expansion.
> */
> - if (unlikely(anon_vma_prepare(vma)))
> - return -ENOMEM;
> - anon_vma_lock(vma);
>
> /*
> * vma->vm_start/vm_end cannot change under us because the caller
> @@ -1721,7 +1722,6 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> if (address < PAGE_ALIGN(address+4))
> address = PAGE_ALIGN(address+4);
> else {
> - anon_vma_unlock(vma);
> return -ENOMEM;
> }
> error = 0;
> @@ -1737,7 +1737,6 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> if (!error)
> vma->vm_end = address;
> }
> - anon_vma_unlock(vma);
> return error;
> }
> #endif /* CONFIG_STACK_GROWSUP || CONFIG_IA64 */


This does leave me worried about concurrent faults poking at vma->vm_end
with no synchronization left in expand_upwards.
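
Concretely: two tasks can fault above vm_end on the same stack VMA, both
holding mmap_sem for read, and now run the grow path fully unserialized;
roughly:

	/* both CPUs, concurrently, mmap_sem held for read: */
	size = address - vma->vm_start;
	grow = (address - vma->vm_end) >> PAGE_SHIFT;	/* both read the old vm_end */
	error = acct_stack_growth(vma, size, grow);	/* both account the same pages */
	if (!error)
		vma->vm_end = address;			/* unordered plain stores */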
