Re: [PATCH 1/2] mm,migration: Prevent rmap_walk_[anon|ksm] seeingthe wrong VMA information

From: Mel Gorman
Date: Fri May 07 2010 - 12:27:12 EST


On Fri, May 07, 2010 at 09:56:54AM +0900, KAMEZAWA Hiroyuki wrote:
> On Fri, 7 May 2010 00:20:52 +0100
> Mel Gorman <mel@xxxxxxxxx> wrote:
>
> > vma_adjust() is updating anon VMA information without locks being taken.
> > In contrast, file-backed mappings use the i_mmap_lock and this lack of
> > locking can result in races with users of rmap_walk such as page migration.
> > vma_address() can return -EFAULT for an address that will soon be valid.
> > For migration, this potentially leaves a dangling migration PTE behind
> > which can later cause a BUG_ON to trigger when the page is faulted in.
> >
> > <SNIP>
>
> I'm sorry but I don't think I understand this. Could you help me ?
>

Hopefully.

> IIUC, anon_vma_chain is linked as 2D-mesh
>
> anon_vma1 anon_vma2 anon_vma3
> | | |
> vma1 ----- 1 -------- 2 --------- 3 -----
> | | |
> vma2 ----- 4 -------- 5 ---------- 6 -----
> | | |
> vma3 ----- 7 -------- 8 ---------- 9 -----
>
>
> Here,
> * vertical link is anon_vma->head, avc->same_anon_vma link.
> * horizontal link is vma->anon_vma_chain, avc->same_vma link.
> * 1-9 are avcs.
>

I don't think this is quite right for how the "root" anon_vma is
discovered. The ordering of same_vma is such that the prev pointer
points to the root anon_vma as described in __page_set_anon_rmap() but
even so...

> When scanning pages, we may see a page whose anon_vma is anon_vma1
> or anon_vma2 or anon_vma3.
>

When we are walking the list for the anon_vma, we also hold the page
lock and what we're really interested in are ptes mapping that page.

> When we see anon_vma3 in page->mapping, we lock anon_vma1 and chase
> avc1->avc4->avc7. Then, start from vma1. Next, we visit vma2, we lock anon_vma2.
> At the last, we visit vma3 and lock anon_vma3.....And all are done under
> anon_vma1->lock. Right ?
>

assuming it's the root lock, sure.

> Hmm, one concern is
> anon_vma3 -> avc3 -> vma1 -> avc1 -> anon_vma1 chasing.
>
> What will prevent vma1 disappear right after releasling anon_vma3->lock ?
>

What does it matter if it disappeared? If it did, it was because it was torn
down, the PTEs are also gone and a user of rmap_walk should have stopped
caring. Right?

> ex)
> a1) At we chase, anon_vma3 -> avc3 -> vma1 -> anon_vma1, link was following.
>
> anon_vma1 anon_vma2 anon_vma3
> | | |
> vma1 ----- 1 -------- 2 --------- 3 -----
> | | |
> vma2 ----- 4 -------- 5 ---------- 6 -----
> | | |
> vma3 ----- 7 -------- 8 ---------- 9 -----
>
> We hold lock on anon_vma3.
>
> a2) After releasing anon_vma3 lock. vma1 can be unlinked.
>
> anon_vma1 anon_vma2 anon_vma3
> | | |
> vma1 removed.
> | | |
> vma2 ----- 4 -------- 5 ---------- 6 -----
> | | |
> vma3 ----- 7 -------- 8 ---------- 9 -----
>
> But we know anon_vma1->head is not empty, and it's accessable.
> Then, no problem for our purpose. Right ?
>

As the PTEs are also gone, I'm not seeing the problem.

> b1) Another thinking.
>
> At we chase, anon_vma3 -> avc3 -> vma1 -> anon_vma1, link was following.
>
> anon_vma1 anon_vma2 anon_vma3
> | | |
> vma1 ----- 1 -------- 2 --------- 3 -----
> | | |
> vma2 ----- 4 -------- 5 ---------- 6 -----
> | | |
> vma3 ----- 7 -------- 8 ---------- 9 -----
>
> We hold lock on anon_vma3. So,
>
> anon_vma1 anon_vma2 anon_vma3
> | | |
> vma1 ----removed -----removed ------ 3 -----
> | | |
> vma2 ----- 4 -------- 5 ---------- 6 -----
> | | |
> vma3 ----- 7 -------- 8 ---------- 9 -----
>
> we may see half-broken link while we take anon_vma3->lock. In this case,
> anon_vma1 can be caugt.
>
> Don't we need this ?
>
>
> void unlink_anon_vmas(struct vm_area_struct *vma)
> {
> struct anon_vma_chain *avc, *next;
>
> /* Unlink each anon_vma chained to the VMA. */
> - list_for_each_entry_safe_reverse(avc, next, &vma->anon_vma_chain, same_vma) {

This was meant to be list_for_each_entry_safe(....)

> + list_for_each_entry_safe_reverse(avc, next, &vma->anon_vma_chain, same_vma) {
> anon_vma_unlink(avc);
> list_del(&avc->same_vma);
> anon_vma_chain_free(avc);
> }
> }
>
> head avc should be removed last... Hmm ? I'm sorry if all are
> in correct order already.
>

I think the ordering is ok. The rmap_walk may find a situation where the
anon_vmas are being cleaned up but again as the page tables are going
away at this point, the contents of the PTEs are no longer important.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/