Re: [PATCH 2/2] mm,migration: Avoid race between shift_arg_pages()and rmap_walk() during migration by not migrating temporary stacks

From: Mel Gorman
Date: Tue May 04 2010 - 06:32:47 EST


On Thu, Apr 29, 2010 at 06:21:20PM +0200, Andrea Arcangeli wrote:
> Hi Mel,
>
> did you see my proposed fix?

I did when I got back - sorry for the delay. The patchset I sent out was what
I had fully tested and was confident worked. I picked up the version of the
patch that was sent to Linus by Rik for merging.

> I'm running with it applied, I'd be
> interested if you can test it.

Unfortunately, the same bug triggers after about 18 minutes. The objective of
your fix is very simple - have a VMA covering the new range so that rmap can
find it. However, no lock is held during move_page_tables() because of the
need to call the page allocator. Due to the lack of locking, is it possible
that something like the following is happening?

Exec Process Migration Process
begin move_page_tables
begin rmap walk
take anon_vma locks
find new location of pte (do nothing)
copy migration pte to new location
#### Bad PTE now in place
find old location of pte
remove old migration pte
release anon_vma locks
remove temporary VMA
some time later, bug on migration pte

Even with the care taken, a migration PTE got copied and then left behind. What
I haven't confirmed at this point is if the ordering of the walk in "migration
process" is correct in the above scenario. The order is important for
the race as described to happen.

If the above is wrong, there is still a race somewhere else.

> Surely it will also work for new
> anon-vma code in upstream, because at that point there's just 1
> anon-vma and nothing else attached to the vma.
>
> http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=commit;h=6efa1dfa5152ef8d7f26beb188d6877525a9dd03
>
> I think it's wrong to try to handle the race in rmap walk by making
> magic checks on vm_flags VM_GROWSDOWN|GROWSUP and
> vma->vm_mm->map_count == 1,

How bad is that magic check really? Is there a scenario when it's
the wrong thing to do?

I agree that migration skipping specific pages of the temporary stack is
unfortunate and having exec-aware informtion in migration is an odd dependency
at best. On the other hand, it's not as bad as skipping other regions as exec
will finish and allow the pages to be moved again. The impact to compaction
or transparent support would appear to be minimal.

> when we can fix it fully and simply in
> exec.c by indexing two vmas in the same anon-vma with a different
> vm_start so the pages will be found at all times by the rmap_walk.
>

If it can be simply fixed in exec, then I'll agree. Your patch looked simple
but unfortunately it doesn't fix the problem and it does introduce another
call to kmalloc() in the exec path. It's probably something that would only
be noticed by microbenchmarks though so I'm less concerned about that aspect.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/