Use mmu_gather for fork() instead of flush_tlb_mm()

This patch uses an mmu_gather when copying page tables in fork()
instead of calling flush_tlb_mm(). This allows architectures such as
ppc32 with a hash table to avoid walking the page tables a second
time to invalidate hash entries, and to flush only the PTEs that
have actually been changed from RW to RO.
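
To illustrate the idea of flushing only the PTEs that go from RW to RO,
here is a minimal userspace sketch. It is not the patch itself: every
name (cow_gather, cow_queue_inval, cow_copy_ptes, the COW_* flags) is
hypothetical, PTEs are modelled as plain integers, and all mappings are
assumed to be private copy-on-write.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define COW_PTE_PRESENT 0x1u
#define COW_PTE_RW      0x2u
#define COW_BATCH_MAX   64

struct cow_gather {
    size_t nr_inval;                    /* PTEs queued for invalidation */
    size_t inval_idx[COW_BATCH_MAX];    /* indices of the queued PTEs */
};

/* Queue one changed PTE; stands in for tlb_remove_tlb_entry(). */
static void cow_queue_inval(struct cow_gather *g, size_t idx)
{
    assert(g->nr_inval < COW_BATCH_MAX);
    g->inval_idx[g->nr_inval++] = idx;
}

/*
 * Copy n PTEs (n <= COW_BATCH_MAX) from parent to child as a COW fork
 * would: writable entries are downgraded to read-only in both tables,
 * and only those downgraded entries are queued for invalidation.
 * Entries that were already read-only or not present are never queued.
 */
static void cow_copy_ptes(struct cow_gather *g, unsigned *parent,
                          unsigned *child, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!(parent[i] & COW_PTE_PRESENT)) {
            child[i] = 0;
            continue;
        }
        if (parent[i] & COW_PTE_RW) {
            parent[i] &= ~COW_PTE_RW;   /* RW -> RO in the parent */
            cow_queue_inval(g, i);      /* only this case needs a flush */
        }
        child[i] = parent[i];
    }
}
```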

Note that this contains a small change to the mmu_gather code: it
must not call free_pages_and_swap_cache() if no pages have been
queued up for freeing (that is, if we are only invalidating PTEs).
Calling it on fork can deadlock. I haven't dug into why, but the
test looks like a good idea anyway if we're going to use the
mmu_gather for more than just removing pages.
If the patch gets accepted, I will split that bit from the rest
of the patch and send it separately.
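
The guard on free_pages_and_swap_cache() described above can be
sketched roughly as follows. This is a userspace model under stated
assumptions, not the actual hunk: free_gather, sketch_free_pages and
sketch_flush are hypothetical names, and the counters exist only so the
behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PAGES 8

/* Hypothetical stand-in for the gather state; not the kernel struct. */
struct free_gather {
    int need_flush;                   /* some PTE was changed */
    size_t nr_pages;                  /* pages queued for freeing */
    void *pages[SKETCH_MAX_PAGES];
    int flush_calls;                  /* instrumentation only */
    int free_calls;                   /* instrumentation only */
};

/* Stands in for free_pages_and_swap_cache(); must not run when empty. */
static void sketch_free_pages(struct free_gather *g)
{
    assert(g->nr_pages > 0);          /* guaranteed by the guard below */
    g->free_calls++;
    g->nr_pages = 0;
}

/* Finish a batch: always flush, but free only if pages were queued. */
static void sketch_flush(struct free_gather *g)
{
    if (!g->need_flush)
        return;
    g->need_flush = 0;
    g->flush_calls++;                 /* stands in for the arch TLB flush */
    if (g->nr_pages != 0)             /* skip when only invalidating PTEs */
        sketch_free_pages(g);
}
```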

The main possible issue I see is with huge pages. Arch code might
have relied on flush_tlb_mm() and might not cope with
tlb_remove_tlb_entry() being called for huge PTEs.

Other possible issues arise if archs assume that flush_tlb_mm() is
called in fork() for other, unrelated reasons.

Also, we could probably improve the tracking of start/end: in the
case of lock breaking, the outer function will still finish the
batch with the entire range. I don't think that matters on ppc and
x86, though.
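
One possible shape for tighter start/end tracking, purely as an
assumption about how it could be done and not something this patch
implements: record the lowest and highest addresses touched since the
last flush, and reset them whenever the batch is flushed at a lock
break. All names below (range_gather, range_reset, range_track,
range_flush) are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical range tracker; not part of the kernel mmu_gather. */
struct range_gather {
    unsigned long start;   /* lowest address touched since last flush */
    unsigned long end;     /* one past the highest address touched */
};

/* Mark the tracker as empty (start > end means nothing to flush). */
static void range_reset(struct range_gather *g)
{
    g->start = ULONG_MAX;
    g->end = 0;
}

/* Widen the tracked range to cover [addr, addr + size). */
static void range_track(struct range_gather *g, unsigned long addr,
                        unsigned long size)
{
    assert(size > 0 && addr <= ULONG_MAX - size);
    if (addr < g->start)
        g->start = addr;
    if (addr + size > g->end)
        g->end = addr + size;
}

/*
 * Report the range that needs flushing and reset the tracker.
 * Returns 0 if nothing was touched since the previous flush, so a
 * final flush after a lock break covers only the remaining part.
 */
static int range_flush(struct range_gather *g, unsigned long *start,
                       unsigned long *end)
{
    if (g->start >= g->end)
        return 0;
    *start = g->start;
    *end = g->end;
    range_reset(g);
    return 1;
}
```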