[PATCH 1/3] mm: make page freeing path RCU-safe

From: Borislav Petkov
Date: Sun Apr 11 2010 - 09:20:12 EST


From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>

On Sat, 10 Apr 2010, Linus Torvalds wrote:
> On Sat, 10 Apr 2010, Borislav Petkov wrote:
> >
> > And I got an oops again, this time the #GP from couple of days ago.
>
> Oh damn. So the list corruption really does happen still.

Ho humm.

Maybe I'm crazy, but something started bothering me. And I started
wondering: when is the 'page->mapping' of an anonymous page actually
cleared?

The thing is, the mapping of an anonymous page is actually cleared only
when the page is _freed_, in "free_hot_cold_page()".

Now, let's think about that. And in particular, let's think about how that
relates to the freeing of the 'anon_vma' that the page->mapping points to.

The way the anon_vma is freed is when the mapping is torn down, and we do
roughly:

	tlb = tlb_gather_mmu(mm,..)
	..
	unmap_vmas(&tlb, vma ..
	..
	free_pgtables()
	..
	tlb_finish_mmu(tlb, start, end);

and we actually unmap all the pages in "unmap_vmas()", and then _after_
unmapping all the pages we do the "unlink_anon_vmas(vma);" in
"free_pgtables()". Fine so far - the anon_vma stays around until after the
page has been happily unmapped.

But "unmapped all the pages" is _not_ actually the same as "free'd all the
pages". The actual _freeing_ of the page happens generally in
tlb_finish_mmu(), because we can free the page only after we've flushed
any TLB entries.

So what we have in that tlb_gather structure is a list of _pending_ pages
to be freed, while we already actually free'd the anon_vmas earlier!
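
To make the ordering concrete, here is a rough userspace sketch of the
three steps above. It is a toy model and not kernel code: every toy_*
name is hypothetical, and "freeing" is reduced to setting a flag. It only
shows that between the unlink step and the finish step, a queued page can
still carry a ->mapping pointer to an anon_vma that is already gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, not kernel code: all toy_* names are hypothetical. */
struct toy_anon_vma {
	bool freed;
};

struct toy_page {
	struct toy_anon_vma *mapping;	/* stands in for page->mapping */
	bool mapped;
	bool freed;
};

#define TOY_BATCH 4

/* Stands in for the gather structure: pages waiting to be freed. */
struct toy_gather {
	struct toy_page *pending[TOY_BATCH];
	size_t nr;
};

/* Step 1 (like unmap_vmas): unmap the page and queue it for a later free. */
static void toy_unmap(struct toy_gather *tlb, struct toy_page *page)
{
	assert(tlb->nr < TOY_BATCH);
	page->mapped = false;
	tlb->pending[tlb->nr++] = page;
}

/* Step 2 (like unlink_anon_vmas in free_pgtables): release the anon_vma. */
static void toy_unlink(struct toy_anon_vma *av)
{
	av->freed = true;
}

/* Count queued pages whose ->mapping still names a freed anon_vma. */
static size_t toy_count_stale(const struct toy_gather *tlb)
{
	size_t i, stale = 0;

	for (i = 0; i < tlb->nr; i++)
		if (tlb->pending[i]->mapping && tlb->pending[i]->mapping->freed)
			stale++;
	return stale;
}

/* Step 3 (like tlb_finish_mmu): free the pages; ->mapping is cleared only now. */
static void toy_finish(struct toy_gather *tlb)
{
	size_t i;

	for (i = 0; i < tlb->nr; i++) {
		tlb->pending[i]->mapping = NULL;
		tlb->pending[i]->freed = true;
	}
	tlb->nr = 0;
}
```

Calling toy_unmap(), then toy_unlink(), then toy_count_stale() reports the
stale pointer; it only goes away once toy_finish() has run.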

Now, the thing is, tlb_gather_mmu() begins a preempt-safe region (because
we use a per-cpu variable), but as far as I can tell it is _not_ an
RCU-safe region.

So I think we might actually get a real RCU freeing event while this all
happens. So now the 'anon_vma' that 'page->mapping' points to has not just
been released back to the SLUB caches, the page itself might have been
released too.
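
As a hedged illustration of why an RCU read-side section matters for that
window, here is a single-threaded toy model of a deferred free. It is not
how kernel RCU is implemented (real grace periods are tracked across CPUs),
and all toy_* names are hypothetical. It only shows the property at issue:
a deferred free cannot complete while a reader section is still open.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-threaded model of an RCU-style deferred free; hypothetical names. */
struct toy_rcu {
	int readers;		/* nesting depth of open read-side sections */
	bool free_pending;	/* a deferred free has been queued */
	bool object_freed;	/* the deferred free has actually run */
};

/* Run the queued free only when no read-side section is open. */
static void toy_try_grace_period(struct toy_rcu *r)
{
	if (r->readers == 0 && r->free_pending) {
		r->free_pending = false;
		r->object_freed = true;
	}
}

/* Open a read-side section (loosely analogous to rcu_read_lock()). */
static void toy_read_lock(struct toy_rcu *r)
{
	r->readers++;
}

/* Close a read-side section; a queued free may now complete. */
static void toy_read_unlock(struct toy_rcu *r)
{
	assert(r->readers > 0);
	r->readers--;
	toy_try_grace_period(r);
}

/* Queue a free that must wait for open read-side sections to close. */
static void toy_defer_free(struct toy_rcu *r)
{
	r->free_pending = true;
	toy_try_grace_period(r);
}
```

With no reader section open, toy_defer_free() completes at once, which
corresponds to the case worried about above. With toy_read_lock() held
across the window, the free waits until toy_read_unlock().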

I dunno. Does the above sound at all sane? Or am I just raving?

Something hacky like the below might fix it if I'm not just raving. I
really might be missing something here.

Linus
---
include/asm-generic/tlb.h | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index e43f976..2678118 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -14,6 +14,7 @@
 #define _ASM_GENERIC__TLB_H
 
 #include <linux/swap.h>
+#include <linux/rcupdate.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 
@@ -62,6 +63,7 @@ tlb_gather_mmu(struct mm_struct *mm, unsigned int full_mm_flush)
 
 	tlb->fullmm = full_mm_flush;
 
+	rcu_read_lock();
 	return tlb;
 }
 
@@ -90,6 +92,7 @@ tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
 	/* keep the page table cache within bounds */
 	check_pgt_cache();
 
+	rcu_read_unlock();
 	put_cpu_var(mmu_gathers);
 }

--
1.7.0.3
