Re: [PATCH 3/3] flush_tlb_range() needs ->page_table_lock when ->mmap_sem is not held
From: Benjamin Herrenschmidt
Date: Fri Mar 09 2012 - 16:06:11 EST

On Mon, 2012-03-05 at 20:53 +0000, Al Viro wrote:
> On Mon, Mar 05, 2012 at 12:30:19PM -0800, Linus Torvalds wrote:
> > Is this safe? And why does it need it? Please add more explanations.
>
> a) safety - as a matter of fact, all other callers hold either
> ->mmap_sem (exclusive) or ->page_table_lock. flush_tlb_range() is
> called under ->page_table_lock in a lot of places, e.g.
> page_referenced_one() -> pmdp_clear_flush_young_notify() ->
> pmdp_clear_flush_young() -> flush_tlb_range(), with
>         /* go ahead even if the pmd is pmd_trans_splitting() */
>         if (pmdp_clear_flush_young_notify(vma, address, pmd))
>                 referenced++;
>         spin_unlock(&mm->page_table_lock);
> in page_referenced_one().
>
> b) there are instances that work with page tables. See e.g.
> arch/powerpc/mm/tlb_hash32.c, flush_tlb_range() and flush_range() in there.
> The same goes for uml, with a lot more extensive playing with page tables.

Yes, we need to make sure they don't go away. Without any of these locks
page table pages may be freed... however, I don't see the page table
lock ensuring that anymore. The hugetlb_free_pgd_range implementation in
powerpc seemed to have old comments about expecting the PTL to be held
but that doesn't appear to be the case anymore.

In fact I always worry with the whole walking of page tables vs. freeing
them. We use sched RCU to delay the freeing so we -should- be ok if we
keep interrupts off on the walking side but it's fishy.

> Almost all callers are actually fine - flush_tlb_range() may have no need
> to bother playing with page tables, but it can do so safely; again, this
> caller is the sole exception - everything else either has exclusive ->mmap_sem
> on the mm in question, or mm->page_table_lock is held.

mmap_sem will protect vs. page table freeing. page_table_lock on the
other hand...

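To make the two claims in this thread concrete, here is a minimal userspace sketch of the rules as stated above. It is not kernel code: struct model_mm and the model_* helpers are hypothetical names invented for illustration, and the locks are modelled as plain flags rather than real semaphores or spinlocks. The second helper encodes only the position taken in this mail (mmap_sem protects against page table freeing, page_table_lock is doubtful), not a verified property of any kernel version.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the lock state a caller holds on an mm. */
struct model_mm {
    bool mmap_sem_held_exclusive;  /* stands in for holding ->mmap_sem for write */
    bool page_table_lock_held;     /* stands in for holding ->page_table_lock */
};

/* Al's rule: a caller of flush_tlb_range() should hold either the
 * exclusive mmap_sem or the page_table_lock. */
static bool model_flush_caller_ok(const struct model_mm *mm)
{
    return mm->mmap_sem_held_exclusive || mm->page_table_lock_held;
}

/* The concern raised in this reply: only mmap_sem is assumed here to
 * keep page table pages from being freed under the walker. */
static bool model_safe_vs_pgtable_free(const struct model_mm *mm)
{
    return mm->mmap_sem_held_exclusive;
}

/* Model of a flush that walks page tables: it only checks the rule. */
static void model_flush_tlb_range(const struct model_mm *mm)
{
    assert(model_flush_caller_ok(mm));
    /* a real implementation would walk and flush here */
}
```

In this model a caller holding only page_table_lock passes model_flush_caller_ok() but fails model_safe_vs_pgtable_free(), which is exactly the gap being pointed at.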
Cheers,
Ben.