Re: Page fault scalability patch V18: Drop first acquisition of ptl

From: Benjamin Herrenschmidt
Date: Thu Mar 03 2005 - 02:03:43 EST


On Wed, 2005-03-02 at 21:51 -0800, Christoph Lameter wrote:
> On Wed, 2 Mar 2005, David S. Miller wrote:
>
> > Actually, I guess I could do the pte_cmpxchg() stuff, but only if it's
> > used to "add" access. If the TLB miss handler races, we just go into
> > the handle_mm_fault() path unnecessarily in order to synchronize.
> >
> > However, if this pte_cmpxchg() thing is used for removing access, then
> > sparc64 can't use it. In such a case a race in the TLB handler would
> > result in using an invalid PTE. I could "spin" on some lock bit, but
> > there is no way I'm adding instructions to the carefully constructed
> > TLB miss handler assembler on sparc64 just for that :-)
>
> There is no need to provide pte_cmpxchg. If the arch does not support
> cmpxchg on ptes (CONFIG_ATOMIC_TABLE_OPS not defined)
> then it will fall back to using pte_get_and_clear while holding the
> page_table_lock to ensure that the entry is not touched while performing
> the comparison.

Nah, this is wrong :)

We actually _want_ pte_cmpxchg on ppc64, because we can implement it,
but it requires careful manipulation of some bits in the PTE that the
common Linux layer knows nothing about :) The BUSY bit, for example, is
a lock bit for arbitrating with the hash fault handler.

Also, if it's ever used to cmpxchg from anything but a !present PTE, it
will need additional massaging (like the COW case where we just
"replace" a PTE with set_pte). We also need to preserve some bits in
there that indicate if the PTE was in the hash table and where in the
hash so we can flush it afterward.
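
To make the two constraints above concrete, here is a rough user-space
sketch of what an arch-specific cmpxchg on a PTE could look like. It is
an illustration only, not the ppc64 implementation: the bit names and
values (PTE_BUSY, PTE_HASHPTE, PTE_HASH_SLOT), the pte_t typedef and the
function pte_cmpxchg_sketch() are all hypothetical, and C11 atomics
stand in for whatever primitive the architecture really uses.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only; NOT the real ppc64 PTE. */
#define PTE_PRESENT   UINT64_C(0x001)
#define PTE_BUSY      UINT64_C(0x002)  /* lock bit shared with the hash fault handler */
#define PTE_HASHPTE   UINT64_C(0x004)  /* PTE has an entry in the hash table */
#define PTE_HASH_SLOT UINT64_C(0x0f0)  /* where in the hash that entry lives */
#define PTE_ARCH_BITS (PTE_HASHPTE | PTE_HASH_SLOT)

typedef _Atomic uint64_t pte_t;        /* assumption: a PTE is one 64-bit word */

/*
 * Replace *ptep with newval if its generic bits still match oldval.
 * Returns true on success, false if the PTE changed underneath us.
 * The hash-tracking bits are carried over from the current value so
 * that a later flush can still find the hash table entry.
 */
static bool pte_cmpxchg_sketch(pte_t *ptep, uint64_t oldval, uint64_t newval)
{
    for (;;) {
        uint64_t cur = atomic_load(ptep);

        /* The hash fault handler owns the PTE right now; look again. */
        if (cur & PTE_BUSY)
            continue;

        /* Compare only the bits the common layer knows about. */
        if ((cur & ~PTE_ARCH_BITS) != (oldval & ~PTE_ARCH_BITS))
            return false;

        /* Build the new value, keeping the arch-private hash bits. */
        uint64_t repl = (newval & ~(PTE_ARCH_BITS | PTE_BUSY))
                      | (cur & PTE_ARCH_BITS);

        if (atomic_compare_exchange_weak(ptep, &cur, repl))
            return true;
        /* Lost a race or failed spuriously; reload and re-check. */
    }
}
```

The point of the sketch is that a plain cmpxchg on the whole word is not
enough: the comparison has to ignore the arch-private bits, the update
has to preserve them, and the BUSY bit has to be respected. None of that
can be expressed by the common layer on its own.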

Ben.

