Re: [Patch 3/3] prepopulate/cache cleared pages

From: Nick Piggin
Date: Fri Feb 24 2006 - 01:34:41 EST


Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:
>
>> I'm worried about the situation where we allocate but don't use the
>> new page: it blows quite a bit of cache. Then, when we do get around
>> to using it, it will be cold(er).
>
> couldn't the new pte be flipped in atomically via cmpxchg? That way we
> could do the page clearing close to where we are doing it now, but
> without holding the mmap_sem.


We have nothing to pin the pte page with if we're not holding the
mmap_sem.

> to solve the pte races we could use a bit in the [otherwise empty] pte
> to signal "this pte can be flipped in from now on"; that bit would
> automatically be cleared if mprotect() or munmap() is called over that
> range (without any extra changes to those codepaths). (In the rare
> case where the cmpxchg() fails, we go into a slowpath that drops the
> newly allocated page, re-looks up the vma and the pte, etc.)


The pte page still isn't pinned. You might be able to do something wild
like disabling preemption and interrupts (to stop the TLB flush IPI) to
get a pin on the pte pages.
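Something along these lines, maybe (just a sketch, not against any real
tree; ptep_cmpxchg() is a made-up helper, and dirty/write bits, rmap,
rss accounting and update_mmu_cache() are all left out):

#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/preempt.h>
#include <asm/pgtable.h>

/*
 * Sketch only: flip a pre-zeroed page into an empty pte without holding
 * mmap_sem.  ptep_cmpxchg(mm, addr, ptep, old, new) is assumed to
 * atomically replace *ptep with 'new' iff it still equals 'old',
 * returning non-zero on success; it is not a real API.
 */
static int install_prezeroed_page(struct mm_struct *mm,
                                  struct vm_area_struct *vma,
                                  unsigned long addr, pte_t *ptep,
                                  struct page *page)
{
        pte_t entry = mk_pte(page, vma->vm_page_prot);
        int installed = 0;

        /*
         * The "pin": with preemption and interrupts off, the TLB flush
         * IPI cannot complete, which on IPI-flushing architectures is
         * what would otherwise let the pte page be freed under us.
         */
        preempt_disable();
        local_irq_disable();

        /* Only flip the pte in if it is still empty; otherwise bail. */
        if (pte_none(*ptep) && ptep_cmpxchg(mm, addr, ptep, __pte(0), entry))
                installed = 1;

        local_irq_enable();
        preempt_enable();

        if (!installed)
                /* Slowpath: drop the page, take mmap_sem, redo the fault. */
                page_cache_release(page);

        return installed;
}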

But even in that case, there is nothing in the mmu gather / tlb flush
interface that guarantees an architecture cannot free the page table
pages immediately (ie without waiting for the flush IPI). This would
make sense on architectures that don't walk the page tables in hardware.

Arjan, just to get an idea of your workload: obviously it is a mix of
reads and writes on the mmap_sem (read-only use will not really benefit
from reducing the lock width, because the cacheline transfers will still
be there). Is the write-side traffic coming from brk() calls made by the
allocator? Someone told me a while ago that glibc doesn't have much
hysteresis in its allocator and tends to enter the kernel quite a lot...
that might be something to look into.
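One easy userspace experiment for the glibc angle (the 64MB values below
are arbitrary, just to make the effect visible) is to add some hysteresis
by hand and see whether the brk() traffic, and with it the mmap_sem
contention, drops:

#include <malloc.h>

int main(void)
{
        /* Don't trim free heap back to the kernel until 64MB is unused. */
        mallopt(M_TRIM_THRESHOLD, 64 * 1024 * 1024);

        /* Extend the heap with 64MB of padding so brk() is called less. */
        mallopt(M_TOP_PAD, 64 * 1024 * 1024);

        /* ... run the allocation-heavy workload here ... */

        return 0;
}

The same thing can be done without recompiling, via the
MALLOC_TRIM_THRESHOLD_ and MALLOC_TOP_PAD_ environment variables.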

--
SUSE Labs, Novell Inc.