Re: [PATCH] mmu notifiers #v5

From: Christoph Lameter
Date: Tue Feb 05 2008 - 17:06:50 EST


On Tue, 5 Feb 2008, Andrea Arcangeli wrote:

> On Tue, Feb 05, 2008 at 10:17:41AM -0800, Christoph Lameter wrote:
> > The other approach will not have any remote ptes at that point. Why would
> > there be a coherency issue?
>
> It never happens that two threads writes to two different physical
> pages by working on the same process virtual address. This is an issue
> only for KVM which is probably ok with it but certainly you can't
> consider the dependency on the page-pin less fragile or less complex
> than my PT lock approach.

You can avoid the page-pin and the pt lock completely by zapping the
mappings at _start and then holding off new references until _end.

> > No. It only has to lock the affected range. Remote page faults can occur
> > while another part of the address space is being invalidated. The
> > complexity of locking is up to the user of the mmu notifier. A simple
> > implementation is satisfactory for the GRU right now. Should it become a
> > problem then the lock granularity can be refined without changing the API.
>
> That will make the follow_page fast path even slower if it has to
> lookup a rbtree or a list of locked ranges. Still not comparable to
> the PT lock that 1) it's zero cost and 2) it'll provide an even more
> granular scalability.

As I said the implementation is up to the caller. Not sure what
XPmem is using there but then XPmem is not using follow_page. The GRU
would be using a lightway way of locking not rbtrees.

> > Still not sure what we are talking about here.
>
> The apps using GRU/KVM never trigger large
> munmap/mremap/do_exit. You're optimizing for the irrelevant workload,
> by requiring unnecessary new locking in the GRU fast path.

Maybe that is true for KVM but certainly not true for the GRU. The GRU is
designed to manage several petabytes of memory that may be mapped by a
series of Linux instances. If a process only maps a small chunk of 4
Gigabytes then we already have to deal with 1 mio callbacks.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/