Re: [PATCH 2/2] hugepage: Allow parallelization of the hugepagefault path

From: David Gibson
Date: Fri Jul 15 2011 - 21:08:48 EST


On Fri, Jul 15, 2011 at 12:52:38AM -0700, Andi Kleen wrote:
> Anton Blanchard <anton@xxxxxxxxx> writes:
>
>
> > This patch improves the situation by replacing the single mutex with a
> > table of mutexes, selected based on a hash of the address_space and
> > file offset being faulted (or mm and virtual address for MAP_PRIVATE
> > mappings).
>
> It's unclear to me how this solves the original OOM problem.
> But then you can still have early oom over all the hugepages if they
> happen to hash to different pages, can't you?

The spurious OOM case only occurs when two processes or threads are
racing to instantiate the same page (that is, the same page within an
address_space for SHARED, or the same virtual address for PRIVATE).
In other cases the OOM is correct behaviour, because we really don't
have enough hugepages to satisfy the requests.

Because of the hash's construction, we're guaranteed that in the
spurious OOM case, both processes or threads will use the same mutex.
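To make the guarantee concrete, here is a minimal sketch of the kind
of hash selection the patch describes: the key is (address_space,
file offset) for SHARED mappings or (mm, virtual address) for
PRIVATE ones, so racing faulters on the same page derive the same
table index. The table size, function name, and mixing constant are
all illustrative, not taken from the patch itself.

```c
#include <stdint.h>

#define NUM_FAULT_MUTEXES 64UL  /* hypothetical table size, power of two */

/*
 * Reduce the identifying key to a mutex-table index.  key_ptr is the
 * address_space (SHARED) or mm (PRIVATE); key_off is the file offset
 * or virtual address.  Two faults on the same page present the same
 * key, so they always land on the same mutex.
 */
static unsigned long fault_mutex_hash(const void *key_ptr,
                                      unsigned long key_off)
{
    /* Simple multiplicative mix; the real patch may differ. */
    unsigned long h = (unsigned long)(uintptr_t)key_ptr
                      ^ (key_off * 0x9e3779b97f4a7c15UL);
    h ^= h >> 33;
    return h & (NUM_FAULT_MUTEXES - 1);
}
```

Faults on distinct pages may still collide on a mutex (the table is
finite), but that only costs serialization, never correctness.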

> I think it would be better to move out the clearing out of the lock,

We really can't. The lock has to be taken before we grab a page from
the pool, and can't be released until after the page is "committed",
either by updating the address space's radix tree (SHARED) or the page
tables (PRIVATE). I can't see any way to move the clearing out of
that window.
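To spell out that ordering constraint, here is a toy skeleton of the
fault path it implies. The pool, the commit flag, and all names are
stand-ins for the real allocator, radix tree, and page tables; only
the lock/allocate/clear/commit/unlock ordering is the point.

```c
#include <pthread.h>

/* Toy stand-ins: one-page pool, and a flag for "page committed". */
static pthread_mutex_t instantiation_mutex = PTHREAD_MUTEX_INITIALIZER;
static int free_hugepages = 1;
static int page_committed = 0;

static int fault_in_hugepage(void)
{
    /* Lock BEFORE touching the pool. */
    pthread_mutex_lock(&instantiation_mutex);

    if (page_committed) {
        /* We raced: the other faulter already instantiated the page,
         * so we just use it -- no spurious OOM. */
        pthread_mutex_unlock(&instantiation_mutex);
        return 0;
    }
    if (free_hugepages == 0) {
        /* Genuine shortage: OOM is the correct behaviour here. */
        pthread_mutex_unlock(&instantiation_mutex);
        return -1;
    }

    free_hugepages--;            /* grab a page from the pool */
    /* ... clear the page, still under the mutex ... */
    page_committed = 1;          /* commit: radix tree (SHARED) or
                                  * page tables (PRIVATE) */

    /* Unlock only AFTER the commit is visible. */
    pthread_mutex_unlock(&instantiation_mutex);
    return 0;
}
```

Dropping the lock between the pool grab and the commit is exactly
what would let a racing faulter see an empty pool and OOM spuriously.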

> and possibly take the lock only when the hugepages are about to
> go OOM.

This is much easier said than done.

At one stage I did attempt a more theoretically elegant approach:
keep a count of the number of "in-flight" hugepages (grabbed from the
pool but not yet committed), and retry rather than OOM while it is
non-zero. I believe that approach can work, but it turns out to be
pretty darn hairy to implement.
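For what it's worth, the core of that in-flight-counter idea would
look something like the following. This is only the easy part; the
hairiness is in wiring the retry into the real fault paths. All names
here are hypothetical.

```c
#include <stdatomic.h>

/* Pages grabbed from the pool but not yet committed to a radix tree
 * or page table. */
static atomic_int hugepages_in_flight;

static void hugepage_grab(void)
{
    atomic_fetch_add(&hugepages_in_flight, 1);
}

static void hugepage_commit(void)
{
    atomic_fetch_sub(&hugepages_in_flight, 1);
}

/*
 * On allocation failure: if any page is still in flight, a racing
 * faulter may be about to commit the very page we want, so retry
 * instead of declaring OOM.  Only OOM when the count is zero.
 */
static int should_retry_instead_of_oom(void)
{
    return atomic_load(&hugepages_in_flight) > 0;
}
```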

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson