Re: [PATCH 13/35] autonuma: add page structure fields

From: KOSAKI Motohiro
Date: Wed May 30 2012 - 05:41:16 EST


(5/30/12 5:06 AM), Peter Zijlstra wrote:
On Wed, 2012-05-30 at 04:25 -0400, KOSAKI Motohiro wrote:
(5/29/12 10:54 AM), Peter Zijlstra wrote:
On Tue, 2012-05-29 at 09:56 -0400, Rik van Riel wrote:
On 05/29/2012 09:16 AM, Peter Zijlstra wrote:
On Fri, 2012-05-25 at 19:02 +0200, Andrea Arcangeli wrote:

24 bytes per page.. or ~0.6% of memory gone. This is far too great a
price to pay.
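
As a rough check of the figure above, here is a minimal sketch of the arithmetic. It assumes a 4 KiB base page; the function names and the 64 GiB machine size are purely illustrative and do not come from the patch.

```python
PAGE_SIZE = 4096  # assumed 4 KiB base page size


def overhead_fraction(extra_bytes, page_size=PAGE_SIZE):
    """Fraction of memory consumed by extra per-page metadata."""
    return extra_bytes / page_size


def overhead_bytes(total_memory, extra_bytes, page_size=PAGE_SIZE):
    """Total metadata bytes for a machine with total_memory bytes of RAM."""
    return (total_memory // page_size) * extra_bytes


if __name__ == "__main__":
    # 24 extra bytes in every struct page
    print(f"{overhead_fraction(24):.2%}")
    # hypothetical 64 GiB machine, result in MiB
    print(overhead_bytes(64 * 2**30, 24) // 2**20, "MiB")
```

With 24 bytes per 4 KiB page the fraction is 24/4096, or about 0.59%, which matches the ~0.6% quoted; on the hypothetical 64 GiB machine that is 384 MiB of metadata.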

At LSF/MM Rik already suggested you limit the number of pages that can
be migrated concurrently and use this to move the extra list_head out of
struct page and into a smaller amount of extra structures, reducing the
total overhead.

For THP, we should be able to track this NUMA info on a
2MB page granularity.

Yeah, but that's another x86-only feature, _IF_ we're going to do this
it must be done for all archs that have CONFIG_NUMA, thus we're stuck
with 4k (or other base page size).

Even with THP=n, we don't need 4k granularity. All modern malloc implementations have
per-thread heaps (glibc calls them arenas), and they are usually 1-8MB in size. So, if
a heap is larger than 2MB, we can always use per-pmd tracking. In other words, memory
consumption is reduced to 1/512.
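
The 1/512 figure follows from the ratio of a 2 MiB pmd mapping to a 4 KiB base page. A minimal sketch of that calculation, assuming x86-64 sizes; the helper names are hypothetical and only illustrate the counting:

```python
BASE_PAGE = 4 * 1024         # assumed 4 KiB base page
PMD_SIZE = 2 * 1024 * 1024   # assumed 2 MiB covered by one pmd entry


def tracking_entries(region_bytes, granule):
    """Number of tracking records needed to cover a region (rounded up)."""
    return -(-region_bytes // granule)


def reduction_factor(base_page=BASE_PAGE, pmd_size=PMD_SIZE):
    """How many per-page records one per-pmd record replaces."""
    return pmd_size // base_page


if __name__ == "__main__":
    arena = 8 * 1024 * 1024  # hypothetical 8 MiB per-thread arena
    print(tracking_entries(arena, BASE_PAGE), "records at 4 KiB granularity")
    print(tracking_entries(arena, PMD_SIZE), "records at 2 MiB granularity")
    print("reduction factor:", reduction_factor())
```

Note that a region smaller than 2 MiB still needs one whole per-pmd record, so the saving only holds when heaps are at least pmd-sized.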

Yes, and we all know objects allocated in one thread are never shared
with other threads.. the producer-consumer pattern seems fairly popular
and will destroy your argument.

THP also hits the producer-consumer pattern. But, as far as I know, people haven't observed
a significant performance regression. Thus I _guessed_ that performance-critical
producer-consumer patterns are rare. Just a guess.


My suggestion is to track at per-pmd (i.e. 2M) granularity and fix glibc too (current
glibc malloc adjusts the arena size dynamically, so an arena often becomes
smaller than 2M).

The trouble with making this per pmd is that you then get the false
sharing per pmd, so if there's shared data on the 2m page you'll not
know where to put it.

I also know of some folks who did a strict per-cpu allocator based on
some kernel patches I hope to see posted sometime soon. This is because, if
you have many more threads than cpus, the wasted space in your arenas is
tremendous.
