Re: [PATCH 0/13] Parallel struct page initialisation v4

From: Waiman Long
Date: Wed May 06 2015 - 22:37:39 EST


On 05/06/2015 01:58 PM, Waiman Long wrote:
On 05/06/2015 06:22 AM, Mel Gorman wrote:
On Wed, May 06, 2015 at 08:12:46AM +0100, Mel Gorman wrote:
On Tue, May 05, 2015 at 03:25:49PM -0700, Andrew Morton wrote:
On Tue, 5 May 2015 23:13:29 +0100 Mel Gorman <mgorman@xxxxxxx> wrote:

Alternatively, the page allocator can go off and synchronously
initialize some pageframes itself. Keep doing that until the
allocation attempt succeeds.
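For illustration, a minimal sketch of what that could look like in
mm/page_alloc.c; the deferred_init_range() helper and the zone->deferred_pfn
cursor are hypothetical, not code from any posted patch:

/*
 * Hypothetical helper: initialise up to nr_pages more struct pages in
 * this zone, returning true if any progress was made.
 */
static bool deferred_init_range(struct zone *zone, unsigned long nr_pages)
{
        unsigned long pfn = zone->deferred_pfn;  /* hypothetical cursor */
        unsigned long end = min(pfn + nr_pages, zone_end_pfn(zone));

        if (pfn >= end)
                return false;

        for (; pfn < end; pfn++)
                __init_single_page(pfn_to_page(pfn), pfn,
                                   zone_idx(zone), zone_to_nid(zone));

        zone->deferred_pfn = pfn;
        return true;
}

/* In the allocation slow path, after get_page_from_freelist() fails: */
        if (deferred_init_range(zone, 1UL << order))
                goto retry;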

That was rejected during review of earlier attempts at this feature on
the grounds that it impacted allocator fast paths.
eh? Changes are only needed on the allocation-attempt-failed path,
which is slow-path.
We'd have to distinguish between falling back to other zones because the
high zone is artificially exhausted and normal ALLOC_BATCH exhaustion. We'd
also have to avoid falling back to remote nodes prematurely. While I have
not tried an implementation, I expect the checks would need to be in the fast
paths unless I used jump labels to get around it. I'm going to try altering
when we initialise instead so that it happens earlier.
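If that route were taken, a jump label could keep the check out of the hot
path once initialisation completes. A rough sketch using the kernel's static
key API (the key and helper names are made up, and deferred_init_range() is
the hypothetical helper sketched earlier):

#include <linux/jump_label.h>

/* True only while deferred struct page initialisation is in progress */
static DEFINE_STATIC_KEY_TRUE(deferred_init_enabled);

static inline bool maybe_grow_zone(struct zone *zone, unsigned int order)
{
        /* Patched out to a plain fall-through once the key is disabled */
        if (static_branch_unlikely(&deferred_init_enabled))
                return deferred_init_range(zone, 1UL << order);
        return false;
}

/* Called once when all struct pages have been initialised */
static void deferred_init_done(void)
{
        static_branch_disable(&deferred_init_enabled);
}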

Which looks as follows. Waiman, a test on the 24TB machine would be
appreciated again. This patch should be applied instead of "mm: meminit:
Take into account that large system caches scale linearly with memory".

---8<---
mm: meminit: Finish initialisation of memory before basic setup

Waiman Long reported that 24TB machines hit OOM during basic setup when
struct page initialisation was deferred. One approach is to initialise memory
on demand but it interferes with page allocator paths. This patch creates
dedicated threads to initialise memory before basic setup. It then blocks
on an rw_semaphore until completion, as a wait_queue and counter would be
overkill. This may be slower to boot but it's simpler overall and also gets
rid of a lot of section mangling which existed so kswapd could do the
initialisation.

Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
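The diff itself is not reproduced here, but the synchronisation the
changelog describes amounts to something like this sketch (function and
lock names illustrative, not necessarily the posted code):

static DECLARE_RWSEM(pgdat_init_rwsem);

/* One thread per node initialises that node's remaining struct pages */
static int __init deferred_init_memmap(void *data)
{
        pg_data_t *pgdat = data;

        /* ... walk pgdat's deferred PFN ranges, initialising each page ... */

        up_read(&pgdat_init_rwsem);
        return 0;
}

void __init page_alloc_init_late(void)
{
        int nid;

        for_each_node_state(nid, N_MEMORY) {
                down_read(&pgdat_init_rwsem);
                kthread_run(deferred_init_memmap, NODE_DATA(nid),
                            "pgdatinit%d", nid);
        }

        /* Taking the semaphore for write waits for all readers to finish */
        down_write(&pgdat_init_rwsem);
        up_write(&pgdat_init_rwsem);
}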


This patch moves the deferred meminit from kswapd to its own kernel threads started after smp_init(). However, the hash table allocations are done earlier than that, so it seems the 24TB machine that I tested on will still run out of memory.

I will certainly try it out, but I doubt it will solve the problem on its own.

It turns out that the two new patches did work on the 24TB DragonHawk without the "mm: meminit: Take into account that large system caches scale linearly with memory" patch. The bootup time was 357s, which was just a few seconds slower than the other bootup times that I sent you yesterday.

BTW, do you want to change the following log message, since kswapd will no longer be the one doing the deferred meminit?

kswapd 0 initialised 396098436 pages in 6024ms

Cheers,
Longman
