On Thu, 7 May 2015 08:25:18 +0100 Mel Gorman <mgorman@xxxxxxx> wrote:

> Waiman Long reported that 24TB machines hit OOM during basic setup when
> struct page initialisation was deferred. One approach is to initialise memory
> on demand but it interferes with page allocator paths. This patch creates
> dedicated threads to initialise memory before basic setup. It then blocks
> on a rw_semaphore until completion as a wait_queue and counter is overkill.
> This may be slower to boot but it's simpler overall and also gets rid of a
> section mangling which existed so kswapd could do the initialisation.

Seems a reasonable compromise.  It makes a bit of a mess of the patch
sequencing.

Have some tweaklets:
From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Subject: mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix

include rwsem.h, use DECLARE_RWSEM, fix comment, remove unneeded cast

Cc: Daniel J Blueman <daniel@xxxxxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Nathan Zimmer <nzimmer@xxxxxxx>
Cc: Scott Norton <scott.norton@xxxxxx>
Cc: Waiman Long <waiman.long@xxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---
mm/page_alloc.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff -puN mm/page_alloc.c~mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix mm/page_alloc.c
--- a/mm/page_alloc.c~mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix
+++ a/mm/page_alloc.c
@@ -18,6 +18,7 @@
 #include <linux/mm.h>
 #include <linux/swap.h>
 #include <linux/interrupt.h>
+#include <linux/rwsem.h>
 #include <linux/pagemap.h>
 #include <linux/jiffies.h>
 #include <linux/bootmem.h>
@@ -1075,12 +1076,12 @@ static void __init deferred_free_range(s
 		__free_pages_boot_core(page, pfn, 0);
 }
 
-static struct rw_semaphore __initdata pgdat_init_rwsem;
+static __initdata DECLARE_RWSEM(pgdat_init_rwsem);
 
 /* Initialise remaining memory on a node */
 static int __init deferred_init_memmap(void *data)
 {
-	pg_data_t *pgdat = (pg_data_t *)data;
+	pg_data_t *pgdat = data;
 	int nid = pgdat->node_id;
 	struct mminit_pfnnid_cache nid_init_state = { };
 	unsigned long start = jiffies;
@@ -1096,7 +1097,7 @@ static int __init deferred_init_memmap(v
 		return 0;
 	}
 
-	/* Bound memory initialisation to a local node if possible */
+	/* Bind memory initialisation thread to a local node if possible */
 	if (!cpumask_empty(cpumask))
 		set_cpus_allowed_ptr(current, cpumask);
 
@@ -1200,7 +1201,6 @@ void __init page_alloc_init_late(void)
 {
 	int nid;
 
-	init_rwsem(&pgdat_init_rwsem);
 	for_each_node_state(nid, N_MEMORY) {
 		down_read(&pgdat_init_rwsem);
 		kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
_
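
For readers following the completion scheme described in the quoted changelog (the parent takes the rwsem for read once per node thread, each thread releases it when done, and the parent then takes it for write to wait for all of them), here is a rough userspace sketch. It is not the kernel code: toy_rwsem, toy_down_read() and friends, init_node(), run_deferred_init() and NR_NODES are all made-up names for illustration, and the semaphore is hand-rolled on a pthread mutex and condition variable because a pthread_rwlock_t may not be released by a thread other than the one that acquired it. The static initialiser plays the role that DECLARE_RWSEM plays in the patch above.

```c
#include <assert.h>
#include <pthread.h>

#define NR_NODES 4	/* hypothetical node count, for illustration only */

/* Toy stand-in for a kernel rw_semaphore; the read side may be
 * released by a different thread than the one that took it. */
struct toy_rwsem {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int readers;
	int writer;
};

/* Statically initialised, so no runtime init call is needed. */
static struct toy_rwsem pgdat_init_sem = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.cond = PTHREAD_COND_INITIALIZER,
};

static void toy_down_read(struct toy_rwsem *sem)
{
	pthread_mutex_lock(&sem->lock);
	while (sem->writer)
		pthread_cond_wait(&sem->cond, &sem->lock);
	sem->readers++;
	pthread_mutex_unlock(&sem->lock);
}

static void toy_up_read(struct toy_rwsem *sem)
{
	pthread_mutex_lock(&sem->lock);
	assert(sem->readers > 0);
	sem->readers--;
	pthread_cond_broadcast(&sem->cond);
	pthread_mutex_unlock(&sem->lock);
}

/* Blocks until every outstanding reader has called toy_up_read(). */
static void toy_down_write(struct toy_rwsem *sem)
{
	pthread_mutex_lock(&sem->lock);
	while (sem->readers > 0 || sem->writer)
		pthread_cond_wait(&sem->cond, &sem->lock);
	sem->writer = 1;
	pthread_mutex_unlock(&sem->lock);
}

static void toy_up_write(struct toy_rwsem *sem)
{
	pthread_mutex_lock(&sem->lock);
	sem->writer = 0;
	pthread_cond_broadcast(&sem->cond);
	pthread_mutex_unlock(&sem->lock);
}

static int node_done[NR_NODES];

/* Per-node worker: pretend to initialise memory, then signal completion. */
static void *init_node(void *data)
{
	int nid = *(int *)data;

	node_done[nid] = 1;
	toy_up_read(&pgdat_init_sem);
	return NULL;
}

/* Returns the number of nodes seen as initialised after the wait. */
int run_deferred_init(void)
{
	pthread_t threads[NR_NODES];
	int nids[NR_NODES], started[NR_NODES];
	int nid, done = 0;

	for (nid = 0; nid < NR_NODES; nid++) {
		nids[nid] = nid;
		node_done[nid] = 0;
		/* Parent takes the read side on behalf of the worker. */
		toy_down_read(&pgdat_init_sem);
		started[nid] = !pthread_create(&threads[nid], NULL,
					       init_node, &nids[nid]);
		if (!started[nid])
			toy_up_read(&pgdat_init_sem);
	}

	/* Wait for all workers: the write side is free only at readers == 0. */
	toy_down_write(&pgdat_init_sem);
	for (nid = 0; nid < NR_NODES; nid++)
		done += node_done[nid];
	toy_up_write(&pgdat_init_sem);

	for (nid = 0; nid < NR_NODES; nid++)
		if (started[nid])
			pthread_join(threads[nid], NULL);
	return done;
}
```

Calling run_deferred_init() from a main() built with -pthread should return NR_NODES once every worker has dropped its read hold. The point of the arrangement is that no separate counter or wait queue is needed; the read count of the semaphore is the counter.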