[RFC] Avoid the contention in set_cpus_allowed

From: Nathan Zimmer
Date: Thu Jun 11 2015 - 11:47:39 EST


While looking into scaling issues on larger systems (64+ nodes), I found that in
some cases we spend a significant amount of time in set_cpus_allowed_ptr().

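My assumption is that the call is getting stuck on migration:
deferred_init_memmap() calls it on current, i.e. on a thread that is already
running, so the thread may have to be moved to a CPU in the new mask.
If we instead create the thread on the target node and restrict its allowed
CPUs before we start it, then we don't have to suffer the migration.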
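
As a rough userspace analogue of the same idea (not the kernel code in this
patch), the sketch below fixes a thread's CPU affinity before the thread first
runs, using glibc's pthread_attr_setaffinity_np(), rather than having the
running thread change its own affinity afterwards. The helper names worker(),
first_allowed_cpu() and run_pinned_worker() are made up for this illustration,
and it assumes Linux with glibc.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Worker body: record the CPU this thread is running on. */
static void *worker(void *arg)
{
	int *seen_cpu = arg;

	*seen_cpu = sched_getcpu();
	return NULL;
}

/* Return the first CPU the calling thread may run on, or -1 on error. */
static int first_allowed_cpu(void)
{
	cpu_set_t set;
	int cpu;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}

/*
 * Create a worker whose allowed-CPU mask is set through the thread
 * attributes, i.e. before the worker function starts, so the worker
 * never has to re-bind (and possibly migrate) itself.
 * Returns the CPU the worker observed, or -1 on error.
 */
static int run_pinned_worker(int cpu)
{
	pthread_attr_t attr;
	pthread_t tid;
	cpu_set_t set;
	int seen_cpu = -1;
	int err;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	if (pthread_attr_init(&attr) != 0)
		return -1;
	err = pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
	if (err == 0)
		err = pthread_create(&tid, &attr, worker, &seen_cpu);
	pthread_attr_destroy(&attr);
	if (err != 0)
		return -1;
	if (pthread_join(tid, NULL) != 0)
		return -1;
	return seen_cpu;
}
```

Calling run_pinned_worker(first_allowed_cpu()) should report the CPU that was
requested, since the mask is already in place when the worker starts.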

Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Waiman Long <waiman.long@xxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
Cc: Scott Norton <scott.norton@xxxxxx>
Cc: Daniel J Blueman <daniel@xxxxxxxxxxxxx>
Signed-off-by: Nathan Zimmer <nzimmer@xxxxxxx>

---
mm/page_alloc.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f88e8c4..531f7bc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1090,16 +1090,12 @@ static int __init deferred_init_memmap(void *data)
 	int i, zid;
 	struct zone *zone;
 	unsigned long first_init_pfn = pgdat->first_deferred_pfn;
-	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
 
 	if (first_init_pfn == ULONG_MAX) {
 		up_read(&pgdat_init_rwsem);
 		return 0;
 	}
 
-	/* Bind memory initialisation thread to a local node if possible */
-	if (!cpumask_empty(cpumask))
-		set_cpus_allowed_ptr(current, cpumask);
 
 	/* Sanity check boundaries */
 	BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn);
@@ -1204,8 +1200,16 @@ void __init page_alloc_init_late(void)
 	unsigned long start = jiffies;
 
 	for_each_node_state(nid, N_MEMORY) {
+		struct task_struct *defer_task;
+		const struct cpumask *cpumask = cpumask_of_node(nid);
 		down_read(&pgdat_init_rwsem);
-		kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
+		defer_task = kthread_create_on_node(deferred_init_memmap,
+				NODE_DATA(nid), nid, "pgdatinit%d", nid);
+		/* Bind memory initialisation thread to a local node if possible */
+		if (!IS_ERR(defer_task) && !cpumask_empty(cpumask))
+			set_cpus_allowed_ptr(defer_task, cpumask);
+		if (!IS_ERR(defer_task))
+			wake_up_process(defer_task);
 	}
 
 	/* Block until all are initialised */
--
1.8.2.1

