[PATCH] sched/numa: Acquire RCU lock for checking idle cores during NUMA balancing

From: Mel Gorman
Date: Thu Feb 27 2020 - 14:18:10 EST


Qian Cai reported the following

The linux-next commit ff7db0bf24db ("sched/numa: Prefer using an idle CPU as a
migration target instead of comparing tasks") introduced a boot warning,

[ 86.520534][ T1] WARNING: suspicious RCU usage
[ 86.520540][ T1] 5.6.0-rc3-next-20200227 #7 Not tainted
[ 86.520545][ T1] -----------------------------
[ 86.520551][ T1] kernel/sched/fair.c:5914 suspicious rcu_dereference_check() usage!
[ 86.520555][ T1]
[ 86.520555][ T1] other info that might help us debug this:
[ 86.520555][ T1]
[ 86.520561][ T1]
[ 86.520561][ T1] rcu_scheduler_active = 2, debug_locks = 1
[ 86.520567][ T1] 1 lock held by systemd/1:
[ 86.520571][ T1] #0: ffff8887f4b14848 (&mm->mmap_sem#2){++++}, at: do_page_fault+0x1d2/0x998
[ 86.520594][ T1]
[ 86.520594][ T1] stack backtrace:
[ 86.520602][ T1] CPU: 1 PID: 1 Comm: systemd Not tainted 5.6.0-rc3-next-20200227 #7

task_numa_migrate() checks for idle cores when updating NUMA-related statistics.
This relies on reading a RCU-protected structure in test_idle_cores() via this
call chain

task_numa_migrate
-> update_numa_stats
-> numa_idle_core
-> test_idle_cores

While the locking could be fine-grained, it is more appropriate to acquire
the RCU lock for the entire scan of the domain. This patch removes the
warning triggered at boot time.

Fixes: ff7db0bf24db ("sched/numa: Prefer using an idle CPU as a migration target instead of comparing tasks")
Reported-by: Qian Cai <cai@xxxxxx>
Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
---
kernel/sched/fair.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 10f9e6729fcf..1592b6d26239 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1595,6 +1595,7 @@ static void update_numa_stats(struct task_numa_env *env,
memset(ns, 0, sizeof(*ns));
ns->idle_cpu = -1;

+ rcu_read_lock();
for_each_cpu(cpu, cpumask_of_node(nid)) {
struct rq *rq = cpu_rq(cpu);

@@ -1614,6 +1615,7 @@ static void update_numa_stats(struct task_numa_env *env,
idle_core = numa_idle_core(idle_core, cpu);
}
}
+ rcu_read_unlock();

ns->weight = cpumask_weight(cpumask_of_node(nid));