[RFC PATCH] sched/fair: Refactor can_migrate_task() to elimate looping
From: I Hsin Cheng
Date: Wed Jan 08 2025 - 12:30:04 EST
The function "can_migrate_task()" utilize "for_each_cpu_and" with a
"if" statement inside to find the destination cpu. It's the same logic
to find the first set bit of the result of the bitwise-AND of
"env->dst_grpmask", "env->cpus" and "p->cpus_ptr".
Refactor it by directly perform bitwise-AND for "env->dst_grpmask",
"env->cpus" and "p->cpus_ptr" and use "cpumask_first()" to select the
destination cpu, so we can elimate the need of looping and multiple
times of branch.
After the refactoring this part of the code can speed up from ~130ns to
~93ns, according to the test below.
Ran the test for 5 times and the result is showned in the following
table, and the test script is paste in next section.
-------------------------------------------------------
|Old method| 123| 135| 126| 129| 135| avg ~130ns|
-------------------------------------------------------
|New method| 87| 97| 90| 92| 98| avg ~93ns|
-------------------------------------------------------
Signed-off-by: I Hsin Cheng <richard120310@xxxxxxxxx>
---
Test is done on Linux 6.8.0-48-generic x86_64 with
Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Test is executed in the form of kernel module.
Test script:
int init_module(void)
{
struct cpumask cur_mask, custom_mask, result_mask;
struct task_struct *p = current;
int cpu, cpu1 = nr_cpu_ids, cpu2 = nr_cpu_ids;
unsigned tmp = 0;
cpumask_copy(&cur_mask, cpu_online_mask);
/* Self-implemented function, didn't paste here because the length */
generate_random_cpumask(&custom_mask);
ktime_t start_1 = ktime_get();
for_each_cpu_and(cpu, &cur_mask, &custom_mask) {
if (cpumask_test_cpu(cpu, p->cpus_ptr)) {
/* imitate load balance operation */
tmp |= 0x01010101;
cpu1 = cpu;
break;
}
}
ktime_t end_1 = ktime_get();
ktime_t start_2 = ktime_get();
cpumask_and(&result_mask, &cur_mask, &custom_mask);
cpumask_and(&result_mask, &result_mask, p->cpus_ptr);
cpu = cpumask_first(&result_mask);
if (cpu < nr_cpu_ids) {
/* imitate load balance operation */
tmp |= 0x01010101;
cpu2 = cpu;
}
ktime_t end_2 = ktime_get();
if (cpu1 != cpu2) {
pr_err("Failed Assertion, cpu1 = %d, cpu2 = %d\n", cpu1, cpu2);
return 0;
}
pr_info("Old method spend time : %lld\n", ktime_to_ns(end_1 - start_1));
pr_info("New method spend time : %lld\n", ktime_to_ns(end_2 - start_2));
return 0;
}
---
kernel/sched/fair.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2d16c8545..ce46f61da 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9404,12 +9404,16 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
return 0;
/* Prevent to re-select dst_cpu via env's CPUs: */
- for_each_cpu_and(cpu, env->dst_grpmask, env->cpus) {
- if (cpumask_test_cpu(cpu, p->cpus_ptr)) {
- env->flags |= LBF_DST_PINNED;
- env->new_dst_cpu = cpu;
- break;
- }
+ struct cpumask dst_mask;
+
+ cpumask_and(&dst_mask, env->dst_grpmask, env->cpus);
+ cpumask_and(&dst_mask, &dst_mask, p->cpus_ptr);
+
+ cpu = cpumask_first(&dst_mask);
+
+ if (cpu < nr_cpu_ids) {
+ env->flags |= LBF_DST_PINNED;
+ env->new_dst_cpu = cpu;
}
return 0;
--
2.43.0