[PATCH] sched/fair: remove redundant test_idle_cores for non-smt

From: Barry Song
Date: Sat Mar 20 2021 - 18:28:18 EST


update_idle_core() is only done when sched_smt_present is set, but
test_idle_cores() is done on all machines, even those without SMT.
This can contribute to a hackbench performance loss of up to 8%+ on
a machine like Kunpeng 920, which has no SMT. This patch removes the
redundant test_idle_cores() for non-SMT machines.
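
To illustrate the intended behaviour, here is a minimal userspace sketch,
not kernel code. The names llc_shared_model and test_idle_cores_model are
hypothetical; a plain bool stands in for the sched_smt_present static
branch, and an ordinary pointer stands in for the RCU-protected per-CPU
sd_llc_shared pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sched_domain_shared (one field only). */
struct llc_shared_model {
	bool has_idle_cores;
};

/*
 * Userspace model of the patched test_idle_cores().
 * smt_present models static_branch_likely(&sched_smt_present);
 * sds models the dereferenced per-CPU sd_llc_shared pointer (may be NULL).
 */
static bool test_idle_cores_model(bool smt_present,
				  const struct llc_shared_model *sds,
				  bool def)
{
	if (smt_present) {
		/* Only SMT machines consult the shared LLC state. */
		if (sds)
			return sds->has_idle_cores;
	}

	/* Non-SMT machines (or a missing sds) fall back to the default. */
	return def;
}
```

In this model, a non-SMT machine returns def without reading the shared
state at all, which is the access the patch avoids.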

We ran the hackbench command below with the -g parameter ranging
from 2 to 14. For each value of g, we ran the command 10 times and
took the average time:
$ numactl -N 0 hackbench -p -T -l 20000 -g $1

hackbench reports the time needed to complete a certain number of
message transmissions between a certain number of tasks, for
example:
$ numactl -N 0 hackbench -p -T -l 20000 -g 10
Running in threaded mode with 10 groups using 40 file descriptors each
(== 400 tasks)
Each sender will pass 20000 messages of 100 bytes

Below are the hackbench results w/ and w/o this patch:
g=          2        4        6        8       10       12       14
w/o:   1.8151   3.8499   5.5142   7.2491   9.0340  10.7345  12.0929
w/ :   1.8428   3.7436   5.4501   6.9522   8.2882   9.9535  11.3367
                                   +4.1%    +8.3%    +7.3%    +6.3%
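
The percentage row appears to be the relative reduction in run time,
(w/o - w/) / w/o, for g = 8, 10, 12 and 14. A small sketch of that
arithmetic follows; the helper name improvement_pct is hypothetical and
not part of hackbench or the kernel.

```c
#include <assert.h>

/*
 * Relative reduction in run time, in percent.
 * A positive result means the run with the patch was faster.
 */
static double improvement_pct(double without_patch, double with_patch)
{
	/* A zero or negative baseline time would make the ratio meaningless. */
	assert(without_patch > 0.0);

	return (without_patch - with_patch) / without_patch * 100.0;
}
```

For example, improvement_pct(7.2491, 6.9522) is roughly 4.1 and
improvement_pct(9.0340, 8.2882) is roughly 8.3, matching the g = 8 and
g = 10 columns.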

Signed-off-by: Barry Song <song.bao.hua@xxxxxxxxxxxxx>
---
kernel/sched/fair.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2e2ab1e..de42a32 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6038,9 +6038,11 @@ static inline bool test_idle_cores(int cpu, bool def)
 {
 	struct sched_domain_shared *sds;
 
-	sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
-	if (sds)
-		return READ_ONCE(sds->has_idle_cores);
+	if (static_branch_likely(&sched_smt_present)) {
+		sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+		if (sds)
+			return READ_ONCE(sds->has_idle_cores);
+	}
 
 	return def;
 }
--
1.8.3.1