Re: [Patch]cpuidle: Save current cpu as local variable instead of calling smp_processor_id() in loop

From: Peter Zijlstra
Date: Wed May 18 2016 - 07:47:50 EST



Much better; but the below does not in fact apply.

> ---
>
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 1214f0a..82698e5 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -185,6 +185,8 @@ exit_idle:
> */
> static void cpu_idle_loop(void)
> {
> + int cpu_id;
>
> + cpu_id = smp_processor_id();

This hunk assumes there's an empty line between the '{' and 'while
(1)', but there isn't one.

> while (1) {
> /*
> * If the arch has a polling bit, we maintain an invariant:
> @@ -202,7 +204,7 @@ static void cpu_idle_loop(void)
> check_pgt_cache();
> rmb();
>
> - if (cpu_is_offline(smp_processor_id()))
> + if (cpu_is_offline(cpu_id))
> arch_cpu_idle_dead();

And this hunk fails to apply because the recent cpu hotplug work added
another call here.

>
> local_irq_disable();

I've taken your patch and modified it; see below.

Another optimization you could look at is removing that rmb(); I don't
actually think it's needed, but you'd have to find out why it was
added, check whether it was in fact needed back then, and then check
whether it is still needed now.

It's a bit of a trek through git history, but lfence / dsb ld are
expensive instructions.

---
Subject: sched,idle: Optimize idle loop
From: "Gaurav Jindal (Gaurav Jindal)" <Gaurav.Jindal@xxxxxxxxxxxxxx>
Date: Thu, 12 May 2016 10:13:33 +0000

Currently, cpu_idle_loop fetches the current CPU with
smp_processor_id() on every pass through the loop.

Since the idle thread is per CPU, the current CPU is constant, so we
can lift the load out of the loop, saving execution cycles/time in the
loop.

x86-64:
Before patch (execution in loop):
148: 0f ae e8 lfence
14b: 65 8b 04 25 00 00 00 00 mov %gs:0x0,%eax
152: 00
153: 89 c0 mov %eax,%eax
155: 49 0f a3 04 24 bt %rax,(%r12)

After patch (execution in loop):
150: 0f ae e8 lfence
153: 4d 0f a3 34 24 bt %r14,(%r12)

ARM64:
Before patch (execution in loop):
168: d5033d9f dsb ld
16c: b9405661 ldr w1,[x19,#84]
170: 1100fc20 add w0,w1,#0x3f
174: 6b1f003f cmp w1,wzr
178: 1a81b000 csel w0,w0,w1,lt
17c: 130c7000 asr w0,w0,#6
180: 937d7c00 sbfiz x0,x0,#3,#32
184: f8606aa0 ldr x0,[x21,x0]
188: 9ac12401 lsr x1,x0,x1
18c: 36000e61 tbz w1,#0,358

After patch (execution in loop):
1a8: d50339df dsb ld
1ac: f8776ac0 ldr x0,[x22,x23]
1b0: ea18001f tst x0,x24
1b4: 54000ea0 b.eq 388

Further observation on ARM64 over 4 seconds shows that cpu_idle_loop is
called 8672 times. Hoisting the load out of the loop saves instructions
executed on every iteration, and therefore time as well.

Cc: "mingo@xxxxxxxxxx" <mingo@xxxxxxxxxx>
Signed-off-by: Gaurav Jindal <gaurav.jindal@xxxxxxxxxxxxxx>
Reviewed-by: Sanjeev Yadav <sanjeev.yadav@xxxxxxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Link: http://lkml.kernel.org/r/20160512101330.GA488@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
---
kernel/sched/idle.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -201,6 +201,8 @@ static void cpuidle_idle_call(void)
*/
static void cpu_idle_loop(void)
{
+ int cpu = smp_processor_id();
+
while (1) {
/*
* If the arch has a polling bit, we maintain an invariant:
@@ -219,7 +221,7 @@ static void cpu_idle_loop(void)
check_pgt_cache();
rmb();

- if (cpu_is_offline(smp_processor_id())) {
+ if (cpu_is_offline(cpu)) {
cpuhp_report_idle_dead();
arch_cpu_idle_dead();
}