Re: [PATCH 3/4] sched: introduce synchronized idle injection

From: Javi Merino
Date: Mon Nov 23 2015 - 12:56:54 EST


On Fri, Nov 13, 2015 at 11:53:06AM -0800, Jacob Pan wrote:
> With increasingly constrained power and thermal budgets, it's often
> necessary to cap power via throttling. Throttling individual CPUs
> or devices at random times can help with power capping but may not
> be optimal in terms of energy efficiency. Frequency scaling is also
> limited to a certain range before it loses energy efficiency.
>
> In general, the optimal solution in terms of energy efficiency is
> to align idle periods such that more shared circuits can be power
> gated to enter lower power states. Combined with an energy-efficient
> frequency point, idle injection provides a way to scale power and
> performance efficiently.
>
> This patch introduces a scheduler-based idle injection method. It
> works by blocking the CFS runqueue synchronously and periodically.
> The actions on all online CPUs are orchestrated by per-CPU hrtimers.
>
> Two sysctl knobs are exposed to userspace for selecting the
> percentage of idle time as well as the forced idle duration of each
> injected idle period.
>
> Since only the CFS class is targeted, higher priority work such as
> EDF and RT tasks, softirqs and interrupts is not affected.
>
> The hot path in CFS pick_next_task() is optimized by Peter Zijlstra:
> a new runnable flag is introduced that combines forced idle and
> nr_running.
>
> Signed-off-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> ---
> include/linux/sched.h        |  11 ++
> include/linux/sched/sysctl.h |   5 +
> init/Kconfig                 |  10 ++
> kernel/sched/fair.c          | 353 ++++++++++++++++++++++++++++++++++++++++++-
> kernel/sched/sched.h         |  54 ++++++-
> kernel/sysctl.c              |  21 +++
> 6 files changed, 449 insertions(+), 5 deletions(-)
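
The description does not spell out how the two knobs combine. A minimal sketch, assuming the idle percentage is defined as idle / (idle + run) over one injection period, derives the run window from them; idle_inject_run_ms() is a hypothetical name, not a function from the patch.

```c
/*
 * Hypothetical helper, not from the patch: derive the length of the
 * run window from the two sysctl knobs, assuming
 *   idle_pct = 100 * idle_ms / (idle_ms + run_ms).
 * Solving for run_ms gives run_ms = idle_ms * (100 - idle_pct) / idle_pct.
 * Returns -1 when the percentage is outside the open range (0, 100).
 */
long idle_inject_run_ms(long idle_pct, long idle_ms)
{
	if (idle_pct <= 0 || idle_pct >= 100)
		return -1;	/* 0% means no injection, 100% means never run */
	return idle_ms * (100 - idle_pct) / idle_pct;
}
```

At 50% the run and idle windows come out equal. In the trace below, the timer reports throttled=0 at 164.7398 and throttled=1 at 164.7598, i.e. a run window of about 20 ms.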

I've tested this series on Juno (2x Cortex-A57, 4x Cortex-A53). With
50% idle injection, when I run 6 busy loops the scheduler sometimes
keeps two of them on the same CPU while another CPU is completely
idle. Without idle injection the scheduler does the sensible thing
and puts one busy loop on each CPU. I'm running systemd, and this only
happens with CONFIG_SCHED_AUTOGROUP=y. If I unset
CONFIG_SCHED_AUTOGROUP, the tasks are spread across all CPUs as usual.
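
The source of busy_loop is not included here. A minimal stand-in, assuming it is simply a CPU-bound spin with no sleeping or I/O, is sketched below; spin_for() is a hypothetical name, and unlike a real busy loop it stops after a given time so it is safe to run.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/*
 * Hypothetical stand-in for the busy_loop test program: spin on the CPU
 * for roughly `seconds` seconds and return the iteration count (kept in
 * a volatile so the loop is not optimised away). Running six copies at
 * once reproduces the 6-task setup described above.
 */
unsigned long spin_for(double seconds)
{
	struct timespec start, now;
	volatile unsigned long iters = 0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		iters++;
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while ((now.tv_sec - start.tv_sec) +
		 (now.tv_nsec - start.tv_nsec) / 1e9 < seconds);
	return iters;
}
```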

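Below is the part of the trace that shows this problem. CPU3 has two
100% tasks, 1554 and 1549, but the scheduler never moves one of them
to CPU4, which has an empty runqueue. Both CPUs are in the same
domain. Juri helped me add two additional tracepoints to track the
load of a task and of a CPU. These tracepoints are added at the end of
update_load_avg(). Each update emits two sched_load_avg_cpu lines per
CPU; for cpu=3 both load_avg values are about twice those of the other
busy CPUs (about 2000 vs 1000, and about 340 vs 170).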

<idle>-0 [002] 164.739796: sched_cfs_idle_inject_timer: throttled=0
<idle>-0 [000] 164.739797: sched_cfs_idle_inject_timer: throttled=0
<idle>-0 [005] 164.739797: sched_cfs_idle_inject_timer: throttled=0
<idle>-0 [001] 164.739797: sched_cfs_idle_inject_timer: throttled=0
<idle>-0 [003] 164.739797: sched_cfs_idle_inject_timer: throttled=0
<idle>-0 [004] 164.739798: sched_cfs_idle_inject_timer: throttled=0
<idle>-0 [002] 164.739802: sched_load_avg_cpu: cpu=2 load_avg=171 util_avg=406
<idle>-0 [002] 164.739803: sched_load_avg_task: comm=busy_loop pid=1552 cpu=2 load_avg=1006 util_avg=400 load_sum=48043453 util_sum=19130537 period_contrib=173
<idle>-0 [001] 164.739803: sched_load_avg_cpu: cpu=1 load_avg=170 util_avg=405
<idle>-0 [002] 164.739804: sched_load_avg_cpu: cpu=2 load_avg=1014 util_avg=403
<idle>-0 [001] 164.739804: sched_load_avg_task: comm=busy_loop pid=1551 cpu=1 load_avg=1008 util_avg=401 load_sum=48161276 util_sum=19177731 period_contrib=288
<idle>-0 [005] 164.739804: sched_load_avg_cpu: cpu=5 load_avg=169 util_avg=404
<idle>-0 [002] 164.739805: sched_switch: swapper/2:0 [120] R ==> busy_loop:1552 [120]
<idle>-0 [001] 164.739805: sched_load_avg_cpu: cpu=1 load_avg=1024 util_avg=407
<idle>-0 [003] 164.739805: sched_load_avg_cpu: cpu=3 load_avg=340 util_avg=405
<idle>-0 [000] 164.739805: sched_load_avg_cpu: cpu=0 load_avg=168 util_avg=400
<idle>-0 [001] 164.739806: sched_switch: swapper/1:0 [120] R ==> busy_loop:1551 [120]
<idle>-0 [005] 164.739806: sched_load_avg_task: comm=busy_loop pid=1550 cpu=5 load_avg=1010 util_avg=402 load_sum=48229881 util_sum=19205027 period_contrib=355
<idle>-0 [003] 164.739807: sched_load_avg_task: comm=busy_loop pid=1549 cpu=3 load_avg=1012 util_avg=193 load_sum=48316673 util_sum=9247244 period_contrib=441
<idle>-0 [000] 164.739807: sched_load_avg_task: comm=busy_loop pid=1553 cpu=0 load_avg=1005 util_avg=400 load_sum=48003551 util_sum=19119112 period_contrib=134
<idle>-0 [005] 164.739808: sched_load_avg_cpu: cpu=5 load_avg=1002 util_avg=399
<idle>-0 [003] 164.739808: sched_load_avg_cpu: cpu=3 load_avg=2045 util_avg=407
<idle>-0 [000] 164.739809: sched_load_avg_cpu: cpu=0 load_avg=1008 util_avg=401
<idle>-0 [005] 164.739810: sched_switch: swapper/5:0 [120] R ==> busy_loop:1550 [120]
<idle>-0 [003] 164.739810: sched_switch: swapper/3:0 [120] R ==> busy_loop:1549 [120]
<idle>-0 [000] 164.739811: sched_switch: swapper/0:0 [120] R ==> busy_loop:1553 [120]
busy_loop-1552 [002] 164.743793: sched_stat_runtime: comm=busy_loop pid=1552 runtime=3991560 [ns] vruntime=605432548 [ns]
busy_loop-1549 [003] 164.743794: sched_stat_runtime: comm=busy_loop pid=1549 runtime=3990040 [ns] vruntime=382380848 [ns]
busy_loop-1552 [002] 164.743794: sched_load_avg_task: comm=busy_loop pid=1552 cpu=2 load_avg=1024 util_avg=456 load_sum=48889883 util_sum=21796057 period_contrib=999
busy_loop-1553 [000] 164.743794: sched_stat_runtime: comm=busy_loop pid=1553 runtime=3990180 [ns] vruntime=590391894 [ns]
busy_loop-1551 [001] 164.743794: sched_stat_runtime: comm=busy_loop pid=1551 runtime=3992100 [ns] vruntime=272056341 [ns]
busy_loop-1550 [005] 164.743794: sched_stat_runtime: comm=busy_loop pid=1550 runtime=3990920 [ns] vruntime=198320034 [ns]
busy_loop-1552 [002] 164.743795: sched_load_avg_cpu: cpu=2 load_avg=1010 util_avg=450
busy_loop-1551 [001] 164.743796: sched_load_avg_task: comm=busy_loop pid=1551 cpu=1 load_avg=1004 util_avg=447 load_sum=47958941 util_sum=21380913 period_contrib=90
busy_loop-1549 [003] 164.743796: sched_load_avg_task: comm=busy_loop pid=1549 cpu=3 load_avg=1007 util_avg=257 load_sum=48112396 util_sum=12285572 period_contrib=241
busy_loop-1552 [002] 164.743796: sched_load_avg_cpu: cpu=2 load_avg=170 util_avg=453
busy_loop-1553 [000] 164.743796: sched_load_avg_task: comm=busy_loop pid=1553 cpu=0 load_avg=1023 util_avg=456 load_sum=48847931 util_sum=21780791 period_contrib=958
busy_loop-1551 [001] 164.743796: sched_load_avg_cpu: cpu=1 load_avg=1020 util_avg=454
busy_loop-1550 [005] 164.743797: sched_load_avg_task: comm=busy_loop pid=1550 cpu=5 load_avg=1005 util_avg=448 load_sum=48026522 util_sum=21410614 period_contrib=156
busy_loop-1549 [003] 164.743797: sched_load_avg_cpu: cpu=3 load_avg=2036 util_avg=454
busy_loop-1553 [000] 164.743798: sched_load_avg_cpu: cpu=0 load_avg=1004 util_avg=447
busy_loop-1551 [001] 164.743798: sched_load_avg_cpu: cpu=1 load_avg=169 util_avg=452
busy_loop-1550 [005] 164.743798: sched_load_avg_cpu: cpu=5 load_avg=1020 util_avg=455
busy_loop-1553 [000] 164.743800: sched_load_avg_cpu: cpu=0 load_avg=171 util_avg=456
busy_loop-1549 [003] 164.743800: sched_load_avg_cpu: cpu=3 load_avg=339 util_avg=452
busy_loop-1550 [005] 164.743800: sched_load_avg_cpu: cpu=5 load_avg=168 util_avg=450
busy_loop-1552 [002] 164.747792: sched_stat_runtime: comm=busy_loop pid=1552 runtime=3999320 [ns] vruntime=609431868 [ns]
busy_loop-1553 [000] 164.747793: sched_stat_runtime: comm=busy_loop pid=1553 runtime=3999380 [ns] vruntime=594391274 [ns]
busy_loop-1549 [003] 164.747793: sched_stat_runtime: comm=busy_loop pid=1549 runtime=3999540 [ns] vruntime=386380388 [ns]
busy_loop-1552 [002] 164.747794: sched_load_avg_task: comm=busy_loop pid=1552 cpu=2 load_avg=1019 util_avg=499 load_sum=48694671 util_sum=23849523 period_contrib=808
busy_loop-1551 [001] 164.747794: sched_stat_runtime: comm=busy_loop pid=1551 runtime=3999880 [ns] vruntime=276056221 [ns]
busy_loop-1550 [005] 164.747795: sched_stat_runtime: comm=busy_loop pid=1550 runtime=3999280 [ns] vruntime=202319314 [ns]
busy_loop-1552 [002] 164.747795: sched_load_avg_cpu: cpu=2 load_avg=1006 util_avg=492
busy_loop-1551 [001] 164.747795: sched_load_avg_task: comm=busy_loop pid=1551 cpu=1 load_avg=1022 util_avg=500 load_sum=48813533 util_sum=23907693 period_contrib=924
busy_loop-1553 [000] 164.747795: sched_load_avg_task: comm=busy_loop pid=1553 cpu=0 load_avg=1019 util_avg=499 load_sum=48652717 util_sum=23832040 period_contrib=767
busy_loop-1549 [003] 164.747796: sched_load_avg_task: comm=busy_loop pid=1549 cpu=3 load_avg=1003 util_avg=315 load_sum=47917292 util_sum=15063949 period_contrib=50
busy_loop-1551 [001] 164.747796: sched_load_avg_cpu: cpu=1 load_avg=1016 util_avg=497
busy_loop-1552 [002] 164.747796: sched_load_avg_cpu: cpu=2 load_avg=169 util_avg=496
busy_loop-1550 [005] 164.747797: sched_load_avg_task: comm=busy_loop pid=1550 cpu=5 load_avg=1023 util_avg=501 load_sum=48880090 util_sum=23938753 period_contrib=989
busy_loop-1553 [000] 164.747797: sched_load_avg_cpu: cpu=0 load_avg=1022 util_avg=500
busy_loop-1549 [003] 164.747797: sched_load_avg_cpu: cpu=3 load_avg=2028 util_avg=496
busy_loop-1551 [001] 164.747797: sched_load_avg_cpu: cpu=1 load_avg=169 util_avg=495
busy_loop-1550 [005] 164.747798: sched_load_avg_cpu: cpu=5 load_avg=1016 util_avg=497
busy_loop-1553 [000] 164.747799: sched_load_avg_cpu: cpu=0 load_avg=170 util_avg=499
busy_loop-1549 [003] 164.747800: sched_load_avg_cpu: cpu=3 load_avg=337 util_avg=494
busy_loop-1550 [005] 164.747800: sched_load_avg_cpu: cpu=5 load_avg=168 util_avg=492
busy_loop-1552 [002] 164.751792: sched_stat_runtime: comm=busy_loop pid=1552 runtime=4000260 [ns] vruntime=613432128 [ns]
busy_loop-1549 [003] 164.751793: sched_stat_runtime: comm=busy_loop pid=1549 runtime=3999760 [ns] vruntime=390380148 [ns]
busy_loop-1553 [000] 164.751793: sched_stat_runtime: comm=busy_loop pid=1553 runtime=3999920 [ns] vruntime=598391194 [ns]
busy_loop-1552 [002] 164.751793: sched_load_avg_task: comm=busy_loop pid=1552 cpu=2 load_avg=1015 util_avg=538 load_sum=48500452 util_sum=25717351 period_contrib=618
busy_loop-1550 [005] 164.751793: sched_stat_runtime: comm=busy_loop pid=1550 runtime=3999920 [ns] vruntime=206319234 [ns]
busy_loop-1552 [002] 164.751794: sched_load_avg_cpu: cpu=2 load_avg=1024 util_avg=542
busy_loop-1551 [001] 164.751794: sched_stat_runtime: comm=busy_loop pid=1551 runtime=4000120 [ns] vruntime=280056341 [ns]
busy_loop-1549 [003] 164.751795: sched_load_avg_task: comm=busy_loop pid=1549 cpu=3 load_avg=1021 util_avg=376 load_sum=48771927 util_sum=17985591 period_contrib=884
busy_loop-1553 [000] 164.751795: sched_load_avg_task: comm=busy_loop pid=1553 cpu=0 load_avg=1015 util_avg=538 load_sum=48458496 util_sum=25697835 period_contrib=577
busy_loop-1551 [001] 164.751795: sched_load_avg_task: comm=busy_loop pid=1551 cpu=1 load_avg=1018 util_avg=539 load_sum=48619308 util_sum=25780552 period_contrib=734
busy_loop-1550 [005] 164.751795: sched_load_avg_task: comm=busy_loop pid=1550 cpu=5 load_avg=1019 util_avg=540 load_sum=48685865 util_sum=25814558 period_contrib=799
busy_loop-1552 [002] 164.751796: sched_load_avg_cpu: cpu=2 load_avg=169 util_avg=535
busy_loop-1551 [001] 164.751796: sched_load_avg_cpu: cpu=1 load_avg=1011 util_avg=536
busy_loop-1553 [000] 164.751797: sched_load_avg_cpu: cpu=0 load_avg=1018 util_avg=539
busy_loop-1549 [003] 164.751797: sched_load_avg_cpu: cpu=3 load_avg=2020 util_avg=535
busy_loop-1550 [005] 164.751797: sched_load_avg_cpu: cpu=5 load_avg=1012 util_avg=536
busy_loop-1551 [001] 164.751797: sched_load_avg_cpu: cpu=1 load_avg=168 util_avg=533
busy_loop-1553 [000] 164.751799: sched_load_avg_cpu: cpu=0 load_avg=169 util_avg=538
busy_loop-1549 [003] 164.751799: sched_load_avg_cpu: cpu=3 load_avg=336 util_avg=533
busy_loop-1550 [005] 164.751800: sched_load_avg_cpu: cpu=5 load_avg=171 util_avg=543
busy_loop-1549 [003] 164.751807: sched_stat_runtime: comm=busy_loop pid=1549 runtime=13700 [ns] vruntime=390393848 [ns]
busy_loop-1549 [003] 164.751809: sched_load_avg_task: comm=busy_loop pid=1549 cpu=3 load_avg=1021 util_avg=376 load_sum=48785239 util_sum=17998903 period_contrib=897
busy_loop-1549 [003] 164.751811: sched_load_avg_cpu: cpu=3 load_avg=2020 util_avg=535
busy_loop-1549 [003] 164.751812: sched_load_avg_task: comm=busy_loop pid=1554 cpu=3 load_avg=1015 util_avg=163 load_sum=48472554 util_sum=7827475 period_contrib=593
busy_loop-1549 [003] 164.751814: sched_load_avg_cpu: cpu=3 load_avg=2020 util_avg=535
busy_loop-1549 [003] 164.751816: sched_switch: busy_loop:1549 [120] R ==> busy_loop:1554 [120]
busy_loop-1552 [002] 164.755792: sched_stat_runtime: comm=busy_loop pid=1552 runtime=3999800 [ns] vruntime=617431928 [ns]
busy_loop-1553 [000] 164.755793: sched_stat_runtime: comm=busy_loop pid=1553 runtime=3999880 [ns] vruntime=602391074 [ns]
busy_loop-1552 [002] 164.755793: sched_load_avg_task: comm=busy_loop pid=1552 cpu=2 load_avg=1011 util_avg=574 load_sum=48306205 util_sum=27414009 period_contrib=428
busy_loop-1550 [005] 164.755793: sched_stat_runtime: comm=busy_loop pid=1550 runtime=3999780 [ns] vruntime=210319014 [ns]
busy_loop-1554 [003] 164.755793: sched_stat_runtime: comm=busy_loop pid=1554 runtime=3986540 [ns] vruntime=382907621 [ns]
busy_loop-1552 [002] 164.755794: sched_load_avg_cpu: cpu=2 load_avg=1019 util_avg=578
busy_loop-1551 [001] 164.755794: sched_stat_runtime: comm=busy_loop pid=1551 runtime=3999860 [ns] vruntime=284056201 [ns]
busy_loop-1553 [000] 164.755795: sched_load_avg_task: comm=busy_loop pid=1553 cpu=0 load_avg=1010 util_avg=573 load_sum=48264247 util_sum=27392629 period_contrib=387
busy_loop-1551 [001] 164.755795: sched_load_avg_task: comm=busy_loop pid=1551 cpu=1 load_avg=1014 util_avg=575 load_sum=48425055 util_sum=27481823 period_contrib=544
busy_loop-1552 [002] 164.755795: sched_load_avg_cpu: cpu=2 load_avg=168 util_avg=570
busy_loop-1550 [005] 164.755795: sched_load_avg_task: comm=busy_loop pid=1550 cpu=5 load_avg=1015 util_avg=576 load_sum=48491612 util_sum=27518531 period_contrib=609
busy_loop-1554 [003] 164.755796: sched_load_avg_task: comm=busy_loop pid=1554 cpu=3 load_avg=1010 util_avg=230 load_sum=48265186 util_sum=10993484 period_contrib=390
busy_loop-1551 [001] 164.755796: sched_load_avg_cpu: cpu=1 load_avg=1007 util_avg=571
busy_loop-1553 [000] 164.755796: sched_load_avg_cpu: cpu=0 load_avg=1014 util_avg=575
busy_loop-1550 [005] 164.755797: sched_load_avg_cpu: cpu=5 load_avg=1008 util_avg=572
busy_loop-1554 [003] 164.755797: sched_load_avg_cpu: cpu=3 load_avg=2012 util_avg=571
busy_loop-1551 [001] 164.755797: sched_load_avg_cpu: cpu=1 load_avg=171 util_avg=581
busy_loop-1553 [000] 164.755799: sched_load_avg_cpu: cpu=0 load_avg=168 util_avg=574
busy_loop-1550 [005] 164.755799: sched_load_avg_cpu: cpu=5 load_avg=170 util_avg=579
busy_loop-1554 [003] 164.755799: sched_load_avg_cpu: cpu=3 load_avg=342 util_avg=581
busy_loop-1552 [002] 164.759791: sched_cfs_idle_inject_timer: throttled=1
busy_loop-1551 [001] 164.759791: sched_cfs_idle_inject_timer: throttled=1
busy_loop-1550 [005] 164.759792: sched_cfs_idle_inject_timer: throttled=1
busy_loop-1554 [003] 164.759792: sched_cfs_idle_inject_timer: throttled=1
busy_loop-1553 [000] 164.759792: sched_cfs_idle_inject_timer: throttled=1
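
To compare CPUs without reading the raw trace line by line, the sched_load_avg_cpu events can be extracted with sscanf(). This is a minimal sketch that assumes the exact field layout printed above (the tracepoints are local additions, so the format may differ elsewhere); parse_load_avg_cpu() is a hypothetical helper, not part of ftrace or trace-cmd.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract cpu, load_avg and util_avg from one
 * sched_load_avg_cpu line in the format shown above, e.g.
 *   <idle>-0 [003] 164.739808: sched_load_avg_cpu: cpu=3 load_avg=2045 util_avg=407
 * Returns 1 on success, 0 if the line is any other event.
 */
int parse_load_avg_cpu(const char *line, int *cpu, long *load_avg,
		       long *util_avg)
{
	/* Skip the task/CPU/timestamp prefix and jump to the event name. */
	const char *ev = strstr(line, "sched_load_avg_cpu:");

	if (!ev)
		return 0;
	return sscanf(ev, "sched_load_avg_cpu: cpu=%d load_avg=%ld util_avg=%ld",
		      cpu, load_avg, util_avg) == 3;
}
```

Lines for other events, including sched_load_avg_task, are rejected because they lack the "sched_load_avg_cpu:" tag.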


Cheers,
Javi