Re: [patch v6 0/21] sched: power aware scheduling

From: Alex Shi
Date: Wed Apr 03 2013 - 20:58:10 EST


On 03/30/2013 10:34 PM, Alex Shi wrote:
> This patch set implements and completes the rough power aware scheduling
> proposal: https://lkml.org/lkml/2012/8/13/139.

BTW, this task packing feature lets the CPU use frequency boost more
often, because some of the cores are idle. Since CPU frequency boost is
more power efficient, that helps performance/watts a lot, as the 16/32
thread kbuild results show:

              powersaving          performance
> x = 2       189.416 /228 23      193.355 /209 24
> x = 4       215.728 /132 35      219.69  /122 37
> x = 8       244.31  /75  54      252.709 /68  58
> x = 16      299.915 /43  77      259.127 /58  66
> x = 32      341.221 /35  83      323.418 /38  81
>
> How to read the data, e.g. 189.416 /228 23:
> 189.416: average Watts during compilation
> 228: compile time in seconds
> 23: scaled performance/watts = 1000000 / seconds / watts
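
For reference, the third number in each cell can be reproduced from the
first two. A minimal Python sketch, assuming the published integers are
truncated rather than rounded (that matches every kbuild row in the table);
the function name perf_per_watt is hypothetical:

```python
def perf_per_watt(avg_watts, seconds, scale=1_000_000):
    """Scaled performance/watts = scale / seconds / watts, truncated to an int.

    Truncation (not rounding) is an assumption inferred from the table values.
    """
    return int(scale / seconds / avg_watts)


# kbuild rows from the table: (threads, policy) -> (average Watts, compile seconds)
KBUILD = {
    (16, "powersaving"): (299.915, 43),
    (16, "performance"): (259.127, 58),
    (32, "powersaving"): (341.221, 35),
    (32, "performance"): (323.418, 38),
}

for (threads, policy), (watts, secs) in sorted(KBUILD.items()):
    print(threads, policy, perf_per_watt(watts, secs))
```

The metric rewards both a shorter compile time and a lower average power,
so a policy can win on it even when its average Watts are higher, as
powersaving does at 16 threads.
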
>
>
> The code is also available in this git tree:
> https://github.com/alexshi/power-scheduling.git power-scheduling
>
> The patch set defines a new policy, 'powersaving', that tries to pack
> tasks at each sched group level. It can save a lot of power when the number
> of tasks in the system is no more than the number of logical CPUs (LCPUs).
>
> As mentioned in the power aware scheduling proposal, power aware
> scheduling rests on 2 assumptions:
> 1, race to idle is helpful for power saving
> 2, fewer active sched groups will reduce cpu power consumption
>
> The first assumption makes the performance policy take over scheduling
> when any group is busy.
> The second assumption makes power aware scheduling try to pack dispersed
> tasks into fewer groups.
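
To make the difference between the two policies concrete, here is a toy
Python model. It is only an illustration of "pack into fewer groups" versus
"spread across groups", not the algorithm in the patches. The function name
place_tasks, the choice of one group per socket and the 16 LCPUs per group
(8 cores * HT) are assumptions made for the example:

```python
def place_tasks(num_tasks, group_capacities, policy):
    """Return the number of tasks placed in each group (toy model only)."""
    counts = [0] * len(group_capacities)
    total_capacity = sum(group_capacities)
    for _ in range(num_tasks):
        if policy == "powersaving" and sum(counts) < total_capacity:
            # Pack: use the first group that still has a free logical CPU.
            target = next(
                i for i, cap in enumerate(group_capacities) if counts[i] < cap
            )
        else:
            # Spread: pick the group with the lowest load relative to capacity.
            # This is also the fallback once every logical CPU is busy.
            target = min(
                range(len(counts)),
                key=lambda i: counts[i] / group_capacities[i],
            )
        counts[target] += 1
    return counts


def active_groups(counts):
    """Number of groups that have at least one task."""
    return sum(1 for c in counts if c > 0)


sockets = [16, 16]  # assumed: 2 sockets, 16 LCPUs each
for policy in ("powersaving", "performance"):
    counts = place_tasks(4, sockets, policy)
    print(policy, counts, "active groups:", active_groups(counts))
```

In this model 4 tasks keep one group idle under powersaving, while the
spreading policy wakes both groups. Once the task count exceeds the number
of LCPUs, both policies behave the same.
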
>
> Compared to the removed power balance code, this power balance has the
> following advantages:
> 1, simpler sys interface
>    only 2 sysfs interfaces VS 2 interfaces for each LCPU
> 2, covers all cpu topologies
>    takes effect on all domain levels VS only works on SMT/MC domains
> 3, less task migration
>    mutually exclusive perf/power LB VS power balancing on top of balanced
>    performance
> 4, considered system load threshing
>    yes VS no
> 5, transitory tasks considered
>    yes VS no
>
> BTW, like sched numa, power aware scheduling is also a kind of cpu
> locality oriented scheduling.
>
> Thanks for the comments/suggestions from PeterZ, Linus Torvalds, Andrew
> Morton, Ingo, Len Brown, Arjan, Borislav Petkov, PJT, Namhyung Kim, Mike
> Galbraith, Greg, Preeti, Morten Rasmussen, Rafael, etc.
>
> Since the patch set can pack tasks into fewer groups very well, I just
> show some performance/power testing data here:
> =========================================
> $for ((i = 0; i < x; i++)) ; do while true; do :; done & done
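
The shell loop above starts x busy loops in the background and never stops
them. A rough Python equivalent that kills the spinners after a fixed time
is sketched below. The helper name run_busy_loops, the duration argument
and the use of child Python interpreters as the spinning tasks are
assumptions for illustration, not part of the original test:

```python
import subprocess
import sys
import time

# Each child runs an endless loop, like 'while true; do :; done'.
BUSY_SNIPPET = "while True: pass"


def run_busy_loops(count, seconds):
    """Start `count` busy-looping children, wait `seconds`, then kill them.

    Returns the list of child exit codes.
    """
    procs = [
        subprocess.Popen([sys.executable, "-c", BUSY_SNIPPET])
        for _ in range(count)
    ]
    try:
        # The power meter would be read during this window.
        time.sleep(seconds)
    finally:
        for proc in procs:
            proc.kill()
        for proc in procs:
            proc.wait()
    return [proc.returncode for proc in procs]


if __name__ == "__main__":
    # Example: x = 2 busy tasks for one second.
    print(run_busy_loops(2, 1.0))
```
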
>
> On my SNB laptop with 4 cores * HT (the data is average Watts):
>          powersaving    performance
> x = 8    72.9482        72.6702
> x = 4    61.2737        66.7649
> x = 2    44.8491        59.0679
> x = 1    43.225         43.0638
>
> On an SNB EP machine with 2 sockets * 8 cores * HT:
>          powersaving    performance
> x = 32   393.062        395.134
> x = 16   277.438        376.152
> x = 8    209.33         272.398
> x = 4    199            238.309
> x = 2    175.245        210.739
> x = 1    174.264        173.603
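
The relative saving is easier to see as a percentage. A small Python sketch
over the SNB EP numbers above; the dictionary and function names are
hypothetical, and the percentage is simply taken relative to the
'performance' column:

```python
# Average Watts from the SNB EP table: tasks -> (powersaving, performance)
SNB_EP_WATTS = {
    32: (393.062, 395.134),
    16: (277.438, 376.152),
    8: (209.33, 272.398),
    4: (199.0, 238.309),
    2: (175.245, 210.739),
    1: (174.264, 173.603),
}


def power_saving_percent(powersaving_watts, performance_watts):
    """Power saved by 'powersaving' relative to 'performance', in percent."""
    return (performance_watts - powersaving_watts) / performance_watts * 100.0


for tasks in sorted(SNB_EP_WATTS):
    ps_watts, perf_watts = SNB_EP_WATTS[tasks]
    saved = power_saving_percent(ps_watts, perf_watts)
    print(f"x = {tasks:2d}: {saved:5.1f}% saved")
```

With these numbers the largest saving is at x = 16 (about 26%), and the two
policies are within about 1% of each other at x = 1 and x = 32.
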
>
>
> A benchmark whose number of tasks keeps fluctuating, 'make -j <x> vmlinux',
> on my SNB EP machine with 2 sockets * 8 cores * HT:
>          powersaving          performance
> x = 2    189.416 /228 23      193.355 /209 24
> x = 4    215.728 /132 35      219.69  /122 37
> x = 8    244.31  /75  54      252.709 /68  58
> x = 16   299.915 /43  77      259.127 /58  66
> x = 32   341.221 /35  83      323.418 /38  81
>
> How to read the data, e.g. 189.416 /228 23:
> 189.416: average Watts during compilation
> 228: compile time in seconds
> 23: scaled performance/watts = 1000000 / seconds / watts
> The performance value of kbuild is better with 16/32 threads under the
> powersaving policy. That is because lazy power balance reduces context
> switches and the CPU has more chances to boost under powersaving balance.
>
> Some performance testing results:
> ---------------------------------
>
> Tested benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
> hackbench, fileio-cfq of sysbench, dbench, aiostress, multi-threaded
> loopback netperf, on my core2, nhm, wsm and snb platforms.
>
> results:
> A, no clear performance change found with the 'performance' policy.
> B, specjbb2005 drops 5~7% with the powersaving policy, with either
> openjdk or jrockit.
> C, hackbench drops 40% with the powersaving policy on snb 4-socket
> platforms.
> Others have no clear change.
>
> ===
> Changelog:
> V6 change:
> a, remove 'balance' policy.
> b, consider RT task effect in balancing
> c, use avg_idle as burst wakeup indicator
> d, balance on task utilization in fork/exec/wakeup.
> e, no power balancing on SMT domain.
>
> V5 change:
> a, change sched_policy to sched_balance_policy
> b, split fork/exec/wake power balancing into 3 patches and refresh
> commit logs
> c, other minor cleanups
>
> V4 change:
> a, fix a few bugs and clean up code according to Morten Rasmussen, Mike
> Galbraith and Namhyung Kim. Thanks!
> b, take Morten Rasmussen's suggestion to use different criteria for
> different policies in transitory task packing.
> c, shorter latency in power aware scheduling.
>
> V3 change:
> a, take nr_running and utilisation into account in periodic power
> balancing.
> b, try packing small exec/wake tasks on a running cpu rather than an idle
> cpu.
>
> V2 change:
> a, add lazy power scheduling to deal with kbuild-like benchmarks.
>
>
> -- Thanks Alex
> [patch v6 01/21] Revert "sched: Introduce temporary FAIR_GROUP_SCHED
> [patch v6 02/21] sched: set initial value of runnable avg for new
> [patch v6 03/21] sched: only count runnable avg on cfs_rq's
> [patch v6 04/21] sched: add sched balance policies in kernel
> [patch v6 05/21] sched: add sysfs interface for sched_balance_policy
> [patch v6 06/21] sched: log the cpu utilization at rq
> [patch v6 07/21] sched: add new sg/sd_lb_stats fields for incoming
> [patch v6 08/21] sched: move sg/sd_lb_stats struct ahead
> [patch v6 09/21] sched: scale_rt_power rename and meaning change
> [patch v6 10/21] sched: get rq potential maximum utilization
> [patch v6 11/21] sched: detect wakeup burst with rq->avg_idle
> [patch v6 12/21] sched: add power aware scheduling in fork/exec/wake
> [patch v6 13/21] sched: using avg_idle to detect bursty wakeup
> [patch v6 14/21] sched: packing transitory tasks in wakeup power
> [patch v6 15/21] sched: add power/performance balance allow flag
> [patch v6 16/21] sched: pull all tasks from source group
> [patch v6 17/21] sched: no balance for prefer_sibling in power
> [patch v6 18/21] sched: add new members of sd_lb_stats
> [patch v6 19/21] sched: power aware load balance
> [patch v6 20/21] sched: lazy power balance
> [patch v6 21/21] sched: don't do power balance on share cpu power
>


--
Thanks
Alex