Re: [PATCH v3 09/22] sched: compute runnable load avg in cpu_load and cpu_avg_load_per_task

From: Alex Shi
Date: Sat Jan 05 2013 - 03:55:07 EST


On 01/05/2013 04:37 PM, Alex Shi wrote:
> They are the base values in load balance, update them with rq runnable
> load average, then the load balance will consider runnable load avg
> naturally.
>
> Signed-off-by: Alex Shi <alex.shi@xxxxxxxxx>
> ---
> kernel/sched/core.c | 8 ++++++++
> kernel/sched/fair.c | 4 ++--
> 2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 06d27af..5feed5e 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2544,7 +2544,11 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
> void update_idle_cpu_load(struct rq *this_rq)
> {
> unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
> +#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
> + unsigned long load = (unsigned long)this_rq->cfs.runnable_load_avg;
> +#else
> unsigned long load = this_rq->load.weight;
> +#endif
> unsigned long pending_updates;
>
> /*
> @@ -2594,7 +2598,11 @@ static void update_cpu_load_active(struct rq *this_rq)
> * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
> */
> this_rq->last_load_update_tick = jiffies;
> +#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
> + __update_cpu_load(this_rq, this_rq->cfs.runnable_load_avg, 1);
> +#else
> __update_cpu_load(this_rq, this_rq->load.weight, 1);
> +#endif
>
> calc_load_account_active(this_rq);
> }
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5c545e4..84a6517 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2906,7 +2906,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> /* Used instead of source_load when we know the type == 0 */
> static unsigned long weighted_cpuload(const int cpu)
> {
> - return cpu_rq(cpu)->load.weight;
> + return (unsigned long)cpu_rq(cpu)->cfs.runnable_load_avg;

The line change above causes the aim9 multitask benchmark to drop about
10% in performance on many x86 machines. The profile just shows that
cpuidle enter is called more often.
The testing command:

#( echo $hostname ; echo test ; echo 1 ; echo 2000 ; echo 2 ; echo 2000 ; \
   echo 100 ) | ./multitask -nl

The oprofile output:
With this patch set:
101978 total 0.0134
54406 cpuidle_wrap_enter 499.1376
2098 __do_page_fault 2.0349
1976 rwsem_wake 29.0588
1824 finish_task_switch 12.4932
1560 copy_user_generic_string 24.3750
1346 clear_page_c 84.1250
1249 unmap_single_vma 0.6885
1141 copy_page_rep 71.3125
1093 anon_vma_interval_tree_insert 8.1567

On 3.8-rc2:
68982 total 0.0090
22166 cpuidle_wrap_enter 203.3578
2188 rwsem_wake 32.1765
2136 __do_page_fault 2.0718
1920 finish_task_switch 13.1507
1724 poll_idle 15.2566
1433 copy_user_generic_string 22.3906
1237 clear_page_c 77.3125
1222 unmap_single_vma 0.6736
1053 anon_vma_interval_tree_insert 7.8582
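
For a rough comparison of the two profiles above, the share of samples
that land in cpuidle_wrap_enter can be worked out from the listed counts.
This is only a sketch: the helper name share_permille is hypothetical and
not part of oprofile.

```c
#include <assert.h>

/*
 * Hypothetical helper (not an oprofile API): share of `part` in `total`,
 * in parts per thousand, rounded down. Integer math only.
 */
static unsigned long share_permille(unsigned long part, unsigned long total)
{
	assert(total != 0);
	return part * 1000UL / total;
}
```

With the counts above, share_permille(54406, 101978) gives 533 and
share_permille(22166, 68982) gives 321, i.e. roughly 53% of samples in
cpuidle_wrap_enter with the patch set versus roughly 32% on 3.8-rc2.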

Without load avg in periodic balancing, each cpu is weighted with the
load of all its tasks.

With the new load tracking, we only update the cfs_rq load avg with each
task at enqueue/dequeue time, and only update the current task in
scheduler_tick. I am wondering whether the sampling is a bit too
infrequent.
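
To illustrate how a decayed runnable average can differ from the plain
load.weight sum, here is a toy model. It is a sketch under assumptions,
not the kernel code: it uses doubles instead of the kernel's fixed-point
math, assumes a decay factor whose weight halves every 32 periods, and
the names toy_avg, toy_avg_update and toy_load_contrib are hypothetical.

```c
#include <assert.h>

/* Toy model only: the kernel uses fixed-point math, not doubles. */
#define TOY_DECAY 0.97857206 /* approx. 2^(-1/32): halves every 32 periods */

struct toy_avg {
	double runnable_sum; /* decayed count of periods spent runnable */
	double period_sum;   /* decayed count of all periods seen */
};

/* Account one period; `runnable` is nonzero if the task was runnable in it. */
static void toy_avg_update(struct toy_avg *a, int runnable)
{
	assert(a != 0);
	a->runnable_sum = a->runnable_sum * TOY_DECAY + (runnable ? 1.0 : 0.0);
	a->period_sum = a->period_sum * TOY_DECAY + 1.0;
}

/* Load contribution: weight scaled by the decayed runnable fraction. */
static double toy_load_contrib(const struct toy_avg *a, unsigned long weight)
{
	if (a->period_sum <= 0.0)
		return 0.0;
	return (double)weight * a->runnable_sum / a->period_sum;
}
```

In this toy model, a task of weight 1024 that is runnable in every other
period settles at a contribution of roughly half its weight, whereas
load.weight counts the full 1024 whenever the task is on the runqueue.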

What's your opinion of this, Paul?


> }
>
> /*
> @@ -2953,7 +2953,7 @@ static unsigned long cpu_avg_load_per_task(int cpu)
> unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
>
> if (nr_running)
> - return rq->load.weight / nr_running;
> + return (unsigned long)rq->cfs.runnable_load_avg / nr_running;
>
> return 0;
> }
>


--
Thanks Alex