Re: [PATCH RFC 6/7] sched: cfs: cpu frequency scaling based on task placement

From: Preeti U Murthy
Date: Thu Oct 23 2014 - 00:04:56 EST


Hi Mike,

On 10/22/2014 11:37 AM, Mike Turquette wrote:
> {en,de}queue_task_fair are updated to track which cpus will have changed
> utilization values as function of task queueing. The affected cpus are
> passed on to arch_eval_cpu_freq for further machine-specific processing
> based on a selectable policy.
>
> arch_scale_cpu_freq is called from run_rebalance_domains as a way to
> kick off the scaling process (via wake_up_process), so as to prevent
> re-entering the {en,de}queue code.
>
> All of the call sites in this patch are up for discussion. Does it make
> sense to track which cpus have updated statistics in enqueue_task_fair?
> I chose this because I wanted to gather statistics for all cpus affected
> in the event CONFIG_FAIR_GROUP_SCHED is enabled. As agreed at LPC14 the

Can you explain how pstate selection can be affected by the presence of
task groups? We are, after all, concerned with the cpu load. So when we
enqueue/dequeue a task, we update the cpu load and pass it on for cpu
pstate scaling. How does this change if we have task groups?
I know that this issue was brought up during LPC, but I have not yet
managed to gain clarity here.

> next version of this patch will focus on the simpler case of not using
> scheduler cgroups, which should remove a good chunk of this code,
> including the cpumask stuff.
>
> Also discussed at LPC14 is the fact that load_balance is a very
> interesting place to do this as frequency can be considered in concert
> with task placement. Please put forth any ideas on a sensible way to do
> this.
>
> Is run_rebalance_domains a logical place to change cpu frequency? What
> other call sites make sense?
>
> Even for platforms that can target a cpu frequency without sleeping
> (x86, some ARM platforms with PM microcontrollers) it is currently
> necessary to always kick the frequency target work out into a kthread.
> This is because of the rw_sem usage in the cpufreq core which might
> sleep. Replacing that lock type is probably a good idea.
>
> Not-signed-off-by: Mike Turquette <mturquette@xxxxxxxxxx>
> ---
> kernel/sched/fair.c | 39 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 39 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1af6f6d..3619f63 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3999,6 +3999,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  {
>  	struct cfs_rq *cfs_rq;
>  	struct sched_entity *se = &p->se;
> +	struct cpumask update_cpus;
> +
> +	cpumask_clear(&update_cpus);
>
>  	for_each_sched_entity(se) {
>  		if (se->on_rq)
> @@ -4028,12 +4031,27 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>
>  		update_cfs_shares(cfs_rq);
>  		update_entity_load_avg(se, 1);
> +		/* track cpus that need to be re-evaluated */
> +		cpumask_set_cpu(cpu_of(rq_of(cfs_rq)), &update_cpus);

All the cfs_rqs that you iterate through here will belong to the same
rq/cpu, right?
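
To make the question concrete, here is a small user-space model of the
walk, not kernel code. It assumes, as I understand the group scheduling
layout, that every task group has one cfs_rq per cpu and that an
entity's parent lives on the same cpu. The struct layouts and the
helpers cpus_touched() and walk_on_cpu() are made up for illustration;
only the names rq, cfs_rq and sched_entity mirror the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/* Toy stand-ins for the kernel structures; only the links that matter here. */
struct rq { int cpu; };

struct cfs_rq { struct rq *rq; };	/* runqueue this cfs_rq is attached to */

struct sched_entity {
	struct sched_entity *parent;	/* group entity one level up, NULL at the root */
	struct cfs_rq *cfs_rq;		/* cfs_rq this entity is queued on */
};

/* Walk from the task's entity to the root and collect the cpus seen as a bitmask. */
static unsigned long cpus_touched(struct sched_entity *se)
{
	unsigned long mask = 0;

	for (; se; se = se->parent)
		mask |= 1UL << se->cfs_rq->rq->cpu;

	return mask;
}

/*
 * Build a two-level hierarchy on one cpu: task entity -> group entity -> root.
 * Under the stated assumption, the group's cfs_rq and the root cfs_rq are both
 * per-cpu objects of the same cpu, so they point at the same rq.
 */
static unsigned long walk_on_cpu(int cpu)
{
	struct rq rq = { .cpu = cpu };
	struct cfs_rq root_cfs_rq = { .rq = &rq };
	struct cfs_rq group_cfs_rq = { .rq = &rq };
	struct sched_entity group_se = { .parent = NULL, .cfs_rq = &root_cfs_rq };
	struct sched_entity task_se = { .parent = &group_se, .cfs_rq = &group_cfs_rq };

	assert(cpu >= 0 && cpu < NR_CPUS);
	return cpus_touched(&task_se);
}
```

In this model walk_on_cpu(2) yields a mask with only bit 2 set, however
deep the hierarchy is. If the same holds in enqueue_task_fair, the
cpumask would only ever contain cpu_of(rq).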

Regards
Preeti U Murthy
