Re: [PATCH RFC 6/7] sched: cfs: cpu frequency scaling based on task placement
From: Peter Zijlstra
Date: Mon Oct 27 2014 - 11:55:37 EST
On Tue, Oct 21, 2014 at 11:07:30PM -0700, Mike Turquette wrote:
> {en,de}queue_task_fair are updated to track which cpus will have changed
> utilization values as function of task queueing. The affected cpus are
> passed on to arch_eval_cpu_freq for further machine-specific processing
> based on a selectable policy.
Yeah, I'm not sure about the arch eval hook; ideally it'd all be
integrated with the energy model.
> arch_scale_cpu_freq is called from run_rebalance_domains as a way to
> kick off the scaling process (via wake_up_process), so as to prevent
> re-entering the {en,de}queue code.
We might want a better name for that :-) dvfs_set_freq() or whatnot, or
maybe preserve the cpufreq_*() namespace; people seem to know that that
is the Linux DVFS name.
> All of the call sites in this patch are up for discussion. Does it make
> sense to track which cpus have updated statistics in enqueue_task_fair?
Like I said, I don't think so. We guesstimate and approximate everything
anyhow; don't bother trying to be 'perfect' here, it's excessively
expensive.
> I chose this because I wanted to gather statistics for all cpus affected
> in the event CONFIG_FAIR_GROUP_SCHED is enabled. As agreed at LPC14 the
> next version of this patch will focus on the simpler case of not using
> scheduler cgroups, which should remove a good chunk of this code,
> including the cpumask stuff.
Yes please, make the cpumask stuff go away :-)
> Also discussed at LPC14 is the fact that load_balance is a very
> interesting place to do this as frequency can be considered in concert
> with task placement. Please put forth any ideas on a sensible way to do
> this.
Ideally it'd be natural fallout of Morten's energy model.
If you take a multi-core energy model, find its bifurcations and map its
solution spaces I suspect there to be a fairly small set of actual
behaviours.
The problem is, nobody seems to have done this yet so we don't know.
Once you've done this, you can try and minimize the model by proving you
retain all behaviour modes, but for now Morten has a rather full
parameter space (not complete though, and the impact of the missing
parameters might or might not be relevant, impossible to prove until we
have the above done).
> Is run_rebalance_domains a logical place to change cpu frequency? What
> other call sites make sense?
For the legacy systems, maybe.
> Even for platforms that can target a cpu frequency without sleeping
> (x86, some ARM platforms with PM microcontrollers) it is currently
> necessary to always kick the frequency target work out into a kthread.
> This is because of the rw_sem usage in the cpufreq core which might
> sleep. Replacing that lock type is probably a good idea.
I think it would be best to start with this. Ideally we'd be able to RCU
free the thing, such that holding either the rwsem or rcu_read_lock is
sufficient for usage; that way the sleeping muck can grab the rwsem and
the non-sleeping stuff can grab rcu_read_lock.
But I've not looked at the cpufreq stuff at all.