Re: [RFCv3 PATCH 42/48] sched: Introduce energy awareness into find_busiest_queue

From: Dietmar Eggemann
Date: Tue Mar 24 2015 - 14:04:59 EST


On 24/03/15 15:21, Peter Zijlstra wrote:
On Wed, Feb 04, 2015 at 06:31:19PM +0000, Morten Rasmussen wrote:
+++ b/kernel/sched/fair.c
@@ -7216,6 +7216,37 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 	unsigned long busiest_load = 0, busiest_capacity = 1;
 	int i;

+	if (env->use_ea) {
+		struct rq *costliest = NULL;
+		unsigned long costliest_usage = 1024, costliest_energy = 1;
+
+		for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
+			unsigned long usage = get_cpu_usage(i);
+			struct rq *rq = cpu_rq(i);
+			struct sched_domain *sd = rcu_dereference(rq->sd);
+			struct energy_env eenv = {
+				.sg_top = sd->groups,
+				.usage_delta = 0,
+				.src_cpu = -1,
+				.dst_cpu = -1,
+			};
+			unsigned long energy = sched_group_energy(&eenv);
+
+			/*
+			 * We're looking for the minimal cpu efficiency
+			 * min(u_i / e_i), crosswise multiplication leads to
+			 * u_i * e_j < u_j * e_i with j as previous minimum.
+			 */
+			if (usage * costliest_energy < costliest_usage * energy) {
+				costliest_usage = usage;
+				costliest_energy = energy;
+				costliest = rq;
+			}
+		}
+
+		return costliest;
+	}
+
 	for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
 		unsigned long capacity, wl;
 		enum fbq_type rt;
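
A side note on the comparison in the hunk above: the comment's cross-multiplication can be tried outside the kernel with a small user-space sketch. Everything here (struct cpu_sample, find_least_efficient) is a hypothetical stand-in, not code from the patch; it only assumes that energy values are non-zero and that the products fit in an unsigned long.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cpu sample; the names are illustrative only. */
struct cpu_sample {
	unsigned long usage;	/* u_i */
	unsigned long energy;	/* e_i, assumed > 0 */
};

/*
 * Return the index of the sample with the smallest usage/energy ratio,
 * or -1 if no sample is below the initial reference of 1024/1 (the same
 * initial values as in the quoted hunk).
 *
 * u_i / e_i < u_j / e_j is rewritten as u_i * e_j < u_j * e_i, which
 * avoids the division; j is the previous minimum.
 */
static int find_least_efficient(const struct cpu_sample *s, size_t n)
{
	unsigned long min_usage = 1024, min_energy = 1;
	int idx = -1;
	size_t i;

	assert(s != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (s[i].usage * min_energy < min_usage * s[i].energy) {
			min_usage = s[i].usage;
			min_energy = s[i].energy;
			idx = (int)i;
		}
	}

	return idx;
}
```

With samples (512, 100), (256, 100) and (1024, 300) the second one has the lowest ratio and is selected. A sample with usage 2048 and energy 1 is never selected, because it is not below the 1024/1 starting point.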

So I've thought about parametrizing the whole load balance thing to
avoid things like this.

Irrespective of whether we balance on pure load or another metric we
have the same structure, only different units plugged in.

I've not really spent too much time on it to see what it would look
like, but I think it would be a good avenue to investigate to avoid
patches like this.

Yes. Although I tried to keep the EAS-specific code in the load balancer (lb) small, without such an abstraction the code gets ugly very quickly.

So far I see the following parameters for such a 'unit':

conventional CFS        vs.  EAS:

(weighted) load         vs.  (cpu) usage
(cpu) capacity          vs.  (group) energy
(per task) imbalance    vs.  (per task) energy diff

To me, this 'unit' is very close to the existing struct lb_env. I'm not sure yet what to do with s[dg]_lb_stats, though: so far they only contain the data needed for conventional CFS.

But I assume that some of the if/else conditions in the lb code path (load_balance, find_busiest_group, update_s[dg]_lb_stats, find_busiest_queue, detach_tasks, active lb) will stay 'unit'-specific.
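
To make the 'unit' idea a bit more concrete, here is a rough user-space sketch. It is not a proposal for the actual interface: struct lb_unit, pick_queue and the toy_* tables are all made up for illustration. The only thing it tries to show is that the queue selection has the same shape for both cases (compare two ratios by cross-multiplication) and differs in the plugged-in metrics and in the direction of the comparison. Unlike the kernel code, it simply takes the first cpu as the starting point.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Toy per-cpu numbers, invented for this example. */
static const unsigned long toy_load[TOY_NR_CPUS]     = { 100, 800,  300,  400 };
static const unsigned long toy_capacity[TOY_NR_CPUS] = { 1024, 1024, 512, 1024 };
static const unsigned long toy_usage[TOY_NR_CPUS]    = { 512, 256, 1024,  700 };
static const unsigned long toy_energy[TOY_NR_CPUS]   = { 100, 100,  300,  350 };

static unsigned long toy_get_load(int cpu)     { return toy_load[cpu]; }
static unsigned long toy_get_capacity(int cpu) { return toy_capacity[cpu]; }
static unsigned long toy_get_usage(int cpu)    { return toy_usage[cpu]; }
static unsigned long toy_get_energy(int cpu)   { return toy_energy[cpu]; }

/* Hypothetical 'unit': the metric pair plus the direction to search in. */
struct lb_unit {
	unsigned long (*numer)(int cpu);	/* load or usage */
	unsigned long (*denom)(int cpu);	/* capacity or energy, assumed > 0 */
	int pick_max;				/* 1: max ratio, 0: min ratio */
};

static const struct lb_unit cfs_unit = {
	.numer = toy_get_load, .denom = toy_get_capacity, .pick_max = 1,
};

static const struct lb_unit eas_unit = {
	.numer = toy_get_usage, .denom = toy_get_energy, .pick_max = 0,
};

/* Return the cpu with the max (or min) numer/denom ratio, -1 if none. */
static int pick_queue(const struct lb_unit *u, int nr_cpus)
{
	unsigned long best_n = 0, best_d = 1;
	int cpu, best = -1;

	assert(u && u->numer && u->denom);

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		unsigned long n = u->numer(cpu), d = u->denom(cpu);
		int better;

		/* n / d vs. best_n / best_d without dividing */
		if (u->pick_max)
			better = n * best_d > best_n * d;
		else
			better = n * best_d < best_n * d;

		if (best < 0 || better) {
			best_n = n;
			best_d = d;
			best = cpu;
		}
	}

	return best;
}
```

With the toy numbers above, cfs_unit selects cpu 1 (highest load per capacity) and eas_unit selects cpu 3 (lowest usage per energy). The if/else conditions that remain 'unit'-specific would still have to live somewhere outside such a structure.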
