Re: [PATCH 09/13] sched: Add bandwidth management for sched_dl

From: Peter Zijlstra
Date: Fri Dec 20 2013 - 12:14:15 EST


On Wed, Dec 18, 2013 at 05:55:08PM +0100, Peter Zijlstra wrote:

> If the purpose is to fail hotplug because taking out the CPU would end
> up in over-subscription, then we need a DOWN_PREPARE handler.

Juri just said (on IRC) that that was indeed the intended purpose.
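
For reference, the over-subscription test such a handler would apply can
be modelled in userspace roughly as below. This is only a sketch under
assumptions: struct dl_bw_model, to_ratio_model(), dl_overflow_model() and
cpu_down_allowed() are hypothetical stand-ins and not the kernel's code,
and the 20-bit fixed-point scale and the "-1 means no limit" convention
are assumed here rather than taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed fixed-point scale: 1 << 20 stands for 100% of one CPU. */
#define BW_SHIFT 20

/* Hypothetical userspace stand-in for the per-root-domain accounting. */
struct dl_bw_model {
	int64_t  bw;        /* per-CPU bandwidth cap; -1 is assumed to mean "no limit" */
	uint64_t total_bw;  /* sum of the bandwidths of all admitted tasks */
};

/* runtime/period expressed as a fixed-point fraction of one CPU. */
static uint64_t to_ratio_model(uint64_t period, uint64_t runtime)
{
	if (period == 0)
		return 0;
	return (runtime << BW_SHIFT) / period;
}

/*
 * True when 'cpus' CPUs, each capped at b->bw, cannot hold the currently
 * allocated bandwidth after swapping old_bw for new_bw.
 */
static bool dl_overflow_model(const struct dl_bw_model *b, int cpus,
			      uint64_t old_bw, uint64_t new_bw)
{
	return b->bw != -1 &&
	       (uint64_t)b->bw * (uint64_t)cpus < b->total_bw - old_bw + new_bw;
}

/*
 * The DOWN_PREPARE decision: taking one CPU away is allowed only if the
 * remaining cpus - 1 CPUs still cover what has already been admitted.
 */
static bool cpu_down_allowed(const struct dl_bw_model *b, int cpus)
{
	assert(cpus >= 1);
	return !dl_overflow_model(b, cpus - 1, 0, 0);
}
```

With a 95% per-CPU cap and three admitted tasks of 50% each, this model
lets a 3-CPU domain shrink to 2 CPUs but refuses to shrink a 2-CPU domain
to 1, which is the point where the handler would return -EBUSY.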

---
Subject: sched, deadline: Fix hotplug admission control
From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Thu Dec 19 11:54:45 CET 2013

The current hotplug admission control is broken because:

CPU_DYING -> migration_call() -> migrate_tasks() -> __migrate_task()

cannot fail and hard-assumes it _will_ move all tasks off of the dying
CPU; failing to do so breaks hotplug.

The much simpler solution is a DOWN_PREPARE handler that fails when
removing one CPU would leave the remaining CPUs with less capacity than
the total allocated bandwidth.

Signed-off-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
---
kernel/sched/core.c | 68 ++++++++++++++++------------------------------------
1 file changed, 21 insertions(+), 47 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1886,9 +1886,9 @@ inline struct dl_bw *dl_bw_of(int i)
return &cpu_rq(i)->rd->dl_bw;
}

-static inline int __dl_span_weight(struct rq *rq)
+static inline int dl_bw_cpus(int i)
{
- return cpumask_weight(rq->rd->span);
+ return cpumask_weight(cpu_rq(i)->rd->span);
}
#else
inline struct dl_bw *dl_bw_of(int i)
@@ -1896,7 +1896,7 @@ inline struct dl_bw *dl_bw_of(int i)
return &cpu_rq(i)->dl.dl_bw;
}

-static inline int __dl_span_weight(struct rq *rq)
+static inline int dl_bw_cpus(int i)
{
return 1;
}
@@ -1937,7 +1937,7 @@ static int dl_overflow(struct task_struc
u64 period = attr->sched_period;
u64 runtime = attr->sched_runtime;
u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
- int cpus = __dl_span_weight(task_rq(p));
+ int cpus = dl_bw_cpus(task_cpu(p));
int err = -1;

if (new_bw == p->dl.dl_bw)
@@ -4523,42 +4523,6 @@ int set_cpus_allowed_ptr(struct task_str
EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);

/*
- * When dealing with a -deadline task, we have to check if moving it to
- * a new CPU is possible or not. In fact, this is only true iff there
- * is enough bandwidth available on such CPU, otherwise we want the
- * whole migration procedure to fail over.
- */
-static inline
-bool set_task_cpu_dl(struct task_struct *p, unsigned int cpu)
-{
- struct dl_bw *dl_b = dl_bw_of(task_cpu(p));
- struct dl_bw *cpu_b = dl_bw_of(cpu);
- int ret = 1;
- u64 bw;
-
- if (dl_b == cpu_b)
- return 1;
-
- raw_spin_lock(&dl_b->lock);
- raw_spin_lock(&cpu_b->lock);
-
- bw = cpu_b->bw * cpumask_weight(cpu_rq(cpu)->rd->span);
- if (dl_bandwidth_enabled() &&
- bw < cpu_b->total_bw + p->dl.dl_bw) {
- ret = 0;
- goto unlock;
- }
- dl_b->total_bw -= p->dl.dl_bw;
- cpu_b->total_bw += p->dl.dl_bw;
-
-unlock:
- raw_spin_unlock(&cpu_b->lock);
- raw_spin_unlock(&dl_b->lock);
-
- return ret;
-}
-
-/*
* Move (not current) task off this cpu, onto dest cpu. We're doing
* this because either it can't run here any more (set_cpus_allowed()
* away from this CPU, or CPU going down), or because we're
@@ -4590,13 +4554,6 @@ static int __migrate_task(struct task_st
goto fail;

/*
- * If p is -deadline, proceed only if there is enough
- * bandwidth available on dest_cpu
- */
- if (unlikely(dl_task(p)) && !set_task_cpu_dl(p, dest_cpu))
- goto fail;
-
- /*
* If we're not on a rq, the next wake-up will ensure we're
* placed properly.
*/
@@ -4985,6 +4942,23 @@ migration_call(struct notifier_block *nf
unsigned long flags;
struct rq *rq = cpu_rq(cpu);

+ switch (action) {
+ case CPU_DOWN_PREPARE: /* explicitly allow suspend */
+ {
+ struct dl_bw *dl_b = dl_bw_of(cpu);
+ int cpus = dl_bw_cpus(cpu);
+ bool overflow;
+
+ raw_spin_lock_irqsave(&dl_b->lock, flags);
+ overflow = __dl_overflow(dl_b, cpus-1, 0, 0);
+ raw_spin_unlock_irqrestore(&dl_b->lock, flags);
+
+ if (overflow)
+ return notifier_from_errno(-EBUSY);
+ }
+ break;
+ }
+
switch (action & ~CPU_TASKS_FROZEN) {

case CPU_UP_PREPARE: