[GIT PULL] scheduler fixes

From: Ingo Molnar
Date: Fri Feb 06 2015 - 13:31:21 EST

Next message: Jake Oshins: "RE: [PATCH v2 1/1] drivers:hv:vmbus drivers:hv:vmbus Allow for more than one MMIO range for children"
Previous message: Robin Murphy: "Re: [PATCH/RFC 4/4] iommu/arm-smmu: Support deferred probing"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Linus,

Please pull the latest sched-urgent-for-linus git tree from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-urgent-for-linus

# HEAD: 40767b0dc768060266d261b4a330164b4be53f7c sched/deadline: Fix deadline parameter modification handling

Misc fixes.

Thanks,

Ingo

------------------>
Jan Beulich (1):
sched/fair: Avoid using uninitialized variable in preferred_group_nid()

Mike Galbraith (1):
sched: Fix crash if cpuset_cpumask_can_shrink() is passed an empty cpumask

Mikulas Patocka (1):
sched/wait: Remove might_sleep() from wait_event_cmd()

Peter Zijlstra (1):
sched/deadline: Fix deadline parameter modification handling

include/linux/wait.h | 1 -
kernel/sched/core.c | 36 +++++++++++++++++++++++++++++++-----
kernel/sched/deadline.c | 3 ++-
kernel/sched/fair.c | 2 +-
4 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 2232ed16635a..37423e0e1379 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -363,7 +363,6 @@ do { \
*/
#define wait_event_cmd(wq, condition, cmd1, cmd2) \
do { \
- might_sleep(); \
if (condition) \
break; \
__wait_event_cmd(wq, condition, cmd1, cmd2); \
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c0accc00566e..9e838095beb8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1814,6 +1814,10 @@ void __dl_clear_params(struct task_struct *p)
dl_se->dl_period = 0;
dl_se->flags = 0;
dl_se->dl_bw = 0;
+
+ dl_se->dl_throttled = 0;
+ dl_se->dl_new = 1;
+ dl_se->dl_yielded = 0;
}

/*
@@ -1839,7 +1843,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
#endif

RB_CLEAR_NODE(&p->dl.rb_node);
- hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ init_dl_task_timer(&p->dl);
__dl_clear_params(p);

INIT_LIST_HEAD(&p->rt.run_list);
@@ -2049,6 +2053,9 @@ static inline int dl_bw_cpus(int i)
* allocated bandwidth to reflect the new situation.
*
* This function is called while holding p's rq->lock.
+ *
+ * XXX we should delay bw change until the task's 0-lag point, see
+ * __setparam_dl().
*/
static int dl_overflow(struct task_struct *p, int policy,
const struct sched_attr *attr)
@@ -3251,15 +3258,31 @@ __setparam_dl(struct task_struct *p, const struct sched_attr *attr)
{
struct sched_dl_entity *dl_se = &p->dl;

- init_dl_task_timer(dl_se);
dl_se->dl_runtime = attr->sched_runtime;
dl_se->dl_deadline = attr->sched_deadline;
dl_se->dl_period = attr->sched_period ?: dl_se->dl_deadline;
dl_se->flags = attr->sched_flags;
dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
- dl_se->dl_throttled = 0;
- dl_se->dl_new = 1;
- dl_se->dl_yielded = 0;
+
+ /*
+ * Changing the parameters of a task is 'tricky' and we're not doing
+ * the correct thing -- also see task_dead_dl() and switched_from_dl().
+ *
+ * What we SHOULD do is delay the bandwidth release until the 0-lag
+ * point. This would include retaining the task_struct until that time
+ * and change dl_overflow() to not immediately decrement the current
+ * amount.
+ *
+ * Instead we retain the current runtime/deadline and let the new
+ * parameters take effect after the current reservation period lapses.
+ * This is safe (albeit pessimistic) because the 0-lag point is always
+ * before the current scheduling deadline.
+ *
+ * We can still have temporary overloads because we do not delay the
+ * change in bandwidth until that time; so admission control is
+ * not on the safe side. It does however guarantee tasks will never
+ * consume more than promised.
+ */
}

/*
@@ -4642,6 +4665,9 @@ int cpuset_cpumask_can_shrink(const struct cpumask *cur,
struct dl_bw *cur_dl_b;
unsigned long flags;

+ if (!cpumask_weight(cur))
+ return ret;
+
rcu_read_lock_sched();
cur_dl_b = dl_bw_of(cpumask_any(cur));
trial_cpus = cpumask_weight(trial);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b52092f2636d..726470d47f87 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1094,6 +1094,7 @@ static void task_dead_dl(struct task_struct *p)
* Since we are TASK_DEAD we won't slip out of the domain!
*/
raw_spin_lock_irq(&dl_b->lock);
+ /* XXX we should retain the bw until 0-lag */
dl_b->total_bw -= p->dl.dl_bw;
raw_spin_unlock_irq(&dl_b->lock);

@@ -1614,8 +1615,8 @@ static void cancel_dl_timer(struct rq *rq, struct task_struct *p)

static void switched_from_dl(struct rq *rq, struct task_struct *p)
{
+ /* XXX we should retain the bw until 0-lag */
cancel_dl_timer(rq, p);
-
__dl_clear_params(p);

/*
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40667cbf371b..fe331fc391f5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1730,7 +1730,7 @@ static int preferred_group_nid(struct task_struct *p, int nid)
nodes = node_online_map;
for (dist = sched_max_numa_distance; dist > LOCAL_DISTANCE; dist--) {
unsigned long max_faults = 0;
- nodemask_t max_group;
+ nodemask_t max_group = NODE_MASK_NONE;
int a, b;

/* Are there nodes at this distance from each other? */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Jake Oshins: "RE: [PATCH v2 1/1] drivers:hv:vmbus drivers:hv:vmbus Allow for more than one MMIO range for children"
Previous message: Robin Murphy: "Re: [PATCH/RFC 4/4] iommu/arm-smmu: Support deferred probing"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]