Re: [CFT][RFC] HT scheduler

From: Nick Piggin
Date: Thu Dec 18 2003 - 23:58:34 EST

Nick Piggin wrote:
>
> Rusty Russell wrote:
>
>> A few things need work:
>>
>> 1) There's a race between sys_sched_setaffinity() and
>> sched_migrate_task() (this is nothing to do with your patch).
>
> Yep. They should both take the task's runqueue lock.


Easier said than done... anyway, how does this patch look? It should
also cure a possible (and not entirely unlikely) use-after-free of the
task_struct in sched_migrate_task on NUMA, AFAIKS.

The patch is on top of a few other changes, so it might not apply
cleanly; it's just for review. I'll release a new rollup with this
included soon.
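
To illustrate the race (a rough interleaving, not the exact code):

    sched_migrate_task(p, dest)       sys_sched_setaffinity(p)
    ---------------------------       ------------------------
    old_mask = p->cpus_allowed;
    set_cpus_allowed(p, dest_mask);
                                      set_cpus_allowed(p, user_mask);
    set_cpus_allowed(p, old_mask);    /* user_mask silently lost */

The patch closes this by rechecking p->cpus_allowed under the runqueue
lock before restoring the old mask, and pins the task with
get_task_struct so it can't be freed while we're still poking at it.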

>> 2) Please change those #defines into an enum for idle (patch follows,
>> untested but trivial)
>
> Thanks, I'll take the patch.


done

>> 3) conditional locking in load_balance is v. bad idea.
>
> Yeah... I'm thinking about this. I don't think it should be too hard
> to break out the shared portion.


done
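
For the record, the shape of that change: take the locks
unconditionally and push the shared task-moving code into a helper
entered with both locks held. A minimal sketch only -- the helper
names are simplified (find_busiest_queue takes more arguments in the
real sched.c, and pull_tasks stands in for the inline moving loop):

    static int load_balance(runqueue_t *this_rq, int idle)
    {
        runqueue_t *busiest;
        int moved = 0;

        spin_lock(&this_rq->lock);
        busiest = find_busiest_queue(this_rq, idle);
        if (busiest) {
            /* takes busiest->lock, observing lock ordering */
            double_lock_balance(this_rq, busiest);
            /* the shared portion, both locks held throughout */
            moved = pull_tasks(this_rq, busiest, idle);
            spin_unlock(&busiest->lock);
        }
        spin_unlock(&this_rq->lock);
        return moved;
    }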

>> 4) load_balance returns "(!failed && !balanced)", but callers stop
>> calling it when it returns true. Why not simply return "balanced",
>> or at least "balanced && !failed"?
>
> No: the idle balancer stops calling it when it returns true, and the
> periodic balancer sets idle to 0 when it returns true.
>
> !balanced && !failed means it has moved a task.
>
> I'll either comment that, or return it in a more direct way.


done
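
To spell out the convention (a simplified sketch of the two call
sites, not the code verbatim): a nonzero return from load_balance
means it moved a task.

    /* idle balancer: stop calling once we've pulled a task to run */
    if (load_balance(this_rq, 1))
        return;

    /* periodic balancer: a successful move means this CPU is no
     * longer idle, so flip to the busy rebalance behaviour */
    if (load_balance(this_rq, 0))
        idle = 0;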

Prevents a race in which sys_sched_setaffinity can run concurrently with
sched_migrate_task and cause sched_migrate_task to restore an invalid cpu mask.


linux-2.6-npiggin/kernel/sched.c | 89 +++++++++++++++++++++++++++++----------
1 files changed, 68 insertions(+), 21 deletions(-)

diff -puN kernel/sched.c~sched-migrate-affinity-race kernel/sched.c
--- linux-2.6/kernel/sched.c~sched-migrate-affinity-race 2003-12-19 14:45:58.000000000 +1100
+++ linux-2.6-npiggin/kernel/sched.c 2003-12-19 15:19:30.000000000 +1100
@@ -947,6 +947,9 @@ static inline void double_rq_unlock(runq
}

#ifdef CONFIG_NUMA
+
+static inline int __set_cpus_allowed(task_t *p, cpumask_t new_mask, unsigned long *flags);
+
/*
* If dest_cpu is allowed for this process, migrate the task to it.
* This is accomplished by forcing the cpu_allowed mask to only
@@ -955,16 +958,43 @@ static inline void double_rq_unlock(runq
*/
static void sched_migrate_task(task_t *p, int dest_cpu)
{
- cpumask_t old_mask;
+ runqueue_t *rq;
+ unsigned long flags;
+ cpumask_t old_mask, new_mask = cpumask_of_cpu(dest_cpu);

+ rq = task_rq_lock(p, &flags);
old_mask = p->cpus_allowed;
- if (!cpu_isset(dest_cpu, old_mask))
+ if (!cpu_isset(dest_cpu, old_mask)) {
+ task_rq_unlock(rq, &flags);
return;
+ }
+
+ get_task_struct(p);
+
/* force the process onto the specified CPU */
- set_cpus_allowed(p, cpumask_of_cpu(dest_cpu));
+ if (__set_cpus_allowed(p, new_mask, &flags) < 0)
+ goto out;
+
+ /* __set_cpus_allowed unlocks the runqueue */
+ rq = task_rq_lock(p, &flags);
+ if (unlikely(p->cpus_allowed != new_mask)) {
+ /*
+ * We have raced with another set_cpus_allowed.
+ * old_mask is invalid and we needn't move the
+ * task back.
+ */
+ task_rq_unlock(rq, &flags);
+ goto out;
+ }
+
+ /*
+ * restore the cpus allowed mask. old_mask must be valid because
+ * p->cpus_allowed is a subset of old_mask.
+ */
+ __set_cpus_allowed(p, old_mask, &flags);

- /* restore the cpus allowed mask */
- set_cpus_allowed(p, old_mask);
+out:
+ put_task_struct(p);
}

/*
@@ -2603,31 +2633,27 @@ typedef struct {
} migration_req_t;

/*
- * Change a given task's CPU affinity. Migrate the thread to a
- * proper CPU and schedule it away if the CPU it's executing on
- * is removed from the allowed bitmask.
- *
- * NOTE: the caller must have a valid reference to the task, the
- * task must not exit() & deallocate itself prematurely. The
- * call is not atomic; no spinlocks may be held.
+ * See comment for set_cpus_allowed. Calling rules are different:
+ * the task's runqueue lock must be held, and __set_cpus_allowed
+ * will return with the runqueue unlocked.
*/
-int set_cpus_allowed(task_t *p, cpumask_t new_mask)
+static inline int __set_cpus_allowed(task_t *p, cpumask_t new_mask, unsigned long *flags)
{
- unsigned long flags;
migration_req_t req;
- runqueue_t *rq;
+ runqueue_t *rq = task_rq(p);

- if (any_online_cpu(new_mask) == NR_CPUS)
+ if (any_online_cpu(new_mask) == NR_CPUS) {
+ task_rq_unlock(rq, flags);
return -EINVAL;
+ }

- rq = task_rq_lock(p, &flags);
p->cpus_allowed = new_mask;
/*
* Can the task run on the task's current CPU? If not then
* migrate the thread off to a proper CPU.
*/
if (cpu_isset(task_cpu(p), new_mask)) {
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, flags);
return 0;
}
/*
@@ -2636,18 +2662,39 @@ int set_cpus_allowed(task_t *p, cpumask_
*/
if (!p->array && !task_running(rq, p)) {
set_task_cpu(p, any_online_cpu(p->cpus_allowed));
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, flags);
return 0;
}
+
init_completion(&req.done);
req.task = p;
list_add(&req.list, &rq->migration_queue);
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, flags);

wake_up_process(rq->migration_thread);
-
wait_for_completion(&req.done);
+
return 0;
+
+}
+
+/*
+ * Change a given task's CPU affinity. Migrate the thread to a
+ * proper CPU and schedule it away if the CPU it's executing on
+ * is removed from the allowed bitmask.
+ *
+ * NOTE: the caller must have a valid reference to the task, the
+ * task must not exit() & deallocate itself prematurely. The
+ * call is not atomic; no spinlocks may be held.
+ */
+int set_cpus_allowed(task_t *p, cpumask_t new_mask)
+{
+ unsigned long flags;
+ runqueue_t *rq;
+
+ rq = task_rq_lock(p, &flags);
+
+ return __set_cpus_allowed(p, new_mask, &flags);
}

EXPORT_SYMBOL_GPL(set_cpus_allowed);

_
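
One convention in the above worth calling out explicitly:
__set_cpus_allowed must be entered with the task's runqueue lock held
(via task_rq_lock), and it returns with the runqueue unlocked on every
path, success or error. So a caller looks like this sketch:

    runqueue_t *rq;
    unsigned long flags;

    rq = task_rq_lock(p, &flags);
    /* ... inspect p->cpus_allowed etc. under the lock ... */
    if (__set_cpus_allowed(p, new_mask, &flags) < 0)
        return;        /* runqueue lock already dropped */
    /* on success the lock has been dropped too */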