Re: [PATCH RESEND 1/2] sched/rt: Check to push the task when changing its affinity

From: Steven Rostedt
Date: Tue Feb 03 2015 - 22:13:52 EST


On Wed, 4 Feb 2015 09:12:20 +0800
Xunlei Pang <xlpang@xxxxxxx> wrote:

> From: Xunlei Pang <pang.xunlei@xxxxxxxxxx>
>
> An rq can end up RT-overloaded solely because of task affinities,
> so when the affinity of any runnable RT task changes, we should
> check whether to trigger balancing; otherwise the change can cause
> unnecessary real-time response latency. Currently, the RT global
> scheduler triggers nothing in this case.
>
> For example: on a 2-CPU system, two runnable FIFO tasks with the
> same rt_priority are bound to CPU0; call them rt1 (running) and
> rt2 (runnable). CPU1 has no RT tasks. Someone then sets the
> affinity of rt2 to 0x3 (i.e. CPU0 and CPU1), but rt2 still cannot
> be scheduled until rt1 enters schedule(), which can cause
> significant response latency for rt2.
>

I understand the issue you point out, but I have to be honest and say
that I really do not like this approach.

> So, when set_cpus_allowed_rt() detects such a case, check whether
> to trigger a push.
>
> Signed-off-by: Xunlei Pang <pang.xunlei@xxxxxxxxxx>
> ---
> kernel/sched/rt.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 59 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index f4d4b07..4dacb6e 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1428,7 +1428,7 @@ static struct sched_rt_entity *pick_next_rt_entity(struct rq *rq,
> return next;
> }
>
> -static struct task_struct *_pick_next_task_rt(struct rq *rq)
> +static struct task_struct *_pick_next_task_rt(struct rq *rq, int peek_only)
> {

peek_only should be a bool, but don't worry about it; I don't think
this change is needed anyway.

> struct sched_rt_entity *rt_se;
> struct task_struct *p;
> @@ -1441,7 +1441,8 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
> } while (rt_rq);
>
> p = rt_task_of(rt_se);
> - p->se.exec_start = rq_clock_task(rq);
> + if (!peek_only)
> + p->se.exec_start = rq_clock_task(rq);
>
> return p;
> }
> @@ -1476,7 +1477,7 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev)
>
> put_prev_task(rq, prev);
>
> - p = _pick_next_task_rt(rq);
> + p = _pick_next_task_rt(rq, 0);
>
> /* The running task is never eligible for pushing */
> dequeue_pushable_task(rq, p);
> @@ -1886,28 +1887,69 @@ static void set_cpus_allowed_rt(struct task_struct *p,
> const struct cpumask *new_mask)
> {
> struct rq *rq;
> - int weight;
> + int old_weight, new_weight;
> + int preempt_push = 0, direct_push = 0;
>
> BUG_ON(!rt_task(p));
>
> if (!task_on_rq_queued(p))
> return;
>
> - weight = cpumask_weight(new_mask);
> + old_weight = p->nr_cpus_allowed;
> + new_weight = cpumask_weight(new_mask);
> +
> + rq = task_rq(p);
> +
> + if (new_weight > 1 &&
> + rt_task(rq->curr) &&
> + !test_tsk_need_resched(rq->curr)) {
> + /*
> + * Set new mask information to prepare pushing.
> + * It's safe to do this here.

Please explain why it is safe.

> + */
> + cpumask_copy(&p->cpus_allowed, new_mask);
> + p->nr_cpus_allowed = new_weight;
> +
> + if (task_running(rq, p) &&
> + cpumask_test_cpu(task_cpu(p), new_mask) &&
> + cpupri_find(&rq->rd->cpupri, p, NULL)) {

Hmm, you called cpupri_find(), which also returns a mask of the CPUs
with the lowest priorities. I wonder if we could utilize that
information instead of computing it twice? Of course, things could
change by the time the task migrates.

> + /*
> + * At this point, current task gets migratable most
> + * likely due to the change of its affinity, let's
> + * figure out if we can migrate it.
> + *
> + * Is there any task with the same priority as that
> + * of current task? If found one, we should resched.
> + * NOTE: The target may be unpushable.
> + */
> + if (p->prio == rq->rt.highest_prio.next) {
> + /* One target just in pushable_tasks list. */
> + requeue_task_rt(rq, p, 0);
> + preempt_push = 1;
> + } else if (rq->rt.rt_nr_total > 1) {
> + struct task_struct *next;
> +
> + requeue_task_rt(rq, p, 0);
> + /* peek only */
> + next = _pick_next_task_rt(rq, 1);
> + if (next != p && next->prio == p->prio)
> + preempt_push = 1;
> + }

I'm thinking it would be better just to send an IPI to that CPU, and
let it figure this out and push a task off of itself.

> + } else if (!task_running(rq, p))
> + direct_push = 1;
> + }
>
> /*
> * Only update if the process changes its state from whether it
> * can migrate or not.
> */
> - if ((p->nr_cpus_allowed > 1) == (weight > 1))
> - return;
> -
> - rq = task_rq(p);
> + if ((old_weight > 1) == (new_weight > 1))
> + goto out;
>
> /*
> * The process used to be able to migrate OR it can now migrate
> */
> - if (weight <= 1) {
> + if (new_weight <= 1) {
> if (!task_current(rq, p))
> dequeue_pushable_task(rq, p);
> BUG_ON(!rq->rt.rt_nr_migratory);
> @@ -1919,6 +1961,13 @@ static void set_cpus_allowed_rt(struct task_struct *p,
> }
>
> update_rt_migration(&rq->rt);
> +
> +out:
> + if (direct_push)
> + push_rt_tasks(rq);
> +
> + if (preempt_push)
> + resched_curr(rq);

I don't know. This just doesn't seem clean.

-- Steve

> }
>
> /* Assumes rq->lock is held */
