Re: [RFC PATCH 07/11] sched: Add proxy execution
From: Joel Fernandes
Date: Fri Oct 28 2022 - 23:31:29 EST
Hello Dietmar,
> On Oct 24, 2022, at 6:13 AM, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>
> On 03/10/2022 23:44, Connor O'Brien wrote:
>> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>
> [...]
>
>> + * Returns the task that is going to be used as execution context (the one
>> + * that is actually going to be put to run on cpu_of(rq)).
>> + */
>> +static struct task_struct *
>> +proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
>> +{
>
> [...]
>
>> +migrate_task:
>
> [...]
>
>> + /*
>> + * Since we're going to drop @rq, we have to put(@next) first,
>> + * otherwise we have a reference that no longer belongs to us. Use
>> + * @fake_task to fill the void and make the next pick_next_task()
> ^^^^^^^^^^
>
> There was a `static struct task_struct fake_task` in
> https://lkml.kernel.org/r/20181009092434.26221-6-juri.lelli@xxxxxxxxxx
> but now IMHO we use `rq->idle` <-- (1)
Ok.
>> + * invocation happy.
>> + *
>> + * XXX double, triple think about this.
>> + * XXX put doesn't work with ON_RQ_MIGRATE
>> + *
>> + * CPU0 CPU1
>> + *
>> + * B mutex_lock(X)
>> + *
>> + * A mutex_lock(X) <- B
>> + * A __schedule()
>> + * A pick->A
>> + * A proxy->B
>> + * A migrate A to CPU1
>> + * B mutex_unlock(X) -> A
>> + * B __schedule()
>> + * B pick->A
>> + * B switch_to (A)
>> + * A ... does stuff
>> + * A ... is still running here
>> + *
>> + * * BOOM *
>> + */
>> + put_prev_task(rq, next);
>> + if (curr_in_chain) {
>> + rq->proxy = rq->idle;
>> + set_tsk_need_resched(rq->idle);
>> + /*
>> + * XXX [juril] don't we still need to migrate @next to
>> + * @owner's CPU?
>> + */
>> + return rq->idle;
>> + }
>
> --> (1)
Sorry, but how does this relate to your comment below?
>> + rq->proxy = rq->idle;
>> +
>> + for (; p; p = p->blocked_proxy) {
>> + int wake_cpu = p->wake_cpu;
>> +
>> + WARN_ON(p == rq->curr);
>> +
>> + deactivate_task(rq, p, 0);
>> + set_task_cpu(p, that_cpu);
>> + /*
>> + * We can abuse blocked_entry to migrate the thing, because @p is
>> + * still on the rq.
>> + */
>> + list_add(&p->blocked_entry, &migrate_list);
>> +
>> + /*
>> + * Preserve p->wake_cpu, such that we can tell where it
>> + * used to run later.
>> + */
>> + p->wake_cpu = wake_cpu;
>> + }
>> +
>> + rq_unpin_lock(rq, rf);
>> + raw_spin_rq_unlock(rq);
>
> Don't we run into rq_pin_lock()'s:
>
> SCHED_WARN_ON(rq->balance_callback && rq->balance_callback !=
> &balance_push_callback)
>
> by releasing rq lock between queue_balance_callback(, push_rt/dl_tasks)
> and __balance_callbacks()?
Apologies, I’m a bit lost here. The code you are replying to inline does not call rq_pin_lock(); it calls rq_unpin_lock(). In which scenario do you expect the warning to trigger?
Thanks,
- Joel
>
> [...]
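
For reference, the concern about rq_pin_lock() quoted above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every type and function name ending in _model, and simulate() itself, is hypothetical and is not kernel code. The one thing taken from the thread is the condition inside the quoted SCHED_WARN_ON(). The model assumes that some later locker pins the rq lock while a queued balance callback is still pending, which is one possible reading of the question and not something the quoted hunk shows directly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel structures; not the real definitions. */
struct callback_head_model {
	struct callback_head_model *next;
};

struct rq_model {
	bool locked;
	struct callback_head_model *balance_callback;
	int warnings;	/* counts how often the modeled check fired */
};

/* Plays the role of balance_push_callback in the quoted condition. */
static struct callback_head_model balance_push_callback_model;

/* Models taking the rq lock and pinning it, including the quoted check. */
static void lock_and_pin_model(struct rq_model *rq)
{
	rq->locked = true;
	if (rq->balance_callback &&
	    rq->balance_callback != &balance_push_callback_model)
		rq->warnings++;
}

static void unlock_model(struct rq_model *rq)
{
	rq->locked = false;
}

/* Models queueing a balance callback (for example a push of RT/DL tasks). */
static void queue_balance_callback_model(struct rq_model *rq,
					 struct callback_head_model *head)
{
	head->next = rq->balance_callback;
	rq->balance_callback = head;
}

/* Models running the pending callbacks; here it only clears the list. */
static void run_balance_callbacks_model(struct rq_model *rq)
{
	rq->balance_callback = NULL;
}

/*
 * Returns how many times the modeled warning fired. When
 * drop_lock_before_callbacks is true, the lock is released and re-pinned
 * between queueing the callback and running it.
 */
int simulate(bool drop_lock_before_callbacks)
{
	struct rq_model rq = { 0 };
	struct callback_head_model push = { 0 };

	lock_and_pin_model(&rq);
	queue_balance_callback_model(&rq, &push);

	if (drop_lock_before_callbacks) {
		unlock_model(&rq);
		lock_and_pin_model(&rq);	/* a later locker sees the pending callback */
	}

	run_balance_callbacks_model(&rq);
	unlock_model(&rq);
	return rq.warnings;
}
```

In this model the check never fires when the lock is held from queueing until the callbacks have run, and it fires once when the lock is dropped and pinned again in between. Whether the real code path can reach such a pin with a callback pending is exactly the open question in this thread.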