Re: [RFC][PATCH 15/16] sched: Trivial forced-newidle balancer

From: Valentin Schneider
Date: Thu Feb 21 2019 - 11:19:53 EST


Hi,

On 18/02/2019 16:56, Peter Zijlstra wrote:
[...]
> +static bool try_steal_cookie(int this, int that)
> +{
> + struct rq *dst = cpu_rq(this), *src = cpu_rq(that);
> + struct task_struct *p;
> + unsigned long cookie;
> + bool success = false;
> +
> + local_irq_disable();
> + double_rq_lock(dst, src);
> +
> + cookie = dst->core->core_cookie;
> + if (!cookie)
> + goto unlock;
> +
> + if (dst->curr != dst->idle)
> + goto unlock;
> +
> + p = sched_core_find(src, cookie);
> + if (p == src->idle)
> + goto unlock;
> +
> + do {
> + if (p == src->core_pick || p == src->curr)
> + goto next;
> +
> + if (!cpumask_test_cpu(this, &p->cpus_allowed))
> + goto next;
> +
> + if (p->core_occupation > dst->idle->core_occupation)
> + goto next;
> +

IIUC, we're trying to find/steal tasks matching the core_cookie from other
rqs because dst has been cookie-forced-idle.

If the p we find isn't running, what's the meaning of core_occupation?
I would have expected it to be 0, but we don't seem to be clearing it when
resetting the state in pick_next_task().

If it is running, we prevent the stealing if the core it's on is running
more matching tasks than the core of the pulling rq. It feels to me as if
that's a balancing tweak to try to cram as many matching tasks as possible
in a single core, so to me this reads as "don't steal my tasks if I'm
running more than you are, but I will steal tasks from you if I'm given
the chance". Is that correct?

[...]