Re: [RFC PATCH 09/13] sched/fair: core wide vruntime comparison

From: Peter Zijlstra
Date: Tue Apr 14 2020 - 09:57:11 EST


On Wed, Mar 04, 2020 at 04:59:59PM +0000, vpillai wrote:
> From: Aaron Lu <aaron.lu@xxxxxxxxxxxxxxxxx>
>
> This patch provides a vruntime based way to compare two cfs task's
> priority, be it on the same cpu or different threads of the same core.
>
> When the two tasks are on the same CPU, we just need to find a common
> cfs_rq both sched_entities are on and then do the comparison.
>
> When the two tasks are on differen threads of the same core, the root
> level sched_entities to which the two tasks belong will be used to do
> the comparison.
>
> An ugly illustration for the cross CPU case:
>
> cpu0 cpu1
> / | \ / | \
> se1 se2 se3 se4 se5 se6
> / \ / \
> se21 se22 se61 se62
>
> Assume CPU0 and CPU1 are smt siblings and task A's se is se21 while
> task B's se is se61. To compare priority of task A and B, we compare
> priority of se2 and se6. Whose vruntime is smaller, who wins.
>
> To make this work, the root level se should have a common cfs_rq min
> vuntime, which I call it the core cfs_rq min vruntime.
>
> When we adjust the min_vruntime of rq->core, we need to propgate
> that down the tree so as to not cause starvation of existing tasks
> based on previous vruntime.

You forgot the time complexity analysis.


> +static void coresched_adjust_vruntime(struct cfs_rq *cfs_rq, u64 delta)
> +{
> + struct sched_entity *se, *next;
> +
> + if (!cfs_rq)
> + return;
> +
> + cfs_rq->min_vruntime -= delta;
> + rbtree_postorder_for_each_entry_safe(se, next,
> + &cfs_rq->tasks_timeline.rb_root, run_node) {

Which per this ^

> + if (se->vruntime > delta)
> + se->vruntime -= delta;
> + if (se->my_q)
> + coresched_adjust_vruntime(se->my_q, delta);
> + }
> +}

> @@ -511,6 +607,7 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
>
> /* ensure we never gain time by being placed backwards. */
> cfs_rq->min_vruntime = max_vruntime(cfs_rq_min_vruntime(cfs_rq), vruntime);
> + update_core_cfs_rq_min_vruntime(cfs_rq);
> #ifndef CONFIG_64BIT
> smp_wmb();
> cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;

as called from here, is exceedingly important.

Worse, I don't think our post-order iteration is even O(n).


All of this is exceedingly yuck.