Re: [PATCH updated v2] sched/fair: core wide cfs task priority comparison

From: Aaron Lu
Date: Fri May 08 2020 - 08:35:10 EST


On Fri, May 08, 2020 at 11:09:25AM +0200, Peter Zijlstra wrote:
> On Fri, May 08, 2020 at 04:44:19PM +0800, Aaron Lu wrote:
> > On Wed, May 06, 2020 at 04:35:06PM +0200, Peter Zijlstra wrote:
>
> > > Aside from this being way too complicated for what it does -- you
> > > could've saved the min_vruntime for each rq and compared them with
> > > subtraction -- it is also terminally broken afaict.
> > >
> > > Consider any infeasible weight scenario. Take for instance two tasks,
> > > each bound to their respective sibling, one with weight 1 and one with
> > > weight 2. Then the lower weight task will run ahead of the higher weight
> > > task without bound.
> >
> > I don't follow how this could happen. Even if the lower weight task
> > runs first, after some time the higher weight task will get its turn
> > and from then on, the higher weight task will get more chances to
> > run (due to its higher weight and thus slower accumulation of
> > vruntime).
>
> That seems to assume they're mutually exclusive. In that case, as I
> argued, we only have a single runqueue and then yes it works. But if
> they're not exclusive, and can run concurrently, it comes apart.

Ah right, now I see what you mean. Sorry for the misunderstanding.

And yes, that 'utterly destroys the concept of a shared time base' and
then bad things can happen:
1) two tasks with the same tag (t1 and t2) run on two siblings, with
   t1's weight lower than t2's;
2) both tasks are CPU intensive;
3) over time, the lower weight task (t1)'s vruntime grows ever larger
   than t2's vruntime, and the core wide min_vruntime ends up the same
   as t1's vruntime per this patch;
4) a new task is enqueued on the same sibling as t1; if the new task
   has an incompatible tag, it will be starved by t2 because t2's
   vruntime is way smaller than the core wide min_vruntime (see the
   sketch below).

With this said, I realized a workaround for the issue described above:
when the core goes from 'compatible mode' (steps 1-3) to 'incompatible
mode' (step 4), reset all root level sched entities' vruntime to the
core wide min_vruntime. After all, the core is transitioning from
two-runqueue mode to single-runqueue mode... I think this can solve the
issue to some extent, but I may be missing other scenarios.
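Something along these lines, as a rough and untested sketch
(core_wide_min_vruntime(), for_each_sibling() and
for_each_root_entity() are made-up helpers, not existing kernel APIs):

/*
 * Rough, untested sketch of the workaround: when the core switches
 * from 'compatible mode' to 'incompatible mode', pull all root level
 * sched entities on both siblings back to the core wide min_vruntime,
 * re-establishing a shared time base.
 */
static void core_sync_vruntime(struct rq *rq)
{
	u64 cw_min = core_wide_min_vruntime(&rq->cfs);
	struct rq *sibling;

	for_each_sibling(sibling, rq) {
		struct cfs_rq *cfs_rq = &sibling->cfs;
		struct sched_entity *se;

		/* reset root level entities onto the shared time base */
		for_each_root_entity(se, cfs_rq)
			se->vruntime = cw_min;

		cfs_rq->min_vruntime = cw_min;
	}
}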

I'll also re-read your last email about the 'lag' idea.