Re: [PATCH v2] sched/fair: Do not set skip buddy up the sched hierarchy

From: Josh Don
Date: Thu Dec 12 2019 - 17:19:41 EST


On Mon, Dec 9, 2019 at 1:19 AM Dietmar Eggemann
<dietmar.eggemann@xxxxxxx> wrote:
>
> On 06.12.19 23:13, Josh Don wrote:
>
> [...]
>
> > On Thu, Dec 5, 2019 at 11:57 PM Vincent Guittot
> > <vincent.guittot@xxxxxxxxxx> wrote:
> >>
> >> Hi Josh,
> >>
> >> On Wed, 4 Dec 2019 at 21:06, Josh Don <joshdon@xxxxxxxxxx> wrote:
> >>>
> >>> From: Venkatesh Pallipadi <venki@xxxxxxxxxx>
> >>>
> >>> Setting skip buddy all the way up the hierarchy does not play well
> >>> with intra-cgroup yield. One typical usecase of yield is when a
> >>> thread in a cgroup wants to yield CPU to another thread within the
> >>> same cgroup. For such a case, setting the skip buddy all the way up
>
> But with yield_task{_fair}() you have no way to control which other task
> gets accelerated. The other task in the taskgroup (cgroup) could be even
> on another CPU.
>
> It's not like yield_to_task_fair() which uses next buddy to accelerate
> another task p.
>
> What's this typical usecase?

The semantics for yield_task under CFS are not well-defined. With our
CFS hierarchy, we cannot easily just push a yielded task to the end of
a runqueue. And, we don't want to play games with artificially
increasing vruntime, as this results in potentially high latency for a
yielded task to get back on CPU.

I'd interpret a task that calls yield as saying "I can run, but try to
run something else." I'd agree that this patch is imperfect in
achieving this, but I think it is better than the current
implementation (or at least, less broken). Currently, a side-effect
of calling yield is that all other tasks in the same hierarchy get
skipped as well. This is almost certainly not what the user
expects/wants. It is true that if a yielded task has no other tasks
in its cgroup on the same CPU, we will potentially end up just picking
the yielded task again. But this should be OK; a yielded task should
be able to continue making forward progress. Any yielded task that
calls yield again is likely implementing a busy loop, which is an
improper use of yield anyway.

I also played around with the idea of setting the skip buddy up the
hierarchy up to the point where cfs_rq->nr_running > 1, but this is
racy with enqueue, and in general raises questions about whether an
enqueued task should try to clear skip buddies.