Re: [External] Re: Subject: [PATCH] sched/fair: prioritize normal task over sched_idle task with vruntime offset

From: chenying
Date: Sun Mar 13 2022 - 06:10:33 EST


On 2022/3/13 17:02, Peter Zijlstra wrote:
On Sun, Mar 13, 2022 at 01:37:37PM +0800, chenying wrote:
On 2022/3/12 20:03, Peter Zijlstra wrote:
On Fri, Mar 11, 2022 at 03:58:47PM +0800, chenying wrote:
We add a time offset to se->vruntime when an idle sched_entity is
enqueued, so that idle entities always sit to the right of non-idle
entities in the runqueue. This allows non-idle tasks to be selected
and run before idle ones.
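
The ordering effect described above can be sketched with a toy model. This
is not the patch itself: the offset value, the names (Entity, enqueue,
pick_next, simulate) and the fixed 4 ms slice are assumptions for
illustration, and load weights are ignored, so every entity's vruntime
advances at the same rate.

```python
import heapq

# Hypothetical offset (10 minutes in ns); the patch's real default is not shown here.
IDLE_VRUNTIME_OFFSET_NS = 10 * 60 * 10**9


class Entity:
    def __init__(self, name, idle):
        self.name = name
        self.idle = idle
        self.vruntime = 0


def enqueue(rq, se):
    # Idle entities are keyed by vruntime plus the offset, which places
    # them to the right of every non-idle entity with a smaller key.
    key = se.vruntime + (IDLE_VRUNTIME_OFFSET_NS if se.idle else 0)
    heapq.heappush(rq, (key, se.name, se))


def pick_next(rq):
    # The leftmost entity (smallest key) runs next.
    return heapq.heappop(rq)[2]


def simulate(ticks, slice_ns=4_000_000):
    rq = []
    enqueue(rq, Entity("foreground", idle=False))
    enqueue(rq, Entity("background", idle=True))
    ran = {"foreground": 0, "background": 0}
    for _ in range(ticks):
        se = pick_next(rq)
        se.vruntime += slice_ns  # toy model: no load weights
        ran[se.name] += 1
        enqueue(rq, se)
    return ran


if __name__ == "__main__":
    print(simulate(1000))
```

In this model the background entity is never picked while the foreground
entity's vruntime stays below the offset.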

One use case is running background tasks as sched_idle and foreground
tasks as non-idle. The foreground tasks are latency sensitive and should
not be disturbed by the background tasks. It is well known that idle
tasks can be preempted by non-idle tasks on wakeup, but the scheduler
does not distinguish between idle and non-idle entities when picking the
next entity. This may cause background tasks to disturb the foreground.

Test results as below:

~$ ./loop.sh &
[1] 764
~$ chrt -i 0 ./loop.sh &
[2] 765
~$ taskset -p 04 764
~$ taskset -p 04 765

~$ top -p 764 -p 765
top - 13:10:01 up 1 min,  2 users,  load average: 1.30, 0.38, 0.13
Tasks:   2 total,   2 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.5 us,  0.0 sy,  0.0 ni, 87.4 id,  0.0 wa,  0.0 hi, 0.0 si,  0.0 st
KiB Mem : 16393492 total, 16142256 free,   111028 used,   140208 buff/cache
KiB Swap:   385836 total,   385836 free,        0 used. 16037992 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM TIME+ COMMAND
  764 chenyin+  20   0   12888   1144   1004 R 100.0  0.0 1:05.12 loop.sh
  765 chenyin+  20   0   12888   1224   1080 R   0.0  0.0 0:16.21 loop.sh

The non-idle process (764) runs at 100% CPU without being disturbed by
the idle process (765).

Did you just do a very complicated true idle time scheduler, with all
the problems that brings?

Colocating CPU-intensive jobs with latency-sensitive services can
improve CPU utilization, but it makes it difficult to meet the stringent
tail-latency requirements of the latency-sensitive services. We use a
true idle-time scheduler for the CPU-intensive jobs to minimize their
impact on the latency-sensitive services.

Hard NAK on any true idle-time scheduler until you make the whole kernel
immune to lock holder starvation issues.

If I set sched_idle_vruntime_offset to a relatively small value (e.g. 10
minutes), can this issue be avoided?
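
To make the question concrete, here is a rough sketch of how long an idle
entity would wait behind one always-runnable non-idle task under a finite
offset. It is a toy model only: time_until_idle_runs, the 4 ms slice and
the equal-rate vruntime accounting are assumptions, not kernel behaviour,
and it says nothing about whether a delay of that size is acceptable for a
lock holder.

```python
def time_until_idle_runs(offset_ns, slice_ns=4_000_000):
    """Return the ns a non-idle task runs before the idle entity becomes leftmost.

    Toy model: one non-idle task and one idle entity, both starting at
    vruntime 0, with the idle entity keyed by vruntime + offset_ns.
    """
    fg_vruntime = 0
    bg_key = offset_ns  # idle entity's key: vruntime (0) plus the offset
    elapsed = 0
    # The non-idle task keeps being picked while its vruntime is smaller.
    while fg_vruntime < bg_key:
        fg_vruntime += slice_ns
        elapsed += slice_ns
    return elapsed


if __name__ == "__main__":
    ten_minutes_ns = 10 * 60 * 10**9
    print(time_until_idle_runs(ten_minutes_ns) / 10**9, "seconds")
```

In this model the wait is bounded by roughly the offset itself, so a
10-minute offset means an idle entity can still be kept off the CPU for
about 10 minutes.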