Re: [PATCH] sched/fair: avoid vruntime compensation for SCHED_IDLE task

From: Jiang Biao
Date: Tue Sep 01 2020 - 06:15:14 EST


Hi, Vincent

Sorry for the late reply.:)

On Fri, 28 Aug 2020 at 20:55, Vincent Guittot
<vincent.guittot@xxxxxxxxxx> wrote:
>
> On Sun, 23 Aug 2020 at 09:33, Jiang Biao <benbjiang@xxxxxxxxx> wrote:
> >
> > Hi, Vincent and Peter
> >
> > On Thu, 20 Aug 2020 at 22:09, Vincent Guittot
> > <vincent.guittot@xxxxxxxxxx> wrote:
> > >
> > > On Thu, 20 Aug 2020 at 15:44, <peterz@xxxxxxxxxxxxx> wrote:
> > > >
> > > > > That's been said, not compensating the vruntime for a sched_idle task
> > > > > makes sense for me. Even if that will only help for others task in the
> > > > > same cfs_rq
> > > >
> > > > Yeah, but it is worth the extra pointer chasing and branches?
> > >
> > > For that I let Jiang provides figures to show the worthful
> > Using the following configuration for rt-app,
> > {
> > "tasks" : {
> > "task_other" : {
> > "instance" : 1, //only 1 instance to be easy to observe
> > "cpus" : [2],
> > "loop" : 2000,
> > "policy" : "SCHED_OTHER",
> > "run" : -1, //make normal task 100% running
> > "priority" : 0,
> > "sleep" : 0
> > },
> > "task_idle" : {
> > "instance" : 1,
> > "cpus" : [2],
> > "loop" : 2000,
> > "policy" : "SCHED_IDLE",
> > "run" : 1, //only run 1us to avoid
> > blocking(always waiting for running), making check_preempt_wakeup
> > work(S->R switching)
> > "timer" : { "ref" : "unique2" , "period" :
> > 16000, "mode" : "absolute" }
> > }
> > },
> > "global" : {
> > "calibration" : "CPU0",
> > "default_policy" : "SCHED_OTHER",
> > "duration" : -1
> > }
> > }
> > without the patch,
> > <...>-39771 [002] d.h. 42478.177771: sched_wakeup:
> > comm=task_idle-1 pid=39772 prio=120 target_cpu=002
> > <...>-39771 [002] d... 42478.190437: sched_switch:
> > prev_comm=task_other-0 prev_pid=39771 prev_prio=120 prev_state=R ==>
> > next_comm=task_idle-1 next_pid=39772 next_prio=120
> > <...>-39771 [002] d.h. 42478.193771: sched_wakeup:
> > comm=task_idle-1 pid=39772 prio=120 target_cpu=002
> > <...>-39771 [002] d... 42478.206438: sched_switch:
> > prev_comm=task_other-0 prev_pid=39771 prev_prio=120 prev_state=R ==>
> > next_comm=task_idle-1 next_pid=39772 next_prio=120
> > <...>-39771 [002] d.h. 42478.209771: sched_wakeup:
> > comm=task_idle-1 pid=39772 prio=120 target_cpu=002
> > <...>-39771 [002] d... 42478.222438: sched_switch:
> > prev_comm=task_other-0 prev_pid=39771 prev_prio=120 prev_state=R ==>
> > next_comm=task_idle-1 next_pid=39772 next_prio=120
> > <...>-39771 [002] d.h. 42478.225771: sched_wakeup:
> > comm=task_idle-1 pid=39772 prio=120 target_cpu=002
> > <...>-39771 [002] d... 42478.238438: sched_switch:
> > prev_comm=task_other-0 prev_pid=39771 prev_prio=120 prev_state=R ==>
> > next_comm=task_idle-1 next_pid=39772 next_prio=120
> > *task_idle* preempts every 12ms because of the compensation.
> >
> > with the patch,
> > task_other-0-27670 [002] d.h. 136785.278059: sched_wakeup:
> > comm=task_idle-1 pid=27671 prio=120 target_cpu=002
> > task_other-0-27670 [002] d... 136785.293623: sched_switch:
> > prev_comm=task_other-0 prev_pid=27670 prev_prio=120 prev_state=R ==>
> > next_comm=task_idle-1 next_pid=27671 next_prio=120
> > task_other-0-27670 [002] d.h. 136785.294059: sched_wakeup:
> > comm=task_idle-1 pid=27671 prio=120 target_cpu=002
> > task_other-0-27670 [002] d... 136785.317624: sched_switch:
> > prev_comm=task_other-0 prev_pid=27670 prev_prio=120 prev_state=R ==>
> > next_comm=task_idle-1 next_pid=27671 next_prio=120
> > task_other-0-27670 [002] d.h. 136785.326059: sched_wakeup:
> > comm=task_idle-1 pid=27671 prio=120 target_cpu=002
> > task_other-0-27670 [002] d... 136785.341622: sched_switch:
> > prev_comm=task_other-0 prev_pid=27670 prev_prio=120 prev_state=R ==>
> > next_comm=task_idle-1 next_pid=27671 next_prio=120
> > task_other-0-27670 [002] d.h. 136785.342059: sched_wakeup:
> > comm=task_idle-1 pid=27671 prio=120 target_cpu=002
> > task_other-0-27670 [002] d... 136785.365623: sched_switch:
> > prev_comm=task_other-0 prev_pid=27670 prev_prio=120 prev_state=R ==>
> > next_comm=task_idle-1 next_pid=27671 next_prio=120
> > *task_idle* preempts every 24 or 16 ms.
> >
> > This patch could reduce the preempting frequency of task_idle, and
> > reduce the interference from SCHED_IDLE task.
>
> For this use case, the preemption is only 1us long. Is it a realistic
> problem use case ? your normal threads might be more impacted by tick,
Nop.
1us is just to make the logic in place_entity() work. If the preemption is
longer, the IDLE task could not finish its work before being preempted back
by normal task, and the IDLE task would be always in RUNNING status from
then on. In that case place_entity() would never be reached because of the
RUNNING status.

> interrupt, timer and others things than this 1us idle thread every
> 16ms. I mean, the patch moves the impact from 1us every 12ms (0.01%)
> to 1us every 24ms (0.005%). Then, If the idle thread starts to run a
> bit longer, the period before preempting the normal thread quickly
> increases
Exactly.

>
> What is the improvement for an idle thread trying to run 1ms every
> 16ms as an example ?
If to run 1ms, the IDLE task would be always RUNNING status as said
above. In that case, place_entity() would not work, and the preemption
would happen every 340ms as always.

Thx.
Regards,
Jiang