Re: [PATCH] sched/fair: vruntime should normalize when switching from fair

From: Dietmar Eggemann
Date: Thu Sep 06 2018 - 19:25:12 EST


Hi Juri,

On 08/23/2018 11:54 PM, Juri Lelli wrote:
On 23/08/18 18:52, Dietmar Eggemann wrote:
Hi,

On 08/21/2018 01:54 AM, Miguel de Dios wrote:
On 08/17/2018 11:27 AM, Steve Muckle wrote:
From: John Dias <joaodias@xxxxxxxxxx>

[...]


I tried to catch this issue on my Arm64 Juno board using pi_stress (and a
slightly adapted pip_stress (usleep_val = 1500 and keep low as cfs)) from
rt-tests but wasn't able to do so.

# pi_stress --inversions=1 --duration=1 --groups=1 --sched id=low,policy=cfs

Starting PI Stress Test
Number of thread groups: 1
Duration of test run: 1 seconds
Number of inversions per group: 1
Admin thread SCHED_FIFO priority 4
1 groups of 3 threads will be created
High thread SCHED_FIFO priority 3
Med thread SCHED_FIFO priority 2
Low thread SCHED_OTHER nice 0

# ./pip_stress

In both cases, the cfs task entering rt_mutex_setprio() is queued, so
dequeue_task_fair()->dequeue_entity(), which subtracts cfs_rq->min_vruntime
from se->vruntime, is called on it before it gets the rt prio.
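To make the normalization step concrete, here is a toy model of it. This is a sketch and not kernel code: struct toy_cfs_rq, struct toy_se and the toy_*() helpers are hypothetical names, and the numbers used with them are made up. It only shows the arithmetic of making vruntime relative on dequeue and absolute again on enqueue.

```c
#include <stdint.h>

/* Toy stand-ins for the kernel's cfs_rq and sched_entity (not kernel code). */
struct toy_cfs_rq {
	uint64_t min_vruntime;	/* floor of this queue's virtual time */
};

struct toy_se {
	uint64_t vruntime;	/* absolute while queued, relative once normalized */
};

/* Leaving a queue while runnable: make vruntime relative to that queue. */
static void toy_dequeue_normalize(const struct toy_cfs_rq *rq, struct toy_se *se)
{
	se->vruntime -= rq->min_vruntime;
}

/* Joining a queue: make vruntime absolute again against the new queue. */
static void toy_enqueue_denormalize(const struct toy_cfs_rq *rq, struct toy_se *se)
{
	se->vruntime += rq->min_vruntime;
}

/* Distance of the entity from the queue floor, for inspection only. */
static int64_t toy_lag(const struct toy_cfs_rq *rq, const struct toy_se *se)
{
	return (int64_t)(se->vruntime - rq->min_vruntime);
}
```

With a source queue at min_vruntime 1000000 and a destination queue at 5000, an entity at 1000200 ends up 200 ahead of the destination floor if it is normalized first. If the subtraction is skipped, which as I understand the thread is what happens when the task is not queued at the time of the class switch, the same entity would appear 995200 ahead on the destination queue.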

Maybe it requires a very specific use of the pthread library to provoke this
issue by making sure that the cfs task really blocks/sleeps?

Maybe one could play with rt-app to recreate such a specific use case?

https://github.com/scheduler-tools/rt-app/blob/master/doc/tutorial.txt#L459

I played a little bit with rt-app on hikey960 to re-create Steve's test program.
Since there is no semaphore support (sem_wait(), sem_post()), I used condition variables (wait: pthread_cond_wait(), signal: pthread_cond_signal()). It's not really the same, since condition variables are stateless, but the sleeps before the signals help to maintain the state in this simple example.
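The "stateless" point can be illustrated with a small C sketch. A pthread_cond_signal() sent before anybody waits is simply lost, whereas a semaphore remembers it. The helper below is hypothetical (struct toy_event and the toy_event_*() names are mine, not part of rt-app) and adds a counter next to the condition variable to get semaphore-like behaviour without relying on sleeps.

```c
#include <pthread.h>

/* Hypothetical helper: a condition variable plus the state it lacks. */
struct toy_event {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int pending;		/* signals posted but not yet consumed */
};

static void toy_event_init(struct toy_event *ev)
{
	pthread_mutex_init(&ev->lock, NULL);
	pthread_cond_init(&ev->cond, NULL);
	ev->pending = 0;
}

/* Like sem_post(): the signal is remembered even if nobody waits yet. */
static void toy_event_post(struct toy_event *ev)
{
	pthread_mutex_lock(&ev->lock);
	ev->pending++;
	pthread_cond_signal(&ev->cond);
	pthread_mutex_unlock(&ev->lock);
}

/* Like sem_wait(): block only while no signal is pending. */
static void toy_event_wait(struct toy_event *ev)
{
	pthread_mutex_lock(&ev->lock);
	while (ev->pending == 0)
		pthread_cond_wait(&ev->cond, &ev->lock);
	ev->pending--;
	pthread_mutex_unlock(&ev->lock);
}
```

The while loop around pthread_cond_wait() also covers spurious wakeups. A post that arrives before the matching wait is consumed later instead of being dropped, which is the property the sleeps in the config below only approximate.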

This provokes the vruntime issue, e.g. with the tasks on cpus 0 and 4, but not with cpus 0 and 1:

{
	"global": {
		"calibration" : 130,
		"pi_enabled" : true
	},
	"tasks": {
		"rt_task": {
			"loop" : 100,
			"policy" : "SCHED_FIFO",
			"cpus" : [0],

			"lock" : "b_mutex",
			"wait" : { "ref" : "b_cond", "mutex" : "b_mutex" },
			"unlock" : "b_mutex",
			"sleep" : 3000,
			"lock1" : "a_mutex",
			"signal" : "a_cond",
			"unlock1" : "a_mutex",
			"lock2" : "pi-mutex",
			"unlock2" : "pi-mutex"
		},
		"cfs_task": {
			"loop" : 100,
			"policy" : "SCHED_OTHER",
			"cpus" : [4],

			"lock" : "pi-mutex",
			"sleep" : 3000,
			"lock1" : "b_mutex",
			"signal" : "b_cond",
			"unlock" : "b_mutex",
			"lock2" : "a_mutex",
			"wait" : { "ref" : "a_cond", "mutex" : "a_mutex" },
			"unlock1" : "a_mutex",
			"unlock2" : "pi-mutex"
		}
	}
}

Adding semaphores is possible, but rt-app has no easy way to initialize individual objects, e.g. sem_init(..., value). The only way I see is via the global section, like "pi_enabled". But then the setting applies to all objects of this kind (in this case mutexes)?

So the following works with a couple of lines of extension to rt-app, because both semaphores can be initialized to 0:

{
	"global": {
		"calibration" : 130,
		"pi_enabled" : true
	},
	"tasks": {
		"rt_task": {
			"loop" : 100,
			"policy" : "SCHED_FIFO",
			"cpus" : [0],

			"sem_wait" : "b_sem",
			"sleep" : 1000,
			"sem_post" : "a_sem",

			"lock" : "pi-mutex",
			"unlock" : "pi-mutex"
		},
		"cfs_task": {
			"loop" : 100,
			"policy" : "SCHED_OTHER",
			"cpus" : [4],

			"lock" : "pi-mutex",
			"sleep" : 1000,
			"sem_post" : "b_sem",
			"sem_wait" : "a_sem",
			"unlock" : "pi-mutex"
		}
	}
}
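For reference, one iteration of the event sequence above could look roughly like this in plain C. This is only a sketch of how I read the config, not what rt-app does internally: rt_side(), cfs_side() and run_once() are hypothetical names, and the sketch deliberately leaves out the SCHED_FIFO policy and the cpu affinity, since setting those needs privileges and a matching topology.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

static sem_t a_sem, b_sem;
static pthread_mutex_t pi_mutex;

static void sleep_1ms(void)
{
	struct timespec ts = { 0, 1000 * 1000 };

	nanosleep(&ts, NULL);
}

static void sem_wait_retry(sem_t *sem)
{
	/* Retry if a signal handler interrupts the wait. */
	while (sem_wait(sem) == -1 && errno == EINTR)
		;
}

/* Mirrors "rt_task": wait for b_sem, sleep, post a_sem, take the PI mutex. */
static void *rt_side(void *arg)
{
	(void)arg;
	sem_wait_retry(&b_sem);
	sleep_1ms();
	sem_post(&a_sem);
	pthread_mutex_lock(&pi_mutex);
	pthread_mutex_unlock(&pi_mutex);
	return NULL;
}

/* Mirrors "cfs_task": hold the PI mutex across the semaphore handshake. */
static void *cfs_side(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&pi_mutex);
	sleep_1ms();
	sem_post(&b_sem);
	sem_wait_retry(&a_sem);
	pthread_mutex_unlock(&pi_mutex);
	return NULL;
}

/* Runs one iteration; returns 0 on success, -1 if setup fails. */
static int run_once(void)
{
	pthread_mutexattr_t attr;
	pthread_t rt, cfs;

	if (pthread_mutexattr_init(&attr) ||
	    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT) ||
	    pthread_mutex_init(&pi_mutex, &attr))
		return -1;
	pthread_mutexattr_destroy(&attr);

	/* Both semaphores start at 0, as in the rt-app extension. */
	if (sem_init(&a_sem, 0, 0) || sem_init(&b_sem, 0, 0))
		return -1;

	if (pthread_create(&cfs, NULL, cfs_side, NULL) ||
	    pthread_create(&rt, NULL, rt_side, NULL))
		return -1;
	pthread_join(cfs, NULL);
	pthread_join(rt, NULL);

	sem_destroy(&a_sem);
	sem_destroy(&b_sem);
	pthread_mutex_destroy(&pi_mutex);
	return 0;
}
```

Because both semaphores start at 0, the order of thread creation does not matter: rt_side() cannot pass b_sem before cfs_side() holds the mutex and has posted it.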

Any thoughts on that? I can see something like this as infrastructure to create a regression test case based on rt-app and standard ftrace.

[...]