Re: [PATCH] sched/fair: vruntime should normalize when switching from fair

From: Vincent Guittot
Date: Fri Sep 07 2018 - 03:59:05 EST


On Fri, 7 Sep 2018 at 09:16, Juri Lelli <juri.lelli@xxxxxxxxx> wrote:
>
> On 06/09/18 16:25, Dietmar Eggemann wrote:
> > Hi Juri,
> >
> > On 08/23/2018 11:54 PM, Juri Lelli wrote:
> > > On 23/08/18 18:52, Dietmar Eggemann wrote:
> > > > Hi,
> > > >
> > > > On 08/21/2018 01:54 AM, Miguel de Dios wrote:
> > > > > On 08/17/2018 11:27 AM, Steve Muckle wrote:
> > > > > > From: John Dias <joaodias@xxxxxxxxxx>
> >
> > [...]
> >
> > > >
> > > > I tried to catch this issue on my Arm64 Juno board using pi_stress (and a
> > > > slightly adapted pip_stress (usleep_val = 1500 and keep low as cfs)) from
> > > > rt-tests but wasn't able to do so.
> > > >
> > > > # pi_stress --inversions=1 --duration=1 --groups=1 --sched id=low,policy=cfs
> > > >
> > > > Starting PI Stress Test
> > > > Number of thread groups: 1
> > > > Duration of test run: 1 seconds
> > > > Number of inversions per group: 1
> > > > Admin thread SCHED_FIFO priority 4
> > > > 1 groups of 3 threads will be created
> > > > High thread SCHED_FIFO priority 3
> > > > Med thread SCHED_FIFO priority 2
> > > > Low thread SCHED_OTHER nice 0
> > > >
> > > > # ./pip_stress
> > > >
> > > > In both cases, the cfs task entering rt_mutex_setprio() is queued, so
> > > > dequeue_task_fair()->dequeue_entity(), which subtracts cfs_rq->min_vruntime
> > > > from se->vruntime, is called on it before it gets the rt prio.
> > > >
> > > > Maybe it requires a very specific use of the pthread library to provoke this
> > > > issue by making sure that the cfs task really blocks/sleeps?
> > >
> > > Maybe one could play with rt-app to recreate such specific use case?
> > >
> > > https://github.com/scheduler-tools/rt-app/blob/master/doc/tutorial.txt#L459
> >
> > I played a little bit with rt-app on hikey960 to re-create Steve's test
> > program.
>
> Oh, nice! Thanks for sharing what you have got.
>
> > Since there is no semaphore support (sem_wait(), sem_post()) I used
> > condition variables (wait: pthread_cond_wait() , signal:
> > pthread_cond_signal()). It's not really the same since this is stateless but
> > sleeps before the signals help to maintain the state in this easy example.
> >
> > This provokes the vruntime issue e.g. for cpus 0,4 and it doesn't for 0,1:
> >
> > {
> > "global": {
> > "calibration" : 130,
> > "pi_enabled" : true
> > },
> > "tasks": {
> > "rt_task": {
> > "loop" : 100,
> > "policy" : "SCHED_FIFO",
> > "cpus" : [0],
> >
> > "lock" : "b_mutex",
> > "wait" : { "ref" : "b_cond", "mutex" : "b_mutex" },
> > "unlock" : "b_mutex",
> > "sleep" : 3000,
> > "lock1" : "a_mutex",
> > "signal" : "a_cond",
> > "unlock1" : "a_mutex",
> > "lock2" : "pi-mutex",
> > "unlock2" : "pi-mutex"
> > },
> > "cfs_task": {
> > "loop" : 100,
> > "policy" : "SCHED_OTHER",
> > "cpus" : [4],
> >
> > "lock" : "pi-mutex",
> > "sleep" : 3000,
> > "lock1" : "b_mutex",
> > "signal" : "b_cond",
> > "unlock" : "b_mutex",
> > "lock2" : "a_mutex",
> > "wait" : { "ref" : "a_cond", "mutex" : "a_mutex" },
> > "unlock1" : "a_mutex",
> > "unlock2" : "pi-mutex"
> > }
> > }
> > }
> >
> > Adding semaphores is possible but rt-app has no easy way to initialize
> > individual objects, e.g. sem_init(..., value). The only way I see is via the
> > global section, like "pi_enabled". But then, this is true for all objects of
> > this kind (in this case mutexes)?
>
> Right, global section should work fine. Why do you think this is a
> problem/limitation?

Keep in mind that rt-app still has a "resources" section. It is
optional and almost never used, since resources can be created on the
fly, but it is still there and can be used to initialize resources,
such as semaphores, if needed.
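
For reference, here is a rough sketch of what such an initialization
boils down to with plain POSIX semaphores: both are created with an
initial value of 0, so the handshake keeps its state without the extra
sleeps needed with condition variables. This is not rt-app code. The
names a_sem and b_sem mirror the example quoted below, while
run_handshake(), cfs_side() and struct handshake_arg are made up for
illustration. Scheduling policies, CPU affinity and the PI mutex are
left out on purpose, and EINTR from sem_wait() is not handled.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical names; a_sem/b_sem mirror the rt-app example. */
static sem_t a_sem, b_sem;

struct handshake_arg {
	int loops;	/* number of rounds to run */
	int done;	/* rounds completed by the "cfs" side */
};

/* Plays the "cfs_task" role: post b_sem, then wait for a_sem. */
static void *cfs_side(void *p)
{
	struct handshake_arg *arg = p;

	for (int i = 0; i < arg->loops; i++) {
		sem_post(&b_sem);
		sem_wait(&a_sem);
		arg->done++;
	}
	return NULL;
}

/*
 * Plays the "rt_task" role: wait for b_sem, then post a_sem.
 * Returns the number of rounds the other side completed.
 */
int run_handshake(int loops)
{
	struct handshake_arg arg = { .loops = loops, .done = 0 };
	pthread_t t;
	int rc;

	/* Third argument is the initial value: both start at 0. */
	rc = sem_init(&a_sem, 0, 0);
	assert(rc == 0);
	rc = sem_init(&b_sem, 0, 0);
	assert(rc == 0);

	rc = pthread_create(&t, NULL, cfs_side, &arg);
	assert(rc == 0);

	for (int i = 0; i < loops; i++) {
		sem_wait(&b_sem);
		sem_post(&a_sem);
	}

	pthread_join(t, NULL);
	sem_destroy(&a_sem);
	sem_destroy(&b_sem);
	return arg.done;
}
```

Calling run_handshake(100) from a small main() and building with
-pthread should return 100 once both sides have finished. Because a
post is remembered until the matching wait, the order in which the two
threads start does not matter.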

>
> > So the following couple of lines extension to rt-app works because both
> > semaphores can be initialized to 0:
> >
> > {
> > "global": {
> > "calibration" : 130,
> > "pi_enabled" : true
> > },
> > "tasks": {
> > "rt_task": {
> > "loop" : 100,
> > "policy" : "SCHED_FIFO",
> > "cpus" : [0],
> >
> > "sem_wait" : "b_sem",
> > "sleep" : 1000,
> > "sem_post" : "a_sem",
> >
> > "lock" : "pi-mutex",
> > "unlock" : "pi-mutex"
> > },
> > "cfs_task": {
> > "loop" : 100,
> > "policy" : "SCHED_OTHER",
> > "cpus" : [4],
> >
> > "lock" : "pi-mutex",
> > "sleep" : 1000,
> > "sem_post" : "b_sem",
> > "sem_wait" : "a_sem",
> > "unlock" : "pi-mutex"
> > }
> > }
> > }
> >
> > Any thoughts on that? I can see something like this as infrastructure to
> > create a regression test case based on rt-app and standard ftrace.
>
> Agree. I guess we should add your first example to the repo (you'd be
> very welcome to create a PR) already and then work to support the second?