RE: sched/fair: scheduler not running high priority process on idle cpu

From: David Laight
Date: Tue Jan 14 2020 - 12:34:01 EST


From: Steven Rostedt
> Sent: 14 January 2020 16:59
>
> On Tue, 14 Jan 2020 16:50:43 +0000
> David Laight <David.Laight@xxxxxxxxxx> wrote:
>
> > I've a test that uses four RT priority processes to process audio data every 10ms.
> > One process wakes up the other three, they all 'beaver away' clearing a queue of
> > jobs and the last one to finish sleeps until the next tick.
> > Usually this takes about 0.5ms, but sometimes takes over 3ms.
> >
> > AFAICT the processes are normally woken on the same cpu they last ran on.
> > There seems to be a problem when the selected cpu is running a (low priority)
> > process that is looping in kernel [1].
> > I'd expect my process to be picked up by one of the idle cpus, but this
> > doesn't happen.
> > Instead the process sits in state 'waiting' until the active processes sleeps
> > (or calls cond_resched()).
> >
> > Is this really the expected behaviour?????
>
> It is with CONFIG_PREEMPT_VOLUNTARY. I think you want to recompile your
> kernel with CONFIG_PREEMPT. The idea is that the RT task will continue
> to run on the CPU it last ran on, and would push off the lower priority
> task to the idle CPU. But CONFIG_PREEMPT_VOLUNTARY means that this
> will have to wait for the running task to not be in kernel context or
> hit a cond_resched() which is the "voluntary" scheduling point.

I have added a cond_resched() to the offending loop, but a close look implies
that code is called with a lock held in another (less common) path so that
can't be directly committed and so CONFIG_PREEMPT won't help.

Indeed requiring CONFIG_PREEMPT doesn't help when customers are running
the application, nor (probably) on AWS since I doubt it is ever the default.

Does the same apply to non-RT tasks?
I can select almost any priority, but RT ones are otherwise a lot better.

I've also seen RT processes delayed by the network stack 'bh' that runs
in a softint from the hardware interrupt.
That can take a while (clearing up tx and refilling rx) and I don't think we
have any control over the cpu it runs on?

The cost of ftrace function call entry/exit (about 200 clocks) makes it
rather unsuitable for any performance measurements unless only
a very few functions are traced - which rather requires you know
what the code is doing :-(

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)