Re: [PATCH 1/2] sched: Interrupt Aware Scheduler

From: Rohit Jain
Date: Thu May 18 2017 - 13:22:47 EST


On 05/17/2017 12:52 AM, Vincent Guittot wrote:
On 12 May 2017 at 22:19, Rohit Jain wrote:
On 05/12/2017 12:46 PM, Peter Zijlstra wrote:
On Fri, May 12, 2017 at 11:04:26AM -0700, Rohit Jain wrote:
The patch avoids CPUs which might be considered interrupt-heavy when
trying to schedule threads (on the push side) in the system. Interrupt
Awareness has only been added into the fair scheduling class.

It does so using the following algorithm:

--------------------------------------------------------------------------
1) When the interrupt is getting processed, the start and the end times
are noted for the interrupt on a per-cpu basis.
IRQ_TIME_ACCOUNTING you mean?

Yes, exactly.

2) On a periodic basis the interrupt load is processed for each run
queue and this is mapped in terms of percentage in a global array. The
interrupt load for a given CPU is also decayed over time, so that the
most recent interrupt load has the biggest contribution in the interrupt
load calculations. This would mean the scheduler will try to avoid CPUs
(if it can) when scheduling threads which have been recently busy with
handling hardware interrupts.
You mean like how it's already added to rt_avg? Which is then used
to lower a CPU's capacity.

Right. The only difference I see is that it is not being used on the
enqueue side as of now.
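
To make step 2 concrete, here is a minimal user-space sketch of a decayed
per-CPU interrupt load expressed as a percentage. It is not the code from
the patch: the struct, the function name and the equal weighting of the new
sample against the old average are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical per-CPU state; names are illustrative, not from the patch. */
struct irq_load {
	uint64_t last_irq_ns;	/* cumulative IRQ time at the previous sample */
	uint32_t avg_pct;	/* decayed interrupt load, 0..100 */
};

/*
 * Fold one sampling period into the decayed average.
 *
 * total_irq_ns: cumulative IRQ time so far (the kind of value that
 *               IRQ_TIME_ACCOUNTING tracks per CPU)
 * period_ns:    length of the sampling period
 *
 * Assumption: the new sample and the old average are weighted equally,
 * so older periods contribute 1/2, 1/4, 1/8, ... of their load.
 */
static uint32_t irq_load_update(struct irq_load *l, uint64_t total_irq_ns,
				uint64_t period_ns)
{
	uint64_t delta = total_irq_ns - l->last_irq_ns;
	uint32_t sample;

	l->last_irq_ns = total_irq_ns;
	if (period_ns == 0)
		return l->avg_pct;
	if (delta > period_ns)
		delta = period_ns;	/* clamp to 100% */

	sample = (uint32_t)(delta * 100 / period_ns);
	l->avg_pct = (l->avg_pct + sample) / 2;
	return l->avg_pct;
}
```

With this weighting, a CPU that spent half of one period in interrupts and
then none in the next would read 25% and then 12%, so recent activity
dominates the value.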

3) Any CPU which lies above the 80th percentile in terms of percentage
interrupt load is considered interrupt-heavy.

4) During the scheduler's idle CPU search, this information is used to
skip interrupt-heavy CPUs if better ones are available.

5) If none of the CPUs are better in terms of idleness and interrupt
load, then the interrupt-heavy CPU is considered to be the best
available CPU.
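
Steps 3 to 5 can be sketched roughly as follows. This is an illustration
under assumptions, not the patch itself: struct cpu_state, pick_idle_cpu()
and the heavy_pct parameter are hypothetical names, and how the threshold
is derived from the "80th percentile" is deliberately left to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one CPU; not a kernel structure. */
struct cpu_state {
	bool idle;		/* CPU currently has nothing to run */
	uint32_t irq_pct;	/* decayed interrupt load, 0..100 */
};

/*
 * Return the index of the first idle CPU whose interrupt load does not
 * exceed heavy_pct. If every idle CPU is interrupt-heavy, fall back to
 * the first idle one (step 5). Return -1 if no CPU is idle.
 */
static int pick_idle_cpu(const struct cpu_state *cpus, size_t n,
			 uint32_t heavy_pct)
{
	int fallback = -1;

	for (size_t i = 0; i < n; i++) {
		if (!cpus[i].idle)
			continue;
		if (cpus[i].irq_pct <= heavy_pct)
			return (int)i;		/* idle and not interrupt-heavy */
		if (fallback < 0)
			fallback = (int)i;	/* remember first heavy idle CPU */
	}
	return fallback;
}
```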
I would much rather you work with the EAS people and extend the capacity
awareness of those code paths. Then, per the existing logic, things
should just work out.

Did you mean we should use the capacity as a metric on the enqueue side
and not introduce a new metric?
In fact, the capacity is already taken into account in the wake-up
path. You can look at wake_affine(), wake_cap() and
capacity_spare_wake().
The current implementation takes care of the original capacity, but it
might be extended to take into account capacity stolen by irq/rt as
well.

Thanks, I have a new prototype to account for the stolen capacity, I
will send it out once I have more test results.

It doesn't matter how the capacity is lowered, at some point you just
don't want to put tasks on that CPU. It really doesn't matter if that's
because of IRQs, SoftIRQs, (higher priority) Real-Time tasks, thermal
throttling or anything else.
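
To illustrate the capacity-based view discussed above, here is a simplified
user-space model. It is only a sketch: remaining_capacity() and
pick_max_capacity() are hypothetical helpers, not kernel functions, and the
linear scaling of capacity by stolen time is an assumption chosen for
clarity rather than a description of what the scheduler does internally.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Capacity left for fair tasks after time stolen by IRQs, RT tasks,
 * throttling, etc. The cause of the stolen time does not matter here.
 *
 * capacity_orig: capacity of the CPU with nothing stolen (e.g. 1024)
 * stolen_ns:     time within the period not available to fair tasks
 * period_ns:     length of the averaging period
 */
static uint32_t remaining_capacity(uint32_t capacity_orig,
				   uint64_t stolen_ns, uint64_t period_ns)
{
	if (period_ns == 0 || stolen_ns >= period_ns)
		return 0;
	return (uint32_t)((uint64_t)capacity_orig *
			  (period_ns - stolen_ns) / period_ns);
}

/*
 * Pick the CPU with the most remaining capacity; ties go to the lowest
 * index. Returns -1 when the array is empty.
 */
static int pick_max_capacity(const uint32_t *cap, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (best < 0 || cap[i] > cap[best])
			best = (int)i;
	}
	return best;
}
```

In this model a CPU that loses a quarter of each period to interrupts looks
the same to the placement code as one that loses a quarter to RT tasks,
which is the property argued for above.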