Re: [PATCH 1/2] sched: Interrupt Aware Scheduler
From: Vincent Guittot
Date: Wed May 17 2017 - 03:53:17 EST
On 12 May 2017 at 22:19, Rohit Jain <rohit.k.jain@xxxxxxxxxx> wrote:
> On 05/12/2017 12:46 PM, Peter Zijlstra wrote:
>>
>> On Fri, May 12, 2017 at 11:04:26AM -0700, Rohit Jain wrote:
>>>
>>> The patch avoids CPUs which might be considered interrupt-heavy when
>>> trying to schedule threads (on the push side) in the system. Interrupt
>>> Awareness has only been added into the fair scheduling class.
>>>
>>> It does so using the following algorithm:
>>>
>>> --------------------------------------------------------------------------
>>> 1) When the interrupt is getting processed, the start and the end times
>>> are noted for the interrupt on a per-cpu basis.
>>
>> IRQ_TIME_ACCOUNTING you mean?
>
>
> Yes. Exactly
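Step 1 is essentially what CONFIG_IRQ_TIME_ACCOUNTING already provides. Below is a minimal userspace sketch of the idea. It assumes a fixed, hypothetical CPU count and caller-supplied timestamps. demo_irq_enter(), demo_irq_exit() and struct irq_time_demo are illustrative names, not kernel APIs, and the real accounting differs in detail.

```c
#include <stdint.h>

/* Hypothetical CPU count for this illustration only. */
#define NR_CPUS_DEMO 4

/* Per-CPU bookkeeping: start of the current interrupt and the total
 * time spent in interrupt context so far, both in nanoseconds. */
struct irq_time_demo {
	uint64_t irq_start_ns;
	uint64_t irq_total_ns;
};

static struct irq_time_demo irq_time[NR_CPUS_DEMO];

/* Interrupt entry on @cpu: remember when handling started. */
void demo_irq_enter(int cpu, uint64_t now_ns)
{
	irq_time[cpu].irq_start_ns = now_ns;
}

/* Interrupt exit on @cpu: charge the elapsed time to that CPU. */
void demo_irq_exit(int cpu, uint64_t now_ns)
{
	irq_time[cpu].irq_total_ns += now_ns - irq_time[cpu].irq_start_ns;
}
```

Each CPU only updates its own slot, so the per-CPU totals can later be sampled periodically without cross-CPU contention.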
>
>>> 2) On a periodic basis, the interrupt load is computed for each run
>>> queue and stored as a percentage in a global array. The interrupt
>>> load for a given CPU is also decayed over time, so that the most
>>> recent interrupt load contributes most to the interrupt load
>>> calculation. As a result, the scheduler will try (where it can) to
>>> avoid placing threads on CPUs that have recently been busy handling
>>> hardware interrupts.
>>
>> You mean like how it's already added to rt_avg, which is then used
>> to lower a CPU's capacity?
>
>
> Right. The only difference I see is that it is not being used on the
> enqueue side as of now.
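Step 2 and the rt_avg comparison can be illustrated together: a decayed per-CPU interrupt percentage, then a capacity scaled down by that percentage. This sketch makes two assumptions: the history is halved each period, and percentages are integers. The helper names are hypothetical. Only the value 1024 mirrors the kernel's SCHED_CAPACITY_SCALE.

```c
#include <stdint.h>

/* Capacity of a fully available CPU; 1024 matches SCHED_CAPACITY_SCALE. */
#define DEMO_CAPACITY_SCALE 1024u

/* Share of the last window spent in interrupts, as a 0..100 percentage. */
unsigned int demo_irq_pct(uint64_t irq_ns, uint64_t window_ns)
{
	if (window_ns == 0)
		return 0;
	uint64_t pct = irq_ns * 100 / window_ns;
	return pct > 100 ? 100 : (unsigned int)pct;
}

/* Fold the newest window into the history. Halving the old value each
 * period (an assumption) lets recent interrupt load dominate. */
unsigned int demo_decay_irq_pct(unsigned int old_pct, unsigned int new_pct)
{
	return (old_pct + new_pct) / 2;
}

/* Lower the CPU's capacity by the share stolen by interrupts, in the
 * spirit of the rt_avg-based reduction; never report zero capacity. */
unsigned int demo_capacity_after_irq(unsigned int irq_pct)
{
	if (irq_pct >= 100)
		return 1;
	return DEMO_CAPACITY_SCALE * (100u - irq_pct) / 100u;
}
```

For example, a CPU that spent 25% of the last window in interrupts would report a capacity of 768 out of 1024. Code that already consults capacity reacts to that reduced capacity without needing a separate interrupt metric.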
>
>>> 3) Any CPU whose interrupt load percentage lies above the 80th
>>> percentile is considered interrupt-heavy.
>>>
>>> 4) During the scheduler's idle-CPU search, this information is used
>>> to skip interrupt-heavy CPUs if better ones are available.
>>>
>>> 5) If none of the CPUs are better in terms of idleness and interrupt
>>> load, then the interrupt-heavy CPU is considered to be the best
>>> available CPU.
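Steps 3 to 5 describe a threshold followed by a search with a fallback. The sketch below assumes a nearest-rank 80th percentile over at most DEMO_MAX_CPUS CPUs and a simple linear scan. All names are hypothetical, and the patch's actual selection loop is not reproduced here.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_CPUS 64

static int cmp_uint(const void *a, const void *b)
{
	unsigned int x = *(const unsigned int *)a;
	unsigned int y = *(const unsigned int *)b;
	return (x > y) - (x < y);
}

/* Step 3: interrupt percentage at the 80th percentile (nearest rank).
 * CPUs strictly above it are treated as interrupt-heavy. */
unsigned int demo_irq_heavy_threshold(const unsigned int *irq_pct, int nr_cpus)
{
	unsigned int sorted[DEMO_MAX_CPUS];
	int rank;

	if (nr_cpus < 1 || nr_cpus > DEMO_MAX_CPUS)
		return UINT_MAX; /* treat no CPU as interrupt-heavy */

	memcpy(sorted, irq_pct, (size_t)nr_cpus * sizeof(*sorted));
	qsort(sorted, (size_t)nr_cpus, sizeof(*sorted), cmp_uint);
	rank = (80 * nr_cpus + 99) / 100; /* ceil(0.8 * n), 1-based */
	return sorted[rank - 1];
}

/* Steps 4 and 5: prefer an idle CPU that is not interrupt-heavy; if the
 * only idle CPUs are interrupt-heavy, fall back to the first of them.
 * Returns -1 if no CPU is idle. */
int demo_select_idle_cpu(const bool *idle, const unsigned int *irq_pct,
			 int nr_cpus)
{
	unsigned int threshold = demo_irq_heavy_threshold(irq_pct, nr_cpus);
	int fallback = -1;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (!idle[cpu])
			continue;
		if (irq_pct[cpu] <= threshold)
			return cpu;      /* idle and not interrupt-heavy */
		if (fallback < 0)
			fallback = cpu;  /* first idle interrupt-heavy CPU */
	}
	return fallback;
}
```

The fallback implements step 5. If every idle CPU is interrupt-heavy, the first of them is still chosen.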
>>
>> I would much rather you work with the EAS people and extend the capacity
>> awareness of those code paths. Then, per the existing logic, things
>> should just work out.
>
>
> Did you mean we should use the capacity as a metric on the enqueue side
> and not introduce a new metric?
In fact, capacity is already taken into account in the wake-up
path: see wake_affine(), wake_cap() and capacity_spare_wake().
The current implementation takes care of the original capacity, but it
could be extended to also take into account the capacity stolen by
IRQ/RT activity.
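One way to picture that extension: subtract the IRQ/RT share from the original capacity before computing spare capacity. This is a hypothetical sketch. struct demo_cpu, demo_capacity_of(), demo_spare() and demo_pick_cpu() only echo the spirit of capacity_of() and capacity_spare_wake() and are not their real definitions.

```c
/* Hypothetical per-CPU snapshot; all values share one capacity scale. */
struct demo_cpu {
	unsigned long capacity_orig; /* capacity before anything is stolen */
	unsigned long stolen;        /* capacity consumed by IRQ/RT */
	unsigned long util;          /* utilization of CFS tasks already there */
};

/* Capacity left for CFS tasks once IRQ/RT time is removed (at least 1). */
unsigned long demo_capacity_of(const struct demo_cpu *c)
{
	return c->stolen >= c->capacity_orig ? 1 : c->capacity_orig - c->stolen;
}

/* Spare capacity after existing CFS utilization; may be negative. */
long demo_spare(const struct demo_cpu *c)
{
	return (long)demo_capacity_of(c) - (long)c->util;
}

/* Pick the CPU with the most spare capacity that still fits the task;
 * returns -1 if none fits. */
int demo_pick_cpu(const struct demo_cpu *cpus, int nr_cpus,
		  unsigned long task_util)
{
	int best = -1;
	long best_spare = 0;

	for (int i = 0; i < nr_cpus; i++) {
		long spare = demo_spare(&cpus[i]);

		if (spare >= (long)task_util && (best < 0 || spare > best_spare)) {
			best = i;
			best_spare = spare;
		}
	}
	return best;
}
```

With this adjustment, a CPU that loses half its capacity to interrupts no longer looks like the roomiest target, even if its CFS utilization is low.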
>
>
>>
>> It doesn't matter how the capacity is lowered; at some point you just
>> don't want to put tasks on it. It really doesn't matter if that's because
>> of IRQs, SoftIRQs, (higher priority) Real-Time tasks, thermal throttling or
>> anything else.