Re: [discussion]sched: a rough proposal to enable power saving in scheduler

From: Shilimkar, Santosh
Date: Thu Aug 16 2012 - 08:45:58 EST


On Thu, Aug 16, 2012 at 12:23 PM, preeti <preeti@xxxxxxxxxxxxxxxxxx> wrote:
>
> Hi everyone,
>
> From what I have understood so far,I try to summarise pin pointed
> differences between the performance and power policies as found
> relevant to the scheduler-load balancing mechanism.Any thoughts?
>
> *Performance policy*:
>
> Q1.Who triggers load_balance?
> Load balance is triggered when a cpu is found to be idle.(Pull mechanism)
>
> Q2.How is load_balance handled?
> When triggered,the load is looked to be pulled from its sched domain.
> First the sched groups in the domain the cpu belongs to is queried
> followed by the runqueues in the busiest group.then the tasks are moved.
>
> This course of action is found analogous to the performance policy because:
>
> 1.First the idle cpu initiates the pull action
> 2.The busiest cpu hands over the load to this cpu.A person who can
> handle any work is querying as to who cannot handle more work.
>
> *Power policy*:
>
> So how is power policy different? As Peter says,'pack more than spread
> more'.
>
> Q1.Who triggers load balance?
> It is the cpu which cannot handle more work.Idle cpu is left to remain
> idle.(Push mechanism)
>
> Q2.How is load_balance handled?
> First the least busy runqueue,from within the sched_group that the busy
> cpu belongs to is queried.if none exist,ie all the runqueues are equally
> busy then move on to the other sched groups.
>
> Here again the 'least busy' policy should be applied,first at
> group level then at the runqueue level.
>
> This course of action is found analogous to the power policy because as
> much as possible busy and capable cpus within a small range try to
> handle the existing load.
>
Not to complicate the power policy scheme but always *packing* may
not be the best approach for all CPU packages. As mentioned, packing
ensures that least number of power domains are in use and effectively
reduce the active power consumption on paper but there are few
considerations which might conflict with this assumption.

-- Many architectures get best power saving when the entire CPU cluster
or SD is idle. Intel folks already mentioned this and also extended this
concept for attached memory with the CPU domain from self refresh point
of view. This is true for the CPUs who have very little active leakage and
hence "race to idle" would be better so that cluster can hit the deeper
C-state to save more power.

-- Spreading vs Packing actually can be made OPP(CPU operating point)
dependent. Some of the mobile workload and power numbers measured in
the past shown that when CPU operating at lower OPP(considering the
load is less),
packing would be the best option to have higher opportunity for cluster to idle
where as while operating at higher operating point(assuming the higher CPU load
and possibly more threads), a spread with race to idle in mind might
be beneficial.
Of-course this is going to be bit messy since the CPUFreq and scheduler needs
to be linked.

-- May be this is already possible but for architectures like big.LITTLE,
the power consumption and active leakage can be significantly different
across big and little CPU packages.
Meaning the big CPU cluster or SD might be more power efficient
with packing where as Little CPU cluster would be power efficient
with spreading. Hence the possible need of per SD configurability.

Ofcourse all of this can be done step by step starting with most simple
power policy as stated by Peter.

Regards
Santosh











been used whenever possible in sd.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/