Re: [RFC PATCH 3/3] sched: introduce synchronized idle injection

From: Jacob Pan
Date: Thu Nov 05 2015 - 10:34:03 EST

On Thu, 5 Nov 2015 15:33:32 +0100
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Thu, Nov 05, 2015 at 06:22:58AM -0800, Arjan van de Ven wrote:
> > On 11/5/2015 2:09 AM, Peter Zijlstra wrote:
> >
> > >I can see such a scheme having a fairly big impact on latency,
> > >esp. with forced idleness such as this. That's not going to be
> > >popular for many workloads.
> >
> > idle injection is a last ditch effort in thermal management, before
> > this gets used the hardware already has clamped you to a low
> > frequency, reduced memory speeds, probably dimmed your screen etc
> > etc.
> >
Just to clarify, the low frequency here is not necessarily the minimum
frequency. It is usually Pe (the max efficiency frequency).

> > at this point there are 3 choices
> > 1) Shut off the device
> > 2) do uncoordinated idle injection for 40% of the time
> > 3) do coordinated idle injection for 5% of the time
> >
> > as much as force injecting idle in a synchronized way sucks, the
> > alternatives are worse.
> OK, it wasn't put that way. I figured it was a way to use less power
> on any workload with idle time on.
> That said; what kind of devices are we talking about here; mobile with
> pitiful heat dissipation? Surely a well-designed server or desktop
> class system should never get into this situation in the first place.
Yes, mobile devices, especially fanless ones, are the targets. We all
want high performance, but it does not come free: the performance tax
might limit the ability to scale at the low end. For example, on
Skylake-Y, P1 is 1.5GHz, Pe (max efficiency, dynamic) is ~900MHz, and
Pmin is 400MHz.

When a thermal limit is hit, there are two options:
1. limit the frequency to Pmin (400MHz) and run 100% of the time
2. let the CPU run at ~800MHz but inject idle 50% of the time

Option #2 provides better performance per watt since it can scale
nearly linearly, i.e. 50% performance at 50% power. In my own limited
testing (and this can vary greatly by part), running at Pmin instead of
Pe can lose 30% perf per watt.
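
The comparison between the two options can be sketched numerically.
This is a rough illustration, not a measurement. It assumes that
throughput scales with frequency times the non-idle fraction, that
injected idle time costs no power, and that the ~30% perf-per-watt loss
at Pmin quoted above holds. All function and variable names are made up
for the example.

```python
# Rough model of the two throttling options, normalized so that running
# at Pe has a perf-per-watt of 1.0. All names here are illustrative.

PMIN_MHZ = 400          # minimum frequency from the example above
PE_MHZ = 800            # approximate frequency used in option 2
PMIN_EFFICIENCY = 0.70  # assumed: Pmin loses ~30% perf per watt vs Pe


def throughput(freq_mhz, idle_fraction):
    """Assumed: work done scales with frequency times non-idle time."""
    return freq_mhz * (1.0 - idle_fraction)


def power(freq_mhz, idle_fraction, efficiency):
    """Assumed: power = throughput / efficiency; injected idle is free."""
    return throughput(freq_mhz, idle_fraction) / efficiency


opt1_perf = throughput(PMIN_MHZ, 0.0)  # option 1: Pmin, running 100%
opt2_perf = throughput(PE_MHZ, 0.5)    # option 2: ~800MHz, 50% idle

opt1_power = power(PMIN_MHZ, 0.0, PMIN_EFFICIENCY)
opt2_power = power(PE_MHZ, 0.5, 1.0)

print("option 1: perf", opt1_perf, "relative power", round(opt1_power, 1))
print("option 2: perf", opt2_perf, "relative power", round(opt2_power, 1))
print("power saved by option 2:", round(1 - opt2_power / opt1_power, 2))
```

Under these assumptions both options deliver the same throughput, but
option 2 does so at roughly 70% of the power of option 1. Real parts
will differ, since idle is never completely free.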