Re: [RFC 0/3] Experimental patchset for CPPC

From: Ashwin Chaugule
Date: Fri Aug 15 2014 - 09:08:57 EST

Hi Peter,

On 15 August 2014 02:19, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Thu, Aug 14, 2014 at 05:56:10PM -0400, Ashwin Chaugule wrote:
>> On 14 August 2014 16:51, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> > On Thu, Aug 14, 2014 at 03:57:07PM -0400, Ashwin Chaugule wrote:
>> >>
>> >>
>> >> What is CPPC:
>> >> =============
>> >>
>> >> CPPC is the new interface for CPU performance control between the OS and the
>> >> platform defined in ACPI 5.0+. The interface is built on an abstract
>> >> representation of CPU performance rather than raw frequency. Basic operation
>> >> consists of:
>> >
>> > Why do we want this? Typically we've ignored ACPI and gone straight to
>> > MSR access, intel_pstate and intel_idle were created especially to avoid
>> > ACPI, so why return to it.
>> >
>> > Also, the whole interface sounds like trainwreck (one would not expect
>> > anything else from ACPI).
>> >
>> > So _why_?
>> The overall idea is that tying the notion of CPU performance to CPU
>> frequency is no longer true these days.[1]. So, using some direction
>> from an OS , the platforms want to be able to decide how to adjust CPU
>> performance by using knowledge that may be very platform specific.
>> e.g. through the use of performance counters, thermal budgets and
>> other system specific constraints. So, CPPC describes a way for the OS
>> to request performance within certain bounds and then letting the
>> platform optimize it within those constraints. Expressing CPU
>> performance in an abstract way, should also help keep things uniform
>> across various architecture implementations.
>> [1]-
>> [2] -
> Yeah, I'm so not clicking in that; if you want to make an argument make
> it here.
> In any case; that's all nice and shiny that the 'hardware' works like
> that. But have these people considered how we're supposed to use it?
> How should we know what to do with a new task? Do we stack it on a busy
> CPU, do we wake an idle cpu and how are we going to tell which is the
> 'best' option.
> How are we going to do DVFS like accounting if we don't know wtf the
> hardware can or will do.
> And how can you design these interfaces and hardware without at least
> partially knowing the answer to these questions.

Although the CPPC descriptor table and the spec don't describe the
algorithm, they still give a good enough idea of how the platform would
react. I'll try to summarize it briefly. I have a few more
register-specific details in the cover letter if needed.


(1) The OS can read from the platform what each CPU is capable of at
the moment: the Highest and Lowest performance values, which bound what
this CPU can deliver. The platform can even tell us a "guaranteed
performance value" at that moment. This is the level the CPU is
expected to deliver taking into account all the possible constraints
(e.g. thermal, power budgets etc.). If the "guaranteed" value changes
for some reason, the platform raises a notification, so the OS can
reevaluate.

(2) When an OS requests a specific performance value, it supplies a
Max, Min and Desired value. The platform is expected to deliver CPU
performance within this range. The Delivered performance register
should reflect what the platform decided.

(3) If the OS knows that it needs to step up or lower the CPU
performance value for a specific period of time, then it sets the Time
Window and Performance Reduction Tolerance registers in addition to
Max, Min, and Desired. The platform is then expected to deliver CPU
performance which, on average over the Time Window, is at least the
value in Performance Reduction Tolerance.
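
To make steps (1)-(3) a bit more concrete, here is a rough sketch in
Python (not kernel code). Every name in it (Capabilities, Request,
make_request and so on) is made up for illustration and does not come
from the spec or from the patchset. It also assumes that the
Performance Reduction Tolerance value acts as a floor on the average
Delivered performance over the Time Window, which is my reading of the
spec rather than something the platform is known to enforce this way.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Capabilities:
    """Step (1): what the platform reports for one CPU (abstract units)."""
    highest: int     # upper bound the CPU can deliver right now
    lowest: int      # lower bound the CPU can deliver right now
    guaranteed: int  # level expected to hold under current constraints


@dataclass
class Request:
    """Step (2): what the OS writes, bounds plus the value it would like."""
    minimum: int
    maximum: int
    desired: int


def make_request(caps: Capabilities, minimum: int, maximum: int,
                 desired: int) -> Request:
    """Clamp so that lowest <= min <= desired <= max <= highest."""
    lo = max(caps.lowest, min(minimum, caps.highest))
    hi = min(caps.highest, max(maximum, lo))
    return Request(lo, hi, max(lo, min(desired, hi)))


def within_bounds(req: Request, delivered: int) -> bool:
    """Step (2): Delivered should land inside the requested [min, max]."""
    return req.minimum <= delivered <= req.maximum


def window_average_ok(samples: Sequence[int], tolerance: int) -> bool:
    """Step (3): compare average Delivered over the window to the floor.

    Assumption: `tolerance` is an absolute value on the performance scale.
    """
    if not samples:
        raise ValueError("need at least one Delivered sample")
    return sum(samples) / len(samples) >= tolerance
```

For example, with Highest=100 and Lowest=10, a request of
Min=0/Max=200/Desired=150 would get clamped to Min=10/Max=100/Desired=100
before it is written out.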

So, it's not as though the OS is left completely blind. The platform
maintains updated information about each CPU's performance
capabilities, relies on hints from the OS to make decisions, and feeds
back what it decides.

If the OS only looks at the Highest, Lowest and Delivered registers and
only writes to Desired, then we're not really any different from how we
do things today in the CPUFreq layer. Or even in the case of
intel_pstate, if you map Desired to PERF_CTL and get the value of
Delivered by using aperf/mperf ratios (as my experimental driver
does), then we can still maintain the existing system performance. It
seems like if an OS can make use of the additional information then it
should be a net win for overall power savings and performance
enhancement. Also, using the CPPC descriptors, we should be able to
have one driver across x86 and ARM64 (possibly others too).
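
As a rough illustration of the aperf/mperf idea, here is a small Python
sketch (again, not the driver code). It assumes two snapshots of the
APERF and MPERF counters taken some interval apart, and a hypothetical
reference_perf value standing for the performance level that
corresponds to the MPERF (reference) counting rate. The function names
are made up for this example.

```python
COUNTER_BITS = 64  # APERF and MPERF are 64-bit MSRs


def counter_delta(before: int, after: int, bits: int = COUNTER_BITS) -> int:
    """Difference between two counter reads, tolerating one wraparound."""
    return (after - before) % (1 << bits)


def delivered_perf(reference_perf: float,
                   aperf0: int, mperf0: int,
                   aperf1: int, mperf1: int) -> float:
    """Estimate Delivered as reference_perf * delta(APERF) / delta(MPERF).

    reference_perf is an assumption of this sketch: the abstract
    performance value that matches the reference counter's rate.
    """
    d_aperf = counter_delta(aperf0, aperf1)
    d_mperf = counter_delta(mperf0, mperf1)
    if d_mperf == 0:
        raise ValueError("no reference cycles elapsed between samples")
    return reference_perf * d_aperf / d_mperf
```

So if APERF advanced by 1500 while MPERF advanced by 1000, and
reference_perf is 100, the estimate for Delivered comes out as 150.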

So I'm still learning about the scheduler and don't have enough
knowledge yet. Hence this discussion with you guys. Hopefully with the
above flow, you can see that:

(a) we can plug the cppc driver into the existing infrastructure and
not really change anything (except the freq domain awareness issues I
mentioned earlier) (short term)

(b) we come up with ways to provide the bounds around a Desired value
using the information from the platform. (long term)

I briefly looked at the x86 HWP (Hardware Performance States) in the
s/w manual again. It's essentially an implementation of CPPC. It seems
like x86 has implemented most, if not all, of these registers as MSRs.
I'm really interested in knowing if anyone there is/has been working on
using them and what they found.
