# Re: [PATCH 6/6] cpufreq: schedutil: New governor based on scheduler utilization data

**From:** Michael Turquette

**Date:** Thu Mar 10 2016 - 18:21:08 EST

Quoting Rafael J. Wysocki (2016-03-09 15:41:34)

On Wed, Mar 9, 2016 at 11:15 AM, Juri Lelli <juri.lelli@xxxxxxx> wrote:

Sorry if I didn't reply yet. Trying to cope with jetlag and talks/meetings these days :-). Let me see if I'm getting what you are discussing, though.

On 08/03/16 21:05, Rafael J. Wysocki wrote:

On Tue, Mar 8, 2016 at 8:26 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

On Tue, Mar 08, 2016 at 07:00:57PM +0100, Rafael J. Wysocki wrote:

On Tue, Mar 8, 2016 at 12:27 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

a = max_freq gives next_freq = max_freq for x = 1, but with that choice of a you may never get to x = 1 with frequency invariant because of the feedback effect mentioned above, so the 1/n produces the extra boost needed for that (n is a positive integer).

Quite frankly, to me it looks like linear really is a better approximation for "raw" utilization. That is, for frequency invariant x we should take:

next_freq = a * x * max_freq / current_freq

(and if x is not frequency invariant, the right-hand side becomes a * x). Then, the extra boost needed to get to x = 1 for frequency invariant is produced by the (max_freq / current_freq) factor that is greater than 1 as long as we are not running at max_freq and a can be chosen as max_freq.

Expanding terms again, your original formula (without the 1.1 factor of the last version) was:

next_freq = util / max_cap * max_freq

and this doesn't work when we have freq invariance since util won't go over curr_cap.

Can you please remind me what curr_cap is?

What you propose above is to add another factor, so that we have:

next_freq = util / max_cap * max_freq / curr_freq * max_freq

which should give us the opportunity to reach max_freq also with freq invariance.

This should actually be the same as doing:

next_freq = util / max_cap * max_cap / curr_cap * max_freq

We are basically scaling how much the cpu is busy at curr_cap back to the 0..1024 scale. And we use this to select next_freq. Also, we can simplify this to:

next_freq = util / curr_cap * max_freq

and we save some ops.
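
To check that the two forms above agree, here is a minimal C sketch. The function names are hypothetical (they are not from the patch), and it assumes integer arithmetic with capacities on the 0..1024 scale and frequencies in MHz.

```c
/*
 * Hypothetical helpers, not taken from the patch: compare the expanded and
 * simplified forms of the next_freq formula discussed above.
 */

/* Expanded form: util / max_cap * max_cap / curr_cap * max_freq. */
static unsigned long next_freq_expanded(unsigned long util,
                                        unsigned long max_cap,
                                        unsigned long curr_cap,
                                        unsigned long max_freq)
{
    /* Multiply first, divide once, so integer division truncates only once. */
    return (util * max_cap * max_freq) / (max_cap * curr_cap);
}

/* Simplified form: util / curr_cap * max_freq (max_cap cancels out). */
static unsigned long next_freq_simplified(unsigned long util,
                                          unsigned long curr_cap,
                                          unsigned long max_freq)
{
    return (util * max_freq) / curr_cap;
}
```

For example, with util = 384, curr_cap = 768 and max_freq = 1200, both forms return 600.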

However, if that is correct, I think we might have a problem, as we are skewing OPP selection towards higher frequencies. Let's suppose we have a platform with 3 OPPs:

| freq | cap  |
|------|------|
| 1200 | 1024 |
| 900  | 768  |
| 600  | 512  |

As soon as a task reaches a utilization of 257 we will be selecting the second OPP as

next_freq = 257 / 512 * 1200 ~ 602

While the cpu is only 50% busy in this case. And we will go at max OPP when reaching ~492 (~64% of 768).
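
The first step of this example can be reproduced with a small C sketch. The OPP table mirrors the three OPPs listed above; the struct, the function names, and the "lowest OPP at or above the target" selection policy are assumptions made for illustration, not code from the series.

```c
#include <stddef.h>

/* Hypothetical OPP table mirroring the example: frequency and capacity. */
struct opp {
    unsigned long freq;
    unsigned long cap;
};

static const struct opp opp_table[] = {
    { 600,  512 },
    { 900,  768 },
    { 1200, 1024 },
};

#define NR_OPPS (sizeof(opp_table) / sizeof(opp_table[0]))

/* next_freq = util / curr_cap * max_freq, multiplying before dividing. */
static unsigned long raw_next_freq(unsigned long util, unsigned long curr_cap)
{
    unsigned long max_freq = opp_table[NR_OPPS - 1].freq;

    return (util * max_freq) / curr_cap;
}

/* Assumed policy: pick the lowest OPP whose frequency is >= the target. */
static unsigned long select_opp_freq(unsigned long target_freq)
{
    size_t i;

    for (i = 0; i < NR_OPPS; i++) {
        if (opp_table[i].freq >= target_freq)
            return opp_table[i].freq;
    }
    /* Target is above every OPP: clamp to the highest one. */
    return opp_table[NR_OPPS - 1].freq;
}
```

With util = 257 at the 512-capacity OPP, raw_next_freq() gives 602 and select_opp_freq() rounds that up to 900; with util = 256 the target is exactly 600 and the lowest OPP is kept.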

That said, I guess this might work as a first solution, but we will probably need something better in the future. I understand Rafael's concerns regarding margins, but it seems to me that some kind of additional parameter will probably be needed anyway to fix this.

Just to say again how we handle this in schedfreq, with a -20% margin applied to the lowest OPP we will get to the next one when utilization reaches ~410 (80% busy at curr OPP), and so on for the subsequent ones, which is less aggressive and might be better IMHO.
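
As a rough illustration of those tipping points, the sketch below computes the utilization at which a margin-based governor would leave a given OPP. The helper name and the percentage parameter are hypothetical; this is not the schedfreq implementation, only the arithmetic described above.

```c
/*
 * Hypothetical helper, not the schedfreq code: utilization at which a
 * governor with the given margin would leave an OPP of capacity curr_cap.
 * A margin of 20 means "move up once the CPU is 80% busy at curr_cap".
 */
static unsigned long tipping_point(unsigned long curr_cap,
                                   unsigned long margin_pct)
{
    return curr_cap * (100 - margin_pct) / 100;
}
```

For the three-OPP example, this gives 409 for the 512-capacity OPP (the ~410 above, after integer truncation) and 614 for the 768-capacity OPP.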

Well, Peter says that my idea is incorrect, so I'll go for

next_freq = C * current_freq * util_raw / max

where C > 1 (and likely C < 1.5) instead.

That means C has to be determined somehow or guessed. The 80% tipping point condition seems reasonable to me, though, which leads to C = 1.25.
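
One way to read the 80% tipping point, spelled out as a short derivation: assume the governor should request exactly the current frequency when raw utilization is 80% of max, and a higher one above that. The symbol u for util_raw / max is introduced here only for illustration.

```latex
\begin{gather*}
\text{next\_freq} = C \cdot \text{current\_freq} \cdot u,
  \qquad u = \frac{\text{util\_raw}}{\text{max}} \\
\text{next\_freq} = \text{current\_freq} \ \text{at}\ u = 0.8
  \;\Longrightarrow\; 0.8\,C = 1
  \;\Longrightarrow\; C = 1.25
\end{gather*}
```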

Right, that is the same value used in the schedfreq series:

    +/*
    + * Capacity margin added to CFS and RT capacity requests to provide
    + * some head room if task utilization further increases.
    + */
    +unsigned int capacity_margin = 1280;
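
To see why 1280 corresponds to C = 1.25, here is a minimal sketch of how such a margin could be applied. It assumes the margin is a fixed-point multiplier on a 1024 scale; CAPACITY_SCALE and apply_margin() are hypothetical names used for illustration and are not taken from the schedfreq series.

```c
/*
 * Assumption for illustration: the margin is a fixed-point multiplier on a
 * 1024 scale, so 1280 / 1024 = 1.25. CAPACITY_SCALE and apply_margin() are
 * hypothetical names, not taken from the schedfreq series.
 */
#define CAPACITY_SCALE 1024UL

static const unsigned long capacity_margin = 1280;

/* Scale a capacity request by capacity_margin / CAPACITY_SCALE. */
static unsigned long apply_margin(unsigned long util)
{
    return util * capacity_margin / CAPACITY_SCALE;
}
```

Under these assumptions a utilization of 410 becomes a request of 512, which lines up with the ~410 tipping point quoted earlier for the 512-capacity OPP.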

Regards,

Mike