Re: [RFC/RFT][PATCH v4 1/2] cpufreq: New governor using utilization data from the scheduler

From: Rafael J. Wysocki
Date: Thu Mar 03 2016 - 14:20:15 EST


On Thu, Mar 3, 2016 at 4:20 AM, Steve Muckle <steve.muckle@xxxxxxxxxx> wrote:
> On 03/01/2016 12:20 PM, Rafael J. Wysocki wrote:
>>> I'm specifically worried about the check below where we omit a CPU's
>>> capacity request if its last update came before the last sample time.
>>>
>>> Say there are 2 CPUs in a frequency domain, HZ is 100 and the sample
>>> delay here is 4ms.
>>
>> Yes, that's the case I clearly didn't take into consideration. :-)
>>
>> My assumption was that the sample delay would always be longer than
>> the typical interval between updates, which of course need not be the case.
>>
>> The reason I added the check at all was that the numbers from the
>> other CPUs may become stale if those CPUs are idle for too long, so at
>> one point the contributions from them need to be discarded. The
>> question is when that point is, and since the sample delay may be
>> arbitrary, that mechanism has to be more complex.
>
> Yeah this has been an open issue on our end as well. Sampling-based
> governors of course solved this primarily via their fundamental nature
> and sampling rate. The interactive governor also has a separate tunable
> IIRC which specified how long a CPU may have its sampling timer deferred
> due to idle when running @ > fmin (the "slack timer").
>
> Decoupling the CPU update staleness limit from the freq change rate
> limit via a separate tunable would be valuable IMO. Would you be
> amenable to a patch that did that?

Yes, I would.

It still would be better, though, if that didn't have to be a tunable.

What do you think about my idea to use NSEC_PER_SEC / HZ as the
staleness limit (like in https://patchwork.kernel.org/patch/8477261/)?
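To make that concrete, here is a rough user-space sketch of a staleness
check with a one-tick limit. It is not the code from the patch linked
above; tick_nsec() and util_is_stale() are hypothetical names made up for
illustration, and HZ is passed in as a plain parameter instead of being
the kernel's compile-time constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: length of one scheduler tick in nanoseconds. */
static uint64_t tick_nsec(unsigned int hz)
{
	assert(hz > 0);
	return NSEC_PER_SEC / hz;
}

/*
 * Hypothetical helper: a CPU's utilization is treated as stale when its
 * last update is more than one tick old, regardless of the sample delay.
 */
static bool util_is_stale(uint64_t now_ns, uint64_t last_update_ns,
			  unsigned int hz)
{
	/* An update stamped at or after "now" cannot be stale. */
	if (last_update_ns >= now_ns)
		return false;

	return now_ns - last_update_ns > tick_nsec(hz);
}
```

With HZ = 100 the limit is 10 ms, so in the two-CPU example quoted
earlier a CPU whose last update is 6 ms old still contributes even though
the sample delay is 4 ms, while one that has been silent for 15 ms is
dropped.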

[cut]

>> Moreover, since 0 utilization gets you to run in f_min no matter what,
>> if you treat f_max as an absolute, you're going to underutilize the
>> P-states in the upper half of the available range.
>
> Sorry I didn't follow. What do you mean by underutilize the upper half
> of the range? I don't see how using RELATION_L with (util/max) * fmax *
> (headroom) wouldn't be correct in that regard.

Suppose all of the util values from 0 to max are equally probable (or
equally frequent) and the available frequencies are close enough to
each other that it doesn't really matter whether _C or _L is used.

Say f_min is 400 and f_max is 1000.

Then, if you take next_freq = f_max * util / max, 50% of requests will
fall into the 400-500 section of the available frequency range. Of
course, 40% of all requests will fall to f_min itself (anything that
computes to less than 400 is clamped), but that means that the other
available states will be used less frequently, on average.

I would prefer that to be more balanced.
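The arithmetic can be checked with a small user-space sketch. The first
function is the proportional mapping discussed above, clamped to f_min.
The second one, interpolating linearly between f_min and f_max, is only
an assumption about what a "more balanced" mapping could look like. All
function names here are hypothetical and nothing below is taken from the
patch.

```c
#include <assert.h>

/* Proportional mapping: f_max * util / max, clamped from below to f_min. */
static unsigned int freq_proportional(unsigned int f_min, unsigned int f_max,
				      unsigned int util, unsigned int max)
{
	unsigned int f;

	assert(max > 0 && util <= max);
	f = (unsigned int)((unsigned long long)f_max * util / max);
	return f < f_min ? f_min : f;
}

/*
 * Assumed alternative, for comparison only: interpolate linearly between
 * f_min and f_max so that util == 0 maps to f_min without any clamping.
 */
static unsigned int freq_interpolated(unsigned int f_min, unsigned int f_max,
				      unsigned int util, unsigned int max)
{
	assert(max > 0 && util <= max && f_min <= f_max);
	return f_min +
	       (unsigned int)((unsigned long long)(f_max - f_min) * util / max);
}

typedef unsigned int (*freq_map_fn)(unsigned int, unsigned int,
				    unsigned int, unsigned int);

/* Count how many of the util values 0..max-1 map to a frequency below limit. */
static unsigned int count_below(freq_map_fn map, unsigned int f_min,
				unsigned int f_max, unsigned int max,
				unsigned int limit)
{
	unsigned int util, n = 0;

	for (util = 0; util < max; util++)
		if (map(f_min, f_max, util, max) < limit)
			n++;

	return n;
}
```

With f_min = 400, f_max = 1000 and max = 1000,
count_below(freq_proportional, 400, 1000, 1000, 500) comes out as 500,
i.e. half of the equally likely util values end up below 500. The same
call with freq_interpolated gives 167, roughly one sixth, which matches
the share of the 400-1000 range that lies below 500.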

Thanks,
Rafael