Re: [RFC/RFT][PATCH 2/2] cpufreq: schedutil: Utilization aggregation

From: Rafael J. Wysocki
Date: Mon Apr 10 2017 - 17:13:20 EST


On Mon, Apr 10, 2017 at 1:26 PM, Juri Lelli <juri.lelli@xxxxxxx> wrote:
> Hi Rafael,

Hi,

> thanks for this set. I'll give it a try (together with your previous
> patch) in the next few days.
>
> A question below.
>
> On 10/04/17 02:11, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>>
>> Due to the limitation of the rate of frequency changes the schedutil
>> governor only estimates the CPU utilization entirely when it is about
>> to update the frequency for the corresponding cpufreq policy. As a
>> result, the intermediate utilization values are discarded by it,
>> but that is not appropriate in general (like, for example, when
>> tasks migrate from one CPU to another or exit, in which cases the
>> utilization measured by PELT may change abruptly between frequency
>> updates).
>>
>> For this reason, modify schedutil to estimate CPU utilization
>> completely whenever it is invoked for the given CPU and store the
>> maximum encountered value of it as input for subsequent new frequency
>> computations. This way the new frequency is always based on the
>> maximum utilization value seen by the governor after the previous
>> frequency update which effectively prevents intermittent utilization
>> variations from causing it to be reduced unnecessarily.
>>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>> ---
>
> [...]
>
>> -static void sugov_get_util(unsigned long *util, unsigned long *max)
>> +static void sugov_get_util(struct sugov_cpu *sg_cpu, unsigned int flags)
>>  {
>> +	unsigned long cfs_util, cfs_max;
>>  	struct rq *rq = this_rq();
>> -	unsigned long cfs_max;
>>
>> -	cfs_max = arch_scale_cpu_capacity(NULL, smp_processor_id());
>> +	sg_cpu->flags |= flags & SCHED_CPUFREQ_RT_DL;
>> +	if (sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
>> +		return;
>>
>
> IIUC, with this you also keep track of any RT/DL tasks that woke up
> during the last throttling period, and react accordingly as soon as a
> triggering event happens after the throttling period elapses.

Right (that's the idea at least).
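
To make that concrete, here is a rough user-space sketch of the
aggregation (not the patch code). All sketch_* and SKETCH_* names are
made up for illustration, the flag values are arbitrary, and a fixed
CPU capacity is assumed so that raw utilization values can be compared
directly:

```c
/* Hypothetical stand-ins for the SCHED_CPUFREQ_* flags; values are illustrative. */
#define SKETCH_FLAG_RT		(1U << 0)
#define SKETCH_FLAG_DL		(1U << 1)
#define SKETCH_FLAG_RT_DL	(SKETCH_FLAG_RT | SKETCH_FLAG_DL)

/* Per-CPU state accumulated between two frequency updates. */
struct sketch_cpu {
	unsigned int flags;	/* sticky RT/DL bits seen since the last update */
	unsigned long util;	/* maximum utilization seen since the last update */
};

/* Called on every utilization update, including the rate-limited ones. */
static void sketch_aggregate(struct sketch_cpu *c, unsigned long util,
			     unsigned int flags)
{
	/* Remember that an RT/DL task showed up, even if we cannot act yet. */
	c->flags |= flags & SKETCH_FLAG_RT_DL;
	if (c->flags & SKETCH_FLAG_RT_DL)
		return;	/* utilization no longer matters in this window */

	/* Keep the maximum so that a transient dip does not lower the result. */
	if (util > c->util)
		c->util = util;
}

/* Called once the frequency has actually been updated. */
static void sketch_reset(struct sketch_cpu *c)
{
	c->flags = 0;
	c->util = 0;
}
```

So an RT/DL wakeup that lands inside the throttling period leaves its
bit set in the per-CPU flags, and the next update that gets through
sees it.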

> Given that for RT (and still for DL as well) the next event is a
> periodic tick, couldn't it happen that the required frequency transition
> for an RT task that unfortunately woke up before the end of a throttling
> period gets delayed by a tick interval (at least 4ms on ARM)?

No, that won't be an entire tick unless it wakes up exactly at the
update time AFAICS.

> Don't we need to treat such wake up events (RT/DL) in a special way and
> maybe set a timer to fire and process them as soon as the current
> throttling period elapses? Might be a patch on top of this I guess.

Setting a timer won't be a good idea at all, as it would need to be a
deferrable one and Thomas would not like that (I'm sure).

We could in principle add some special casing around that, for
example by passing flags to sugov_should_update_freq() and
opportunistically ignoring freq_update_delay_ns if SCHED_CPUFREQ_RT_DL
is set in them, but that would lead to extra overhead on systems where
frequency updates happen in-context.
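
Roughly, that special casing could look like the sketch below. This is
an illustration only, not a proposed patch: the sketch_* and SKETCH_*
names and the flag values are hypothetical, and the real
sugov_should_update_freq() checks more than the rate limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SCHED_CPUFREQ_* flags; values are illustrative. */
#define SKETCH_FLAG_RT		(1U << 0)
#define SKETCH_FLAG_DL		(1U << 1)
#define SKETCH_FLAG_RT_DL	(SKETCH_FLAG_RT | SKETCH_FLAG_DL)

/* Only the rate-limiting state of a policy is modeled here. */
struct sketch_policy {
	uint64_t last_freq_update_time;	/* ns */
	uint64_t freq_update_delay_ns;	/* minimum spacing between updates */
};

/* Return true if a frequency update may be carried out at "time". */
static bool sketch_should_update_freq(const struct sketch_policy *p,
				      uint64_t time, unsigned int flags)
{
	int64_t delta_ns;

	/* Special case: RT/DL requests skip the rate limit altogether. */
	if (flags & SKETCH_FLAG_RT_DL)
		return true;

	delta_ns = (int64_t)(time - p->last_freq_update_time);
	return delta_ns >= (int64_t)p->freq_update_delay_ns;
}
```

The flags check is an extra branch on every invocation, which is the
overhead in question.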

Also, this looks like somewhat of a corner case to me, to be honest.

Thanks,
Rafael