Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs

From: Joel Fernandes
Date: Wed Mar 22 2017 - 19:56:59 EST


On Mon, Mar 20, 2017 at 5:34 AM, Patrick Bellasi
<patrick.bellasi@xxxxxxx> wrote:
> On 20-Mar 09:26, Vincent Guittot wrote:
>> On 20 March 2017 at 04:57, Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
>> > On 19-03-17, 14:34, Rafael J. Wysocki wrote:
>> >> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>> >>
>> >> The PELT metric used by the schedutil governor underestimates the
>> >> CPU utilization in some cases. The reason for that may be time spent
>> >> in interrupt handlers and similar which is not accounted for by PELT.
>>
>> Are you sure of the root cause described above (time stolen by irq
>> handler) or is it just a hypotheses ? That would be good to be sure of
>> the root cause
>> Furthermore, IIRC the time spent in irq context is also accounted as
>> run time for the running cfs task but not RT and deadline task running
>> time
>
> As long as the IRQ processing does not generate a context switch,
> which is happening (eventually) if the top half schedule some deferred
> work to be executed by a bottom half.
>
> Thus, me too I would say that all the top half time is accounted in
> PELT, since the current task is still RUNNABLE/RUNNING.

Sorry if I'm missing something but doesn't this depend on whether you
have CONFIG_IRQ_TIME_ACCOUNTING enabled?

__update_load_avg uses rq->clock_task for deltas which I think
shouldn't account IRQ time with that config option. So it should be
quite possible for IRQ time spent to reduce the PELT signal right?

>
>> So I'm not really aligned with the description of your problem: PELT
>> metric underestimates the load of the CPU. The PELT is just about
>> tracking CFS task utilization but not whole CPU utilization and
>> according to your description of the problem (time stolen by irq),
>> your problem doesn't come from an underestimation of CFS task but from
>> time spent in something else but not accounted in the value used by
>> schedutil
>
> Quite likely. Indeed, it can really be that the CFS task is preempted
> because of some RT activity generated by the IRQ handler.
>
> More in general, I've also noticed many suboptimal freq switches when
> RT tasks interleave with CFS ones, because of:
> - relatively long down _and up_ throttling times
> - the way schedutil's flags are tracked and updated
> - the callsites from where we call schedutil updates
>
> For example it can really happen that we are running at the highest
> OPP because of some RT activity. Then we switch back to a relatively
> low utilization CFS workload and then:
> 1. a tick happens which produces a frequency drop

Any idea why this frequency drop would happen? Say a running CFS task
gets preempted by RT task, the PELT signal shouldn't drop for the
duration the CFS task is preempted because the task is runnable, so
once the CFS task gets CPU back, schedutil should still maintain the
capacity right?

Regards,
Joel