Re: [RFC PATCH 3/4] cpufreq: Add Active Stats calls tracking frequency changes

From: Rafael J. Wysocki
Date: Tue Jun 22 2021 - 09:52:14 EST


On Tue, Jun 22, 2021 at 3:42 PM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
>
>
>
> On 6/22/21 1:28 PM, Rafael J. Wysocki wrote:
> > On Tue, Jun 22, 2021 at 9:59 AM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
> >>
> >> The Active Stats framework tracks and accounts the activity of the CPU
> >> for each performance level. It accounts the real residency, when the CPU
> >> was not idle, at a given performance level. This patch adds needed calls
> >> which provide the CPU frequency transition events to the Active Stats
> >> framework.
> >>
> >> Signed-off-by: Lukasz Luba <lukasz.luba@xxxxxxx>
> >> ---
> >> drivers/cpufreq/cpufreq.c | 5 +++++
> >> 1 file changed, 5 insertions(+)
> >>
> >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> >> index 802abc925b2a..d79cb9310572 100644
> >> --- a/drivers/cpufreq/cpufreq.c
> >> +++ b/drivers/cpufreq/cpufreq.c
> >> @@ -14,6 +14,7 @@
> >>
> >> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >>
> >> +#include <linux/active_stats.h>
> >> #include <linux/cpu.h>
> >> #include <linux/cpufreq.h>
> >> #include <linux/cpu_cooling.h>
> >> @@ -387,6 +388,8 @@ static void cpufreq_notify_transition(struct cpufreq_policy *policy,
> >>
> >> cpufreq_stats_record_transition(policy, freqs->new);
> >> policy->cur = freqs->new;
> >> +
> >> + active_stats_cpu_freq_change(policy->cpu, freqs->new);
> >> }
> >> }
> >>
> >> @@ -2085,6 +2088,8 @@ unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
> >> policy->cpuinfo.max_freq);
> >> cpufreq_stats_record_transition(policy, freq);
> >>
> >> + active_stats_cpu_freq_fast_change(policy->cpu, freq);
> >> +
> >
> > This is quite a bit of overhead and so why is it needed in addition to
> > the code below?
>
> The code below is tracing, which is good for post-processing. We use it
> in our LISA tool when we analyze EAS decisions based on captured trace
> data.
>
> This new code is present at run time, so subsystems like our thermal
> governor IPA can use it to get a better estimate of the power used by
> the CPU over any arbitrary period, e.g. 50ms, 100ms, 300ms, ...

So can it be made not to run when the IPA is not using it?
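
To illustrate the kind of gating meant here, a minimal userspace sketch
follows. It is not the Active Stats code: every name in it
(sketch_stats_enabled, sketch_set_enabled(), sketch_freq_fast_change(),
SKETCH_NR_CPUS) is hypothetical, and a plain bool stands in for what
would more likely be a static key (DEFINE_STATIC_KEY_FALSE() plus
static_branch_unlikely()) in the kernel, so that the disabled case costs
close to nothing in the fast path.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4	/* hypothetical CPU count for the sketch */

/* Hypothetical switch: true only while a consumer (e.g. IPA) wants data. */
static bool sketch_stats_enabled;

/* Hypothetical per-CPU bookkeeping: last frequency seen and update count. */
static unsigned int sketch_last_freq[SKETCH_NR_CPUS];
static unsigned long sketch_nr_updates[SKETCH_NR_CPUS];

/* Called by the consumer when it starts or stops using the statistics. */
static void sketch_set_enabled(bool on)
{
	sketch_stats_enabled = on;
}

/* Fast-path hook: returns immediately when nobody consumes the data. */
static void sketch_freq_fast_change(int cpu, unsigned int freq)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);

	if (!sketch_stats_enabled)
		return;

	sketch_last_freq[cpu] = freq;
	sketch_nr_updates[cpu]++;
}
```

With the switch off, a call to sketch_freq_fast_change() leaves the
per-CPU data untouched; only after sketch_set_enabled(true) do updates
get recorded.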

> >
> > And pretty much the same goes for the idle loop change. There is
> > quite a bit of instrumentation in that code already and it avoids
> > adding new locking for a reason. Why is it a good idea to add more
> > locking to that code?
>
> This active_stats_cpu_freq_fast_change() doesn't take any lock itself;
> it relies on the schedutil lock in [1].

Ah, OK.

But it still adds overhead AFAICS.

> >
> >> if (trace_cpu_frequency_enabled()) {
> >> for_each_cpu(cpu, policy->cpus)
> >> trace_cpu_frequency(freq, cpu);
> >> --
>
>
> [1]
> https://elixir.bootlin.com/linux/latest/source/kernel/sched/cpufreq_schedutil.c#L447