Re: [PATCH] PM / EM: Inefficient OPPs detection

From: Lukasz Luba
Date: Thu Apr 15 2021 - 11:32:39 EST


Hi Quentin,

On 4/15/21 4:20 PM, Quentin Perret wrote:
On Thursday 15 Apr 2021 at 16:14:46 (+0100), Vincent Donnefort wrote:
On Thu, Apr 15, 2021 at 02:59:54PM +0000, Quentin Perret wrote:
On Thursday 15 Apr 2021 at 15:34:53 (+0100), Vincent Donnefort wrote:
On Thu, Apr 15, 2021 at 01:16:35PM +0000, Quentin Perret wrote:
On Thursday 08 Apr 2021 at 18:10:29 (+0100), Vincent Donnefort wrote:
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -10,6 +10,7 @@
#include "sched.h"
+#include <linux/energy_model.h>
#include <linux/sched/cpufreq.h>
#include <trace/events/power.h>
@@ -164,6 +165,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
freq = map_util_freq(util, freq, max);
+ /* Avoid inefficient performance states */
+ freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq);

I remember this was discussed when Douglas sent his patches some time
ago, but I still find it sad we index the EM table here but still
re-index the cpufreq frequency table later :/

Yes in your case this lookup is very inexpensive, but still. EAS relies
on the EM's table matching cpufreq's accurately, so this second lookup
still feels rather unnecessary ...

To get only a single lookup, we would need to bring the inefficiency knowledge
directly to the cpufreq framework. But it has its own limitations:

The cpufreq driver can have its own resolve_freq() callback, which means that
not all the drivers would benefit from that feature.

The cpufreq_table can be ordered and accessed in several ways which brings
many combinations that would need to be supported, ending-up with something
much more intrusive. (We can though decide to limit the feature to the low to
high access that schedutil needs).

As the EM needs schedutil to exist anyway, it seemed to be the right place for
this code. It allows any cpufreq driver to benefit from the feature, simplify a
potential extension for a usage by devfreq devices and as a bonus it speeds-up
energy computing, allowing a more complex Energy Model.

I was thinking of something a bit simpler. cpufreq_driver_resolve_freq
appears to be used only from schedutil (why is it even then?), so we
could just pull it into cpufreq_schedutil.c and just plain skip the call
to cpufreq_frequency_table_target if the target freq has been indexed in
the EM table -- it should already be matching a real OPP.

Thoughts?
Quentin

Can try that for a V2. That means em_pd_get_efficient_freq() would have to
know about policy clamping (but I don't think that's an issue)

Indeed, and I think we can even see this as an improvement as EAS will
now see policy clamps as well in compute_energy().

Are you sure that the 'policy' can be accessed from compute_energy()?
It can be from schedutil freq switch path, but I'm not use about our
feec()..

For me this cpufreq_driver_resolve_freq sounds a bit out of this patch
subject.

Regards,
Lukasz