Re: [PATCH v2] sched/topology: remove sysctl_sched_energy_aware depending on the architecture

From: Pierre Gondois
Date: Thu Sep 07 2023 - 12:13:28 EST


Hello Chen,

On 9/1/23 11:49, Chen Yu wrote:
Hi Shrikanth,

On 2023-09-01 at 12:22:49 +0530, Shrikanth Hegde wrote:
Currently sysctl_sched_energy_aware doesn't alter the said behaviour on
some of the architectures. IIUC its meant to either force rebuild the
perf domains or cleanup the perf domains by echoing 1 or 0 respectively.

perf domains are not built when there is SMT, or when there is no
Asymmetric CPU topologies or when there is no frequency invariance.
Since such cases EAS is not set and perf domains are not built. By
changing the values of sysctl_sched_energy_aware, its not possible to
force build the perf domains. Hence remove this sysctl on such platforms
that dont support it. Some of the settings can be changed later
such as smt_active by offlining the CPU's, In those cases if
build_perf_domains returns true, re-enable the sysctl.

Anytime, when sysctl_sched_energy_aware is changed sched_energy_update
is set when building the perf domains. Making use of that to find out if
the change is happening by sysctl or dynamic system change.

Taking different cases:
Case1. system while booting has EAS capability, sysctl will be set 1. Hence
perf domains will be built if needed. On changing the sysctl to 0, since
sched_energy_update is true, perf domains would be freed and sysctl will
not be removed. later sysctl is changed to 1, enabling the perf domains
rebuild again. Since sysctl is already there, it will skip register.

Case2. System while booting doesn't have EAS Capability. Later after system
change it becomes capable of EAS. sched_energy_update is false. Though
sysctl is 0, will go ahead and try to enable eas. This is the current
behaviour. if has_eas is true, then sysctl will be registered. After
that any sysctl change is same as Case1.


I think this change makes sense. Just one question for case 2,
sched_energy_update is not strictly tied with sysctl change, right?
sched_energy_update is true in rebuild_sched_domains_energy().
rebuild_sched_domains_energy() will not only be invoked by sysctl
path via sched_energy_aware_handler(), but also by other path, such
as update_scale_freq_invariant(). If the system boots with EAS capability
disabled, then it becomes EAS capable due to the frequency invariant
readiness(cpufreq policy change?), then
cpufreq_notifier(CPUFREQ_CREATE_POLICY) -> init_amu_fie_callback() ->
amu_fie_setup() -> opology_set_scale_freq_source() ->
update_scale_freq_invariant(true) -> rebuild_sched_domains_energy()
Since sched_energy_update is true, the rebuild of perf domain will be skipped(but
actually we want to create it) Please correct me if I miss something.


I thought 'sched_energy_update' was here to force rebuilding the
perf_domains instead. If sched_energy_update=1, then it prevents from finding
a pre-existing perf_domain and skipping the perf_domain rebuild, unless I also
missed something:


#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
/* Build perf. domains: */
for (i = 0; i < ndoms_new; i++) {
for (j = 0; j < n && !sched_energy_update; j++) {
if (cpumask_equal(doms_new[i], doms_cur[j]) &&
cpu_rq(cpumask_first(doms_cur[j]))->rd->pd) {
has_eas = true;
goto match3;
}
}
/* No match - add perf. domains for a new rd */
has_eas |= build_perf_domains(doms_new[i]);
match3:
;
}
sched_energy_set(has_eas);
#endif