Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture

From: Pierre Gondois
Date: Fri Sep 15 2023 - 09:36:09 EST


Hello Valentin,

On 9/15/23 14:00, Valentin Schneider wrote:
On 14/09/23 23:26, Shrikanth Hegde wrote:
On 9/14/23 9:51 PM, Valentin Schneider wrote:
On 13/09/23 17:18, Shrikanth Hegde wrote:
sysctl_sched_energy_aware is available for the admin to disable/enable
energy aware scheduling(EAS). EAS is enabled only if few conditions are
met by the platform. They are, asymmetric CPU capacity, no SMT,
valid cpufreq policy, frequency invariant load tracking. It is possible
platform when booting may not have EAS capability, but can do that after.
For example, changing/registering the cpufreq policy.

At present, though platform doesn't support EAS, this sysctl is still
present and it ends up calling rebuild of sched domain on write to 1 and
NOP when writing to 0. That is confusing and un-necessary.



Hi Valentin, Thanks for taking a look at this patch.

But why would you write to it in the first place? Or do you mean to use
this as an indicator for userspace that EAS is supported?


Since this sysctl is present and its value being 1, it gives the
impression to the user that EAS is supported when it is not.
So its an attempt to correct that part.


Ah, I see. Then how about just making the sysctl return 0 when EAS isn't
supported? And on top of it, prevent all writes when EAS isn't supported
(perf domains cannot be built, so there would be no point in forcing a
rebuild that will do nothing).

I think the issue comes from the fact there is no variable representing
whether EAS is supported or not. sched_energy_enabled()/sched_energy_present
tells whether EAS is actively running on the system instead.

So on a system with EAS running, I think what would happen is:
# Disable EAS and set sched_energy_present=0
echo 0 > /proc/sys/kernel/sched_energy_aware

# sched_energy_present==0, so we get -EOPNOTSUPP
echo 1 > /proc/sys/kernel/sched_energy_aware



I can never remember how to properly use the sysctl API, so that's a very
crude implementation, but something like so?

---

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 05a5bc678c089..dadfc5afc4121 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -230,9 +230,28 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
if (write && !capable(CAP_SYS_ADMIN))
return -EPERM;
+ if (!sched_energy_enabled()) {
+ if (write)
+ return -EOPNOTSUPP;
+ else {
+ size_t len;
+
+ if (*ppos) {
+ *lenp = 0;
+ return 0;
+ }
+
+ len = snprintf((char *)buffer, 3, "0\n");
+
+ *lenp = len;
+ *ppos += len;
+ return 0;
+ }
+ }
+
ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (!ret && write) {
- state = static_branch_unlikely(&sched_energy_present);
+ state = sched_energy_enabled();
if (state != sysctl_sched_energy_aware)
rebuild_sched_domains_energy();
}


Regards,
Pierre