Re: [PATCH v2] sched/topology: remove sysctl_sched_energy_aware depending on the architecture
From: Shrikanth Hegde
Date: Wed Sep 06 2023 - 07:36:45 EST
On 9/5/23 7:33 PM, Pierre Gondois wrote:
> Hello Shrikanth,
> I tried the patch (on a platform using the cppc_cpufreq driver). The
> platform
> normally has EAS enabled, but the patch removed the sched_energy_aware
> sysctl.
> It seemed the following happened (in the below order):
>
> 1. sched_energy_aware_sysctl_init()
> Doesn't set sysctl_sched_energy_aware as cpufreq_freq_invariance isn't set
> and arch_scale_freq_invariant() returns false
>
> 2. cpufreq_register_driver()
> Sets cpufreq_freq_invariance during cpufreq initialization
> sched_energy_set()
>
> 3. sched_energy_set()
> Is called with has_eas=0 since build_perf_domains() doesn't see the
> platform
> as EAS compatible. Indeed sysctl_sched_energy_aware=0.
> So with sysctl_sched_energy_aware=0 and has_eas=0, sched_energy_aware
> sysctl
> is not enabled even though EAS should be possible.
>
>
> On 9/1/23 08:52, Shrikanth Hegde wrote:
>> Currently sysctl_sched_energy_aware doesn't alter the said behaviour on
>> some of the architectures. IIUC its meant to either force rebuild the
>> perf domains or cleanup the perf domains by echoing 1 or 0 respectively.
>
> There is a definition of the sysctl at:
> Documentation/admin-guide/sysctl/kernel.rst::sched_energy_aware
[...]
>>
>>
>> +static unsigned int sysctl_sched_energy_aware;
>> +static struct ctl_table_header *sysctl_eas_header;
>
> The variables around the presence/absence of EAS are:
> - sched_energy_present:
> EAS is up and running
>
> - sysctl_sched_energy_aware:
> The user wants to use EAS (or not). Doesn't mean EAS can run on the
> platform.
>
> - sched_energy_set/partition_sched_domains_locked's "has_eas":
> Local variable. Represent whether EAS can run on the platform.
>
> IMO it would be simpler to (un)register sched_energy_aware sysctl
> in partition_sched_domains_locked(), based on the value of "has_eas".
> This would allow to let all the logic as it is right now, inside
> build_perf_domains(), and then advertise sched_energy_aware sysctl
> if EAS can run on the platform.
> sched_energy_aware_sysctl_init() would be deleted then.
>
>
yes. that is true. and there is no variable which holds the info if the system
is capable of EAS.
Retrospecting, the reason for starting this patch series was this,
sysctl_sched_energy_aware didnt make sense on power10 platform since it
has SMT and symmetric CPU capacities. with current code writing 1 to
it cause rebuild of sched domains but EAS wouldn't be possible.
Possible Approaches:
1.
Make this sysctl write as NOP if the platform doesn't has EAS capabilities at
the moment. Do those checks in sched_energy_aware_handler before handling the
change in value. Return EINVAL. And Update sysctl description that on such
platforms value change is NOP. Relatively simpler change.
2.
Current patch approach, remove the sysctl completely on non supported
architectures and re-enable it if the system becomes capable of doing EAS.
With the current patch, instead of using sched_energy_update, use another
variable called sched_energy_change_in_sysctl(maybe different name). I think
that would handle all the cases. Another variable can be avoided by encoding
the info in sysctl_sched_energy_aware itself in the handler call, since it
takes only 1 or 0 as the value. upper bits are free to use. update the sysctl
as well with this behavior. plus minor cleanup to remove the init of sysctl.
Suggestions?