Re: [PATCH v5 2/2] sched/topology: change behaviour of sysctl sched_energy_aware based on the platform
From: Shrikanth Hegde
Date: Tue Oct 03 2023 - 08:27:41 EST
On 10/3/23 2:50 PM, Pierre Gondois wrote:
> Hello Shrikanth,
> Some NITs about the commit message:
>
Hi Pierre.
> On 9/29/23 17:52, Shrikanth Hegde wrote:
>> sysctl sched_energy_aware is available for the admin to disable/enable
>> energy aware scheduling(EAS). EAS is enabled only if few conditions are
>> met by the platform. They are, asymmetric CPU capacity, no SMT,
>> schedutil CPUfreq governor, frequency invariant load tracking etc.
>> A platform may boot without EAS capability, but could gain such
>> capability at runtime For example, changing/registering the CPUfreq
>
> Missing dot I think: 'runtime. For example,'
ok.
>
>> governor to schedutil.
>>
>> At present, though platform doesn't support EAS, this sysctl returns 1
>> and it ends up calling build_perf_domains on write to 1 and
>> NOP when writing to 0. That is confusing and un-necessary.
>
This is current problematic behavior that patch 2/2 tries to address.
> I'm not sure I fully understand the sentence:
> - it sounds that the user is writing a value to either 1/0
> (I think the user is writing 1/0 to the sysctl)
Yes, any user with root
privileges can edit this file and perform read and write.
> - aren't the sched domain rebuilt even when writing 0 to the sysctl ?
> I'm not sure I understand to what the NOP is referring to exactly.
>
Complete sched domains aren't built as this case goes to match1 and match2 statements.
> What about:
> Platforms without EAS capability currently advertise this sysctl.
> Its effects (i.e. rebuilding sched-domains) is unnecessary on
> such platforms and its presence can be confusing.
>
look ok. the changelog had described in detail IMHO
>>
>> Desired behavior would be to, have this sysctl to enable/disable the EAS
>
> Unnecessary comma I think
>
>> on supported platform. On Non supported platform write to the sysctl
>
> Non supported -> non-supported
ok for the above two nits.
>
>> would return not supported error and read of the sysctl would return
>> empty. So> sched_energy_aware returns empty - EAS is not possible at
>> this moment
>> This will include EAS capable platforms which have at least one EAS
>> condition false during startup, e.g. using a Performance CPUfreq governor
>
> Just a remark, using the performance governor is not exactly a condition
> disabling EAS, it is more 'not using the schedutil CPUfreq governor'
>
ok.
>> sched_energy_aware returns 0 - EAS is supported but disabled by admin.
>> sched_energy_aware returns 1 - EAS is supported and enabled.
>>
>> User can find out the reason why EAS is not possible by checking
>> info messages. sched_is_eas_possible returns true if the platform
>> can do EAS at this moment.
>>
>> Depends on [PATCH v5 1/2] sched/topology: Remove EM_MAX_COMPLEXITY limit
>> to be applied first.
>
> I think it's implied as the 2 patches are sent together.
>
yes. Did mention it explicitly since b4 mbox can try apply 2/2 first.
had run into similar issues recently.
> Otherwise:
> Tested-by: Pierre Gondois <pierre.gondois@xxxxxxx>
>
>>
Thank you very much for the testing it and providing the tag.
>> Signed-off-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxxxxxxx>
>> ---
>> Documentation/admin-guide/sysctl/kernel.rst | 3 +-
>> kernel/sched/topology.c | 112 +++++++++++++-------
>> 2 files changed, 76 insertions(+), 39 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst
>> b/Documentation/admin-guide/sysctl/kernel.rst
>> index cf33de56da27..d89ac2bd8dc4 100644
>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>> @@ -1182,7 +1182,8 @@ automatically on platforms where it can run
>> (that is,
>> platforms with asymmetric CPU topologies and having an Energy
>> Model available). If your platform happens to meet the
>> requirements for EAS but you do not want to use it, change
>> -this value to 0.
>> +this value to 0. On Non-EAS platforms, write operation fails and
>> +read doesn't return anything.
>>
>> task_delayacct
>> ===============
>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>> index e0b9920e7e3e..a654d0186ac0 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -212,6 +212,70 @@ static unsigned int sysctl_sched_energy_aware = 1;
>> static DEFINE_MUTEX(sched_energy_mutex);
>> static bool sched_energy_update;
>>
>> +extern struct cpufreq_governor schedutil_gov;
>> +static bool sched_is_eas_possible(const struct cpumask *cpu_mask)
>> +{
>> + bool any_asym_capacity = false;
>> + struct cpufreq_policy *policy;
>> + struct cpufreq_governor *gov;
>> + int i;
>> +
>> + /* EAS is enabled for asymmetric CPU capacity topologies. */
>> + for_each_cpu(i, cpu_mask) {
>> + if (per_cpu(sd_asym_cpucapacity, i)) {
>> + any_asym_capacity = true;
>> + break;
>> + }
>> + }
>> + if (!any_asym_capacity) {
>> + if (sched_debug()) {
>> + pr_info("rd %*pbl: Checking EAS, CPUs do not have
>> asymmetric capacities\n",
>> + cpumask_pr_args(cpu_mask));
>> + }
>> + return false;
>> + }
>> +
>> + /* EAS definitely does *not* handle SMT */
>> + if (sched_smt_active()) {
>> + if (sched_debug()) {
>> + pr_info("rd %*pbl: Checking EAS, SMT is not supported\n",
>> + cpumask_pr_args(cpu_mask));
>> + }
>> + return false;
>> + }
>> +
>> + if (!arch_scale_freq_invariant()) {
>> + if (sched_debug()) {
>> + pr_info("rd %*pbl: Checking EAS: frequency-invariant load
>> tracking not yet supported",
>> + cpumask_pr_args(cpu_mask));
>> + }
>> + return false;
>> + }
>> +
>> + /* Do not attempt EAS if schedutil is not being used. */
>> + for_each_cpu(i, cpu_mask) {
>> + policy = cpufreq_cpu_get(i);
>> + if (!policy) {
>> + if (sched_debug()) {
>> + pr_info("rd %*pbl: Checking EAS, cpufreq policy not
>> set for CPU: %d",
>> + cpumask_pr_args(cpu_mask), i);
>> + }
>> + return false;
>> + }
>> + gov = policy->governor;
>> + cpufreq_cpu_put(policy);
>> + if (gov != &schedutil_gov) {
>> + if (sched_debug()) {
>> + pr_info("rd %*pbl: Checking EAS, schedutil is
>> mandatory\n",
>> + cpumask_pr_args(cpu_mask));
>> + }
>> + return false;
>> + }
>> + }
>> +
>> + return true;
>> +}
>> +
>> void rebuild_sched_domains_energy(void)
>> {
>> mutex_lock(&sched_energy_mutex);
>> @@ -231,6 +295,15 @@ static int sched_energy_aware_handler(struct
>> ctl_table *table, int write,
>> return -EPERM;
>>
>> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>> + if (!sched_is_eas_possible(cpu_active_mask)) {
>> + if (write) {
>> + return -EOPNOTSUPP;
>> + } else {
>> + *lenp = 0;
>> + return 0;
>> + }
>> + }
>> +
>> if (!ret && write) {
>> state = static_branch_unlikely(&sched_energy_present);
>> if (state != sysctl_sched_energy_aware)
>> @@ -351,61 +424,24 @@ static void sched_energy_set(bool has_eas)
>> * 4. schedutil is driving the frequency of all CPUs of the rd;
>> * 5. frequency invariance support is present;
>> */
>> -extern struct cpufreq_governor schedutil_gov;
>> static bool build_perf_domains(const struct cpumask *cpu_map)
>> {
>> int i;
>> struct perf_domain *pd = NULL, *tmp;
>> int cpu = cpumask_first(cpu_map);
>> struct root_domain *rd = cpu_rq(cpu)->rd;
>> - struct cpufreq_policy *policy;
>> - struct cpufreq_governor *gov;
>>
>> if (!sysctl_sched_energy_aware)
>> goto free;
>>
>> - /* EAS is enabled for asymmetric CPU capacity topologies. */
>> - if (!per_cpu(sd_asym_cpucapacity, cpu)) {
>> - if (sched_debug()) {
>> - pr_info("rd %*pbl: CPUs do not have asymmetric
>> capacities\n",
>> - cpumask_pr_args(cpu_map));
>> - }
>> - goto free;
>> - }
>> -
>> - /* EAS definitely does *not* handle SMT */
>> - if (sched_smt_active()) {
>> - pr_warn("rd %*pbl: Disabling EAS, SMT is not supported\n",
>> - cpumask_pr_args(cpu_map));
>> - goto free;
>> - }
>> -
>> - if (!arch_scale_freq_invariant()) {
>> - if (sched_debug()) {
>> - pr_warn("rd %*pbl: Disabling EAS: frequency-invariant
>> load tracking not yet supported",
>> - cpumask_pr_args(cpu_map));
>> - }
>> + if (!sched_is_eas_possible(cpu_map))
>> goto free;
>> - }
>>
>> for_each_cpu(i, cpu_map) {
>> /* Skip already covered CPUs. */
>> if (find_pd(pd, i))
>> continue;
>>
>> - /* Do not attempt EAS if schedutil is not being used. */
>> - policy = cpufreq_cpu_get(i);
>> - if (!policy)
>> - goto free;
>> - gov = policy->governor;
>> - cpufreq_cpu_put(policy);
>> - if (gov != &schedutil_gov) {
>> - if (rd->pd)
>> - pr_warn("rd %*pbl: Disabling EAS, schedutil is
>> mandatory\n",
>> - cpumask_pr_args(cpu_map));
>> - goto free;
>> - }
>> -
>> /* Create the new pd and add it to the local list. */
>> tmp = pd_init(i);
>> if (!tmp)
>> --
>> 2.39.3
>>
will send out v6 with these changes to changelog and Tested-by tag.
will wait for a while to see if there are any concerns or comments.