Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture
From: Shrikanth Hegde
Date: Fri Sep 22 2023 - 17:05:21 EST
On 9/22/23 8:14 PM, Pierre Gondois wrote:
>
>
> On 9/18/23 14:22, Valentin Schneider wrote:
>> On 15/09/23 23:40, Shrikanth Hegde wrote:
>>> On 9/15/23 5:30 PM, Valentin Schneider wrote:
>>>> On 14/09/23 23:26, Shrikanth Hegde wrote:
>>>>> On 9/14/23 9:51 PM, Valentin Schneider wrote:
>>>>>> On 13/09/23 17:18, Shrikanth Hegde wrote:
>>>>>>> sysctl_sched_energy_aware is available for the admin to
>>>>>>> disable/enable
>>>>>>> energy aware scheduling(EAS). EAS is enabled only if few
>>>>>>> conditions are
>>>>>>> met by the platform. They are, asymmetric CPU capacity, no SMT,
>>>>>>> valid cpufreq policy, frequency invariant load tracking. It is
>>>>>>> possible
>>>>>>> platform when booting may not have EAS capability, but can do
>>>>>>> that after.
>>>>>>> For example, changing/registering the cpufreq policy.
>>>>>>>
>>>>>>> At present, though platform doesn't support EAS, this sysctl is
>>>>>>> still
>>>>>>> present and it ends up calling rebuild of sched domain on write
>>>>>>> to 1 and
>>>>>>> NOP when writing to 0. That is confusing and un-necessary.
>>>>>>>
>>>>>>
>>>>>
>>>>> Hi Valentin, Thanks for taking a look at this patch.
>>>>>
>>>>>> But why would you write to it in the first place? Or do you mean
>>>>>> to use
>>>>>> this as an indicator for userspace that EAS is supported?
>>>>>>
>>>>>
>>>>> Since this sysctl is present and its value being 1, it gives the
>>>>> impression to the user that EAS is supported when it is not.
>>>>> So its an attempt to correct that part.
>>>>>
>>>>
>>>> Ah, I see. Then how about just making the sysctl return 0 when EAS
>>>> isn't
>>>> supported? And on top of it, prevent all writes when EAS isn't
>>>> supported
>>>> (perf domains cannot be built, so there would be no point in forcing a
>>>> rebuild that will do nothing).
>>>
>>> Yes. That's another way. Thats what I had as possible approach in
>>> https://lore.kernel.org/lkml/d2c945d6-c4f0-a096-0623-731b11484f51@xxxxxxxxxxxxxxxxxx/
>>>
>>
>> Thanks for the link; and apologies for bringing up topics that have been
>> discussed already.
>>
>>>
>>>
>>>>
>>>> I can never remember how to properly use the sysctl API, so that's a
>>>> very
>>>> crude implementation, but something like so?
>>>>
>>>> ---
>>>>
>>>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>>>> index 05a5bc678c089..dadfc5afc4121 100644
>>>> --- a/kernel/sched/topology.c
>>>> +++ b/kernel/sched/topology.c
>>>> @@ -230,9 +230,28 @@ static int sched_energy_aware_handler(struct
>>>> ctl_table *table, int write,
>>>> if (write && !capable(CAP_SYS_ADMIN))
>>>> return -EPERM;
>>>>
>>>> + if (!sched_energy_enabled()) {
>>>
>>> Use of sched_energy_enabled won't work as Pierre has indicated.
>>>
>>> Instead this can be done by adding those checks in a helper function to
>>> do similar checks as done build_perf_domains.
>>>
>>> I can send v4 with this approach if it makes more sense. Please let
>>> me know.
>>>
>>
>> So what I'm thinking is the standard approach seems to be to keep the
>> knobs
>> visible, but change how reads/writes to them are handled.
>>
>> For instance, SMT support has
>>
>> /sys/devices/system/cpu/smt
>> /control
>> /active
>>
>> And a system with CONFIG_HOTPLUG_SMT=y but no actual hardware SMT will
>> have:
>>
>> /control = notsupported
>> /active = 0
>
>
> Having such interface for EAS would be ideal no > /active:
> would be the equivalent of the current sysctl_sched_energy_aware
>
> /control:
> would show whether CONFIG_SCHED_DEBUG was set and all the conditions
> to have EAS enabled are satisfied.
>
> Possible states for SMT:
> ---
> static const char *smt_states[] = {
> [CPU_SMT_ENABLED] = "on", // EAS possible and
> running
> [CPU_SMT_DISABLED] = "off", // EAS possible and
> not running
> [CPU_SMT_FORCE_DISABLED] = "forceoff", // not applicable
> for EAS
> [CPU_SMT_NOT_SUPPORTED] = "notsupported", // system with
> smt or not asymmetric or no freq invariance
> [CPU_SMT_NOT_IMPLEMENTED] = "notimplemented", //
> CONFIG_SCHED_DEBUG=n
> };
> ---
>
Likely the current simpler approach more or less achieves the same i think.
With V4 (not yet sent). Didnt send it out yet, as i am not
sure of proc handler's internal way of handling buffers and corner cases. I am
thinking to make *lenp=0 unconditionally and return if EAS is not possible. With
that I have these possibilities.
If sysctl is not there at all, that mean CONFIG_ENERGY_MODEL was not selected
If sysctl return empty string, that mean EAS is not possible at the moment.
One can figure out whats the reason from dmesg.
If sysctl return 1 or 0 - EAS is possible and its either enabled or disabled.
>
>>
>> So IMO it would make sense to keep sched_energy_aware around, but make it
>> read 0 and prevent writes for systems that have the software support
>> compiled but don't have the actual hardware support.
>>
>> In a pinch it also helps to know if CONFIG_ENERGY_MODEL was selected,
>> though that's obvious enough with CONFIG_SCHED_DEBUG=y.
>>