Re: [RFC PATCH] sched/fair: use max_spare_cap_cpu if it is more energy efficient
From: brookxu
Date: Wed Oct 27 2021 - 22:09:24 EST
Dietmar Eggemann wrote on 2021/10/25 9:04 PM:
> On 22/10/2021 06:05, Xuewen Yan wrote:
>> Hi Chunguang
>>
>> brookxu <brookxu.cn@xxxxxxxxx> wrote on Thu, Oct 21, 2021 at 4:24 PM:
>>>
>>> From: Chunguang Xu <brookxu@xxxxxxxxxxx>
>>>
>>> When debugging EAS, I found that if the task is migrated to
>>> max_spare_cap_cpu, even if the power consumption of pd is lower,
>
> The task p hasn't been migrated yet. `max_spare_cap_cpu` here is only a
> potential candidate CPU to be selected for p.
>
>>> we still put the task on prev_cpu. Maybe we should fix it.
>>>
>>> Signed-off-by: Chunguang Xu <brookxu@xxxxxxxxxxx>
>>> ---
>>> kernel/sched/fair.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index ff69f245b939..2ae7e03de6d2 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -6867,8 +6867,10 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>>> /* Evaluate the energy impact of using max_spare_cap_cpu. */
>>> if (max_spare_cap_cpu >= 0) {
>>> cur_delta = compute_energy(p, max_spare_cap_cpu, pd);
>>> - if (cur_delta < base_energy_pd)
>>
>> this is aimed to prevent cur_delta < 0. Usually, when the
>> task is put on the max_spare_cpu, the cur_power should be bigger than
>> base_pd_power.
>> If cur_power < base_pd_power, the cpu util may have changed; in
>> that case, we should keep prev_cpu.
>>
>> You can look at the discussion and patch below:
>> https://lore.kernel.org/all/20210429101948.31224-3-Pierre.Gondois@xxxxxxx/
>> https://lore.kernel.org/all/CAB8ipk_vgtg5d1obH36BYfNLZosbwr2k_U3xnAD4=H5uZt_M_g@xxxxxxxxxxxxxx/
>
> That's correct. `prev_delta < base_energy_pd` or `cur_delta <
> base_energy_pd` indicate the rare case that `compute_energy() { ->
> cpu_util_next() -> cpu util }` returns a higher energy value for the
> perf domain w/o the task p than w/ it.
>
> `base_energy_pd` stands for the energy spent on the CPUs of the Perf
> Domain (PD) w/o considering the task p (compute_energy(p, *-1*, pd)),
> `dst_cpu == -1`.
>
> If this happens to a candidate CPU (prev_cpu or a per-PD
> max_spare_cap_cpu) we bail out and return target (i.e. prev_cpu) because
> we can't compare the energy values (prev_delta and best_delta) later on
> in this case.
Right, thanks all :)
> [...]
>
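To make sure I read the bail-out rule correctly, here is a small userspace
sketch of it. This is not the kernel code: `struct pd_sample` and
`pick_cpu()` are hypothetical names, and the energy fields are precomputed
stand-ins for what compute_energy() would return. It only models the
comparison against `base_energy_pd` and deliberately leaves out everything
else find_energy_efficient_cpu() does.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical per-PD snapshot; not a kernel structure. */
struct pd_sample {
	unsigned long base_energy;  /* PD energy w/o task p (dst_cpu == -1) */
	int has_prev;               /* nonzero if prev_cpu belongs to this PD */
	unsigned long prev_energy;  /* PD energy with p on prev_cpu */
	int max_spare_cap_cpu;      /* candidate CPU in this PD, -1 if none */
	unsigned long cur_energy;   /* PD energy with p on max_spare_cap_cpu */
};

/*
 * Return the candidate with the smallest energy delta, or prev_cpu if any
 * candidate reports less energy with p than without it. The deltas are
 * unsigned, so subtracting in that case would wrap around instead of
 * going negative, and the values could not be compared afterwards.
 */
static int pick_cpu(const struct pd_sample *pds, size_t n, int prev_cpu)
{
	unsigned long best_delta = ULONG_MAX;
	int best_cpu = -1;
	size_t i;

	assert(pds != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct pd_sample *pd = &pds[i];

		if (pd->has_prev) {
			unsigned long prev_delta;

			if (pd->prev_energy < pd->base_energy)
				return prev_cpu;        /* inconsistent util: bail out */
			prev_delta = pd->prev_energy - pd->base_energy;
			if (prev_delta < best_delta)
				best_delta = prev_delta;
		}

		if (pd->max_spare_cap_cpu >= 0) {
			unsigned long cur_delta;

			if (pd->cur_energy < pd->base_energy)
				return prev_cpu;        /* inconsistent util: bail out */
			cur_delta = pd->cur_energy - pd->base_energy;
			if (cur_delta < best_delta) {
				best_delta = cur_delta;
				best_cpu = pd->max_spare_cap_cpu;
			}
		}
	}

	return best_cpu >= 0 ? best_cpu : prev_cpu;
}
```

With one PD where base is 100, prev_cpu gives 120 and max_spare_cap_cpu
gives 110, the sketch picks max_spare_cap_cpu. If max_spare_cap_cpu gives
90 instead, it returns prev_cpu, which is the case my patch wrongly treated
as "more energy efficient".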