Re: [PATCH] sched/fair: use signed long when compute energy delta in eas

From: Pierre
Date: Thu Apr 08 2021 - 05:33:47 EST


Hi,
Hi

On Wed, Apr 7, 2021 at 10:11 PM Pierre <pierre.gondois@xxxxxxx> wrote:
>
> Hi,
> > I test the patch, but the overflow still exists.
> > In the "sched/fair: Use pd_cache to speed up find_energy_efficient_cpu()"
> > I wonder why recompute the cpu util when cpu==dst_cpu in compute_energy(),
> > when the dst_cpu's util change, it also would cause the overflow.
>
> The patches aim to cache the energy values for the CPUs whose
> utilization is not modified (so we don't have to compute it multiple
> times). The values cached are the 'base values' of the CPUs, i.e. when
> the task is not placed on the CPU. When (cpu==dst_cpu) in
> compute_energy(), it means the energy values need to be updated instead
> of using the cached ones.
>
well, is it better to use the task_util(p) + cache values ? but in
this case, the cache
values may need more parameters.

This patch-set is not significantly improving the execution time of feec(). The results we have so far are an improvement of 5-10% in execution time, with feec() being executed in < 10us. So the gain is not spectacular.


> You are right, there is still a possibility to have a negative delta
> with the patches at:
> https://gitlab.arm.com/linux-arm/linux-power/-/commits/eas/next/integration-20210129 <https://gitlab.arm.com/linux-arm/linux-power/-/commits/eas/next/integration-20210129>
> Adding a check before subtracting the values, and bailing out in such
> case would avoid this, such as at:
> https://gitlab.arm.com/linux-arm/linux-pg/-/commits/feec_bail_out/ <https://gitlab.arm.com/linux-arm/linux-pg/-/commits/feec_bail_out/>
>
In your patch, you bail out the case by "go to fail", that means you
don't use eas in such
case. However, in the actual scene, the case often occurr when select
cpu for small task.
As a result, the small task would not select cpu according to the eas,
it may affect
power consumption?
With this patch (bailing out), the percentage of feec() returning due to a negative delta I get are:
on a Juno-r2, with 2 big CPUs and 4 CPUs (capacity of 383), with a workload running during 5s with task having a period of 16 ms and:
 - 50 tasks at 1%:   0.14%
 - 30 tasks at 1%:   0.54%
 - 10 tasks at 1%: < 0.1%
 - 30 tasks at 5%: < 0.1%
 - 10 tasks at 5%: < 0.1%
It doesn't happen so often to me.If we bail out of feec(), the task will still have another opportunity in the next call. However I agree this can lead to a bad placement when this happens.

> I think a similar modification should be done in your patch. Even though
> this is a good idea to group the calls to compute_energy() to reduce the
> chances of having updates of utilization values in between the
> compute_energy() calls,
> there is still a chance to have updates. I think it happened when I
> applied your patch.
>
> About changing the delta(s) from 'unsigned long' to 'long', I am not
> sure of the meaning of having a negative delta. I thing it would be
> better to check and fail before it happens instead.
>
> Regards
>