Re: [PATCH V2 1/3] Calculate Thermal Pressure

From: Quentin Perret
Date: Wed May 08 2019 - 08:42:58 EST


Hi Thara,

Sorry for the delayed response.

On Friday 26 Apr 2019 at 10:17:56 (-0400), Thara Gopinath wrote:
> On 04/25/2019 08:45 AM, Vincent Guittot wrote:
> > Do you mean calling a variant of sched_update_thermal_pressure() in
> > update_cpu_capacity() instead of periodic update ?
> > Yes , that should be enough
>
> Hi,
>
> I do have some concerns in doing this.
> 1. Updating thermal pressure does involve some calculations for
> accumulating, averaging, decaying etc which in turn could have some
> finite and measurable time spent in the function. I am not sure if this
> delay will be acceptable for all systems during load balancing (I have
> not measured the time involved). We need to decide if this is something
> we can live with.
>
> 2. More importantly, since the update can happen from at least two
> paths (thermal fw and the periodic timer in the case of this patch
> series), the update is done under a spin lock to ensure mutual
> exclusion. Again, calling from update_cpu_capacity will involve holding
> the lock in the load balance path, which is possibly not for the best.
> For me, updating out of load balance minimizes the disruption to
> the scheduler on the whole.
>
> But if there is overwhelming support for updating the statistics
> from the LB, I can move the code.

To clarify my point a little bit: my observation is really that it's a
shame to update the thermal stats often but not reflect that in
capacity_of().

So in fact there are two alternatives: 1) do the update only during LB
(which is what I suggested first) to avoid 'useless' work; or 2) reflect
the thermal pressure in the CPU capacity every time the thermal stats
are updated.

And thinking more about it, perhaps 2) is actually the better option? With
this we could try decay periods smaller than the LB interval (which is
most likely useless otherwise) and make sure the capacity considered
during wake-up is up-to-date. This should be a good thing for
latency-sensitive tasks, I think. (If you consider a task in the Android
display pipeline for example, it needs to run within 16ms or the frame
is missed. So, on wake-up, we'd like to know where the task can run fast
_now_, not according to the capacities the CPUs had 200ms ago or so).
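
To make the difference between 1) and 2) a bit more concrete, here is a
rough userspace sketch. It is not kernel code and not what the patch
does: struct cpu_model, the helper names and the "halve towards the new
sample" decay are all made up for illustration. The only idea taken from
the discussion is that the capacity seen at wake-up should follow the
thermal stats.

```c
#include <assert.h>

/* Hypothetical per-CPU model, for illustration only. */
struct cpu_model {
	unsigned long capacity_orig;	/* capacity without thermal capping */
	unsigned long thermal_avg;	/* decayed thermal pressure */
	unsigned long capacity;		/* stand-in for what capacity_of() reports */
};

/*
 * Assumed decay: move halfway towards the new sample on each update.
 * The real series uses its own accumulate/average/decay logic.
 */
static void update_thermal_avg(struct cpu_model *cpu, unsigned long pressure)
{
	cpu->thermal_avg = (cpu->thermal_avg + pressure) / 2;
}

/* Recompute the reported capacity from the current thermal average. */
static void refresh_capacity(struct cpu_model *cpu)
{
	unsigned long p = cpu->thermal_avg;

	/* Clamp so the subtraction below cannot underflow. */
	if (p > cpu->capacity_orig)
		p = cpu->capacity_orig;

	cpu->capacity = cpu->capacity_orig - p;
}

/*
 * Option 2): every thermal stats update is reflected in the capacity
 * right away, so a wake-up between two load balances sees fresh data.
 */
static void thermal_update_and_refresh(struct cpu_model *cpu,
				       unsigned long pressure)
{
	assert(cpu != 0);
	update_thermal_avg(cpu, pressure);
	refresh_capacity(cpu);
}
```

With option 1), a caller would invoke update_thermal_avg() on every
thermal event but refresh_capacity() only from the LB path, so the
capacity field stays stale in between. With option 2), only
thermal_update_and_refresh() is used and the two never diverge.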

Thoughts?
Quentin