Re: [PATCH] sched: Take thermal pressure into account when determine rt fits capacity

From: Dietmar Eggemann
Date: Mon Apr 11 2022 - 04:21:44 EST


On 07/04/2022 07:19, Xuewen Yan wrote:
> There are cases when the cpu max capacity might be reduced due to thermal.
> Take the thermal pressure into account when judging whether the rt task
> fits the cpu. And when the schedutil governor gets the cpu util, the thermal
> pressure should also be considered.
>
> Signed-off-by: Xuewen Yan <xuewen.yan@xxxxxxxxxx>
> ---
> kernel/sched/cpufreq_schedutil.c | 1 +
> kernel/sched/rt.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 3dbf351d12d5..285ad51caf0f 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -159,6 +159,7 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
>  	struct rq *rq = cpu_rq(sg_cpu->cpu);
>  	unsigned long max = arch_scale_cpu_capacity(sg_cpu->cpu);
>
> +	max -= arch_scale_thermal_pressure(sg_cpu->cpu);

max' = arch_scale_cpu_capacity() - arch_scale_thermal_pressure()

For the energy part (A) we use max' in compute_energy() to cap sum_util
and max_util at max' and to call em_cpu_energy(..., max_util, sum_util,
max'). This was done to match (B)'s `policy->max` capping.
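To make the capping in (A) concrete, here is a minimal userspace sketch. It is not the kernel's compute_energy(): the struct and function names are made up for illustration, the per-CPU utilization is passed in as a plain array, and a single utilization value per CPU is used where the kernel distinguishes energy and frequency utilization. cpu_cap and th_pressure stand in for the return values of arch_scale_cpu_capacity() and arch_scale_thermal_pressure().

```c
#include <stddef.h>

/* Aggregated utilization of one performance domain (illustrative type). */
struct pd_util {
	unsigned long sum_util;	/* sum of per-CPU utilization, each capped at max' */
	unsigned long max_util;	/* highest per-CPU utilization, capped at max' */
};

/*
 * Hypothetical helper: cap every CPU's utilization at
 * max' = cpu_cap - th_pressure before aggregating it.
 */
static struct pd_util capped_pd_util(const unsigned long *cpu_util, size_t nr_cpus,
				     unsigned long cpu_cap, unsigned long th_pressure)
{
	/* Guard against underflow if the pressure exceeds the capacity. */
	unsigned long max_prime = th_pressure < cpu_cap ? cpu_cap - th_pressure : 0;
	struct pd_util pd = { 0, 0 };
	size_t i;

	for (i = 0; i < nr_cpus; i++) {
		unsigned long util = cpu_util[i] < max_prime ? cpu_util[i] : max_prime;

		pd.sum_util += util;
		if (util > pd.max_util)
			pd.max_util = util;
	}

	return pd;
}
```

With two CPUs at utilization 300 and 900, a capacity of 1024 and a thermal pressure of 224, max' is 800, so the second CPU contributes 800 instead of 900 to both sums.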

For the frequency part (B) we have freq_qos_update_request() in:

  power_actor_set_power()
    ...
    cdev->ops->set_cur_state()

      cpufreq_set_cur_state()
        freq_qos_update_request()      <-- !
        arch_update_thermal_pressure()

restricting `policy->max` which then clamps `target_freq` in:

  cpufreq_update_util()
    ...
    get_next_freq()
      cpufreq_driver_resolve_freq()
        __resolve_freq()
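As a rough illustration of the clamping in (B), the toy model below first derives a raw frequency request from utilization and then clamps it to the policy limits. The names policy_limits, raw_next_freq() and clamp_to_policy() are hypothetical, the 1.25x headroom is an assumption modelled on what schedutil applies, and the frequency-table lookup that the real __resolve_freq() performs afterwards is left out.

```c
/* Illustrative stand-in for the policy->min / policy->max pair, in kHz. */
struct policy_limits {
	unsigned int min;
	unsigned int max;
};

/*
 * Hypothetical helper: raw request with 25% headroom (assumed),
 * i.e. 1.25 * max_freq * util / max.
 */
static unsigned int raw_next_freq(unsigned long util, unsigned long max,
				  unsigned int max_freq)
{
	/* 64-bit intermediate so the multiplication cannot overflow. */
	unsigned long long freq = (unsigned long long)max_freq + (max_freq >> 2);

	return (unsigned int)(freq * util / max);
}

/* Clamp the request to the (possibly thermally restricted) policy limits. */
static unsigned int clamp_to_policy(unsigned int target_freq,
				    const struct policy_limits *p)
{
	if (target_freq < p->min)
		return p->min;
	if (target_freq > p->max)
		return p->max;
	return target_freq;
}
```

For example, with a 2000000 kHz CPU, util 512 out of 1024 yields a raw request of 1250000 kHz. If the thermal framework has lowered the upper policy limit to 1200000 kHz, the request that reaches the driver is 1200000 kHz.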

[...]

> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index a32c46889af8..d9982ebd4821 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -466,6 +466,7 @@ static inline bool rt_task_fits_capacity(struct task_struct *p, int cpu)
>  	max_cap = uclamp_eff_value(p, UCLAMP_MAX);
>
>  	cpu_cap = capacity_orig_of(cpu);
> +	cpu_cap -= arch_scale_thermal_pressure(cpu);
>
>  	return cpu_cap >= min(min_cap, max_cap);
>  }

IMHO, this should follow what we do with rq->cpu_capacity
(capacity_of(), the remaining capacity for CFS). E.g. we use
capacity_of() in find_energy_efficient_cpu() and select_idle_capacity()
to compare capacities. So we would need a function like
scale_rt_capacity() for RT (without subtracting rq->avg_rt.util_avg),
and then also one for DL (without subtracting rq->avg_dl.util_avg and
rq->avg_rt.util_avg).
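A rough userspace sketch of what such per-class helpers could look like, assuming each class sees the capacity left over after thermal pressure, IRQ time and the utilization of higher-priority classes. All names here are hypothetical, the struct is a stand-in for the relevant rq signals, and the IRQ scaling is only modelled loosely on scale_rt_capacity().

```c
/* Illustrative snapshot of the signals a runqueue tracks (not struct rq). */
struct rq_signals {
	unsigned long max;	/* original CPU capacity */
	unsigned long irq;	/* IRQ utilization */
	unsigned long rt_util;	/* stands in for rq->avg_rt.util_avg */
	unsigned long dl_util;	/* stands in for rq->avg_dl.util_avg */
	unsigned long thermal;	/* thermal pressure signal */
};

/* Capacity left after 'used', scaled by the share of time not spent in IRQ. */
static unsigned long remaining_capacity(const struct rq_signals *s,
					unsigned long used)
{
	if (s->irq >= s->max || used >= s->max)
		return 1;	/* never report zero capacity */

	return (s->max - used) * (s->max - s->irq) / s->max;
}

/* CFS: RT, DL and thermal pressure are all subtracted. */
static unsigned long cfs_capacity_sketch(const struct rq_signals *s)
{
	return remaining_capacity(s, s->rt_util + s->dl_util + s->thermal);
}

/* RT: same idea, but RT's own utilization is not subtracted. */
static unsigned long rt_capacity_sketch(const struct rq_signals *s)
{
	return remaining_capacity(s, s->dl_util + s->thermal);
}

/* DL: neither DL nor RT utilization is subtracted, only thermal pressure. */
static unsigned long dl_capacity_sketch(const struct rq_signals *s)
{
	return remaining_capacity(s, s->thermal);
}
```

With max = 1024, no IRQ load, rt_util = 100, dl_util = 200 and thermal = 224, this gives 500 for CFS, 600 for RT and 800 for DL. rt_task_fits_capacity() could then compare against the RT value instead of capacity_orig_of().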