Re: [Patch v4 2/6] sched: Add infrastructure to store and update instantaneous thermal pressure
From: Peter Zijlstra
Date: Mon Oct 28 2019 - 11:21:48 EST
On Tue, Oct 22, 2019 at 04:34:21PM -0400, Thara Gopinath wrote:
> Add thermal.c and thermal.h files that provides interface
> APIs to initialize, update/average, track, accumulate and decay
> thermal pressure per cpu basis. A per cpu variable delta_capacity is
> introduced to keep track of instantaneous per cpu thermal pressure.
> Thermal pressure is the delta between maximum capacity and capped
> capacity due to a thermal event.
> API trigger_thermal_pressure_average is called for periodic accumulate
> and decay of the thermal pressure. It is to to be called from a
> periodic tick function. This API passes on the instantaneous delta
> capacity of a cpu to update_thermal_load_avg to do the necessary
> accumulate, decay and average.
> API update_thermal_pressure is for the system to update the thermal
> pressure by providing a capped frequency ratio.
> Considering, trigger_thermal_pressure_average reads delta_capacity and
> update_thermal_pressure writes into delta_capacity, one can argue for
> some sort of locking mechanism to avoid a stale value.
> But considering trigger_thermal_pressure_average can be called from a
> system critical path like scheduler tick function, a locking mechanism
> is not ideal. This means that it is possible the delta_capacity value
> used to calculate average thermal pressure for a cpu can be
> stale for upto 1 tick period.
Please use a blank line at the end of a paragraph.
> Signed-off-by: Thara Gopinath <thara.gopinath@xxxxxxxxxx>
> include/linux/sched.h | 8 ++++++++
> kernel/sched/Makefile | 2 +-
> kernel/sched/thermal.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> kernel/sched/thermal.h | 13 +++++++++++++
> 4 files changed, 67 insertions(+), 1 deletion(-)
> create mode 100644 kernel/sched/thermal.c
> create mode 100644 kernel/sched/thermal.h
These are some tiny files, do these functions really need their own
> diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c
> new file mode 100644
> index 0000000..0c84960
> --- /dev/null
> +++ b/kernel/sched/thermal.c
> @@ -0,0 +1,45 @@
> +// SPDX-License-Identifier: GPL-2.0
> + * Scheduler Thermal Interactions
> + *
> + * Copyright (C) 2018 Linaro, Inc., Thara Gopinath <thara.gopinath@xxxxxxxxxx>
> + */
> +#include <linux/sched.h>
> +#include "sched.h"
> +#include "pelt.h"
> +#include "thermal.h"
> +static DEFINE_PER_CPU(unsigned long, delta_capacity);
> + * update_thermal_pressure: Update thermal pressure
> + * @cpu: the cpu for which thermal pressure is to be updated for
> + * @capped_freq_ratio: capped max frequency << SCHED_CAPACITY_SHIFT / max freq
> + *
> + * capped_freq_ratio is normalized into capped capacity and the delta between
> + * the arch_scale_cpu_capacity and capped capacity is stored in per cpu
> + * delta_capacity.
> + */
> +void update_thermal_pressure(int cpu, u64 capped_freq_ratio)
> + unsigned long __capacity, delta;
> + /* Normalize the capped freq ratio */
> + __capacity = (capped_freq_ratio * arch_scale_cpu_capacity(cpu)) >>
> + SCHED_CAPACITY_SHIFT;
> + delta = arch_scale_cpu_capacity(cpu) - __capacity;
> + pr_debug("updating cpu%d thermal pressure to %lu\n", cpu, delta);
Surely we can do without the pr_debug() here?
> + per_cpu(delta_capacity, cpu) = delta;