Re: [PATCH v3 01/17] cpufreq: Prepare timer flags for hierarchical timer pull model

From: Frederic Weisbecker
Date: Wed Oct 26 2022 - 09:56:06 EST


On Tue, Oct 25, 2022 at 03:58:34PM +0200, Anna-Maria Behnsen wrote:
> Note: This is a proposal only. I was waiting on input how to change this
> driver properly to use the already existing infrastructure. See therefore
> the thread on the linux-pm mailing list:
> https://lore.kernel.org/linux-pm/4c99f34b-40f1-e6cc-2669-7854b615b5fd@xxxxxxxxxxxxx/
>
> gpstates timer is the only timer using TIMER_PINNED and TIMER_DEFERRABLE
> flag. When moving to hierarchical timer pull model, pinned and deferrable
> timers are stored in separate bases.
>
> To ensure gpstates timer always expires on the CPU where it is pinned to,
> keep only TIMER_PINNED flag and drop TIMER_DEFERRABLE flag.

OTOH there are deferrable timers out there that expect to run on a
specific CPU, because they are always queued with add_timer_on().

For example, delayed works declared with DECLARE_DEFERRABLE_WORK() and
queued with queue_delayed_work_on(), like vmstat.

Those are not explicitly pinned because they don't rely on __mod_timer(),
but they still expect CPU affinity.

Thanks.

>
> While at it, rewrite comment explaining the rule for timer expiry for the
> next interval and fix whitespace damages.
>
> Signed-off-by: Anna-Maria Behnsen <anna-maria@xxxxxxxxxxxxx>
> Cc: linux-pm@xxxxxxxxxxxxxxx
> Cc: Rafael J. Wysocki <rafael@xxxxxxxxxx>
> Cc: Viresh Kumar <viresh.kumar@xxxxxxxxxx>
> Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
> ---
> drivers/cpufreq/powernv-cpufreq.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> index fddbd1ea1635..08d6bd54539d 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -640,18 +640,18 @@ static inline int calc_global_pstate(unsigned int elapsed_time,
> return highest_lpstate_idx + index_diff;
> }
>
> -static inline void queue_gpstate_timer(struct global_pstate_info *gpstates)
> +static inline void queue_gpstate_timer(struct global_pstate_info *gpstates)
> {
> unsigned int timer_interval;
>
> /*
> - * Setting up timer to fire after GPSTATE_TIMER_INTERVAL ms, But
> - * if it exceeds MAX_RAMP_DOWN_TIME ms for ramp down time.
> - * Set timer such that it fires exactly at MAX_RAMP_DOWN_TIME
> - * seconds of ramp down time.
> + * Timer should expire next time after GPSTATE_TIMER_INTERVAL. If
> + * the resulting interval (elapsed time + interval) between last
> + * and next timer expiry is greater than MAX_RAMP_DOWN_TIME, ensure
> + * it is maximum MAX_RAMP_DOWN_TIME when queueing the next timer.
> */
> if ((gpstates->elapsed_time + GPSTATE_TIMER_INTERVAL)
> - > MAX_RAMP_DOWN_TIME)
> + > MAX_RAMP_DOWN_TIME)
> timer_interval = MAX_RAMP_DOWN_TIME - gpstates->elapsed_time;
> else
> timer_interval = GPSTATE_TIMER_INTERVAL;
> @@ -865,8 +865,7 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
>
> /* initialize timer */
> gpstates->policy = policy;
> - timer_setup(&gpstates->timer, gpstate_timer_handler,
> - TIMER_PINNED | TIMER_DEFERRABLE);
> + timer_setup(&gpstates->timer, gpstate_timer_handler, TIMER_PINNED);
> gpstates->timer.expires = jiffies +
> msecs_to_jiffies(GPSTATE_TIMER_INTERVAL);
> spin_lock_init(&gpstates->gpstate_lock);
> --
> 2.30.2
>
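
For reference, the expiry rule described in the rewritten comment above can
be modelled in plain user-space C. This is only a sketch, not the driver
code: next_timer_interval() is a hypothetical helper name, and the two
constants are assumed millisecond values chosen for illustration. It also
assumes elapsed_time never exceeds MAX_RAMP_DOWN_TIME, since otherwise the
unsigned subtraction would wrap.

```c
#include <assert.h>

/* Assumed values in milliseconds, for illustration only. */
#define GPSTATE_TIMER_INTERVAL 2000u
#define MAX_RAMP_DOWN_TIME     5120u

/*
 * Hypothetical helper: return the delay until the next timer expiry.
 * Normally that is GPSTATE_TIMER_INTERVAL, but the total ramp-down time
 * (elapsed_time + interval) is capped at MAX_RAMP_DOWN_TIME.
 */
static unsigned int next_timer_interval(unsigned int elapsed_time)
{
	/* Assumption: the caller never passes more than the maximum. */
	assert(elapsed_time <= MAX_RAMP_DOWN_TIME);

	if (elapsed_time + GPSTATE_TIMER_INTERVAL > MAX_RAMP_DOWN_TIME)
		return MAX_RAMP_DOWN_TIME - elapsed_time;

	return GPSTATE_TIMER_INTERVAL;
}
```

With these assumed values, an elapsed time of 4000 ms gives a shortened
interval of 1120 ms, so the last expiry lands exactly at the maximum
ramp-down time instead of overshooting it.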