Re: [PATCH v3 01/17] cpufreq: Prepare timer flags for hierarchical timer pull model
From: Anna-Maria Behnsen
Date: Mon Oct 31 2022 - 11:22:43 EST
On Wed, 26 Oct 2022, Frederic Weisbecker wrote:
> On Tue, Oct 25, 2022 at 03:58:34PM +0200, Anna-Maria Behnsen wrote:
> > Note: This is a proposal only. I was waiting for input on how to change
> > this driver properly to use the already existing infrastructure. See
> > the thread on the linux-pm mailing list:
> > https://lore.kernel.org/linux-pm/4c99f34b-40f1-e6cc-2669-7854b615b5fd@xxxxxxxxxxxxx/
> >
> > gpstates timer is the only timer using TIMER_PINNED and TIMER_DEFERRABLE
> > flag. When moving to hierarchical timer pull model, pinned and deferrable
> > timers are stored in separate bases.
> >
> > To ensure gpstates timer always expires on the CPU where it is pinned to,
> > keep only TIMER_PINNED flag and drop TIMER_DEFERRABLE flag.
>
> OTOH there are deferrable timers out there that expect to run on a
> specific CPU, because they are always queued with add_timer_on().
>
> For example workqueues using DECLARE_DEFERRABLE_WORK() that are queued
> with queue_delayed_work_on(). Like vmstat().
>
> Those are not explicitly pinned because they don't rely on __mod_timer()
> but they expect CPU affinity.
>
You are right. Contrary to the original plan, I am not (yet) able to
remove the deferrable timers completely. However, all timers queued via the
add_timer_on() path need the TIMER_PINNED flag. With that, three timer
bases will be available per CPU:
- global base (TIMER_PINNED is not set)
- local base (TIMER_PINNED is set but not TIMER_DEFERRABLE)
- deferrable pinned base (TIMER_PINNED and TIMER_DEFERRABLE are set)
The logic stays the same as already implemented in the patch queue: Timers
in the global base will not prevent the CPU from going idle. When the CPU
has the migrator duty, timers in the hierarchy are taken into account.
Timers in the local base force the CPU to wake up. Timers in the deferrable
pinned base are not taken into account when going idle.
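To illustrate the flag-to-base mapping described above, here is a minimal
standalone sketch. It is not code from the patch queue: the SKETCH_* flag
values, the enum and both helper names are made up for illustration, and
the migrator handling for the global base is left out.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define SKETCH_TIMER_DEFERRABLE	0x1u
#define SKETCH_TIMER_PINNED	0x2u

/* The three per-CPU bases listed above (names are assumptions). */
enum sketch_base {
	SKETCH_BASE_GLOBAL,	/* TIMER_PINNED not set */
	SKETCH_BASE_LOCAL,	/* TIMER_PINNED set, TIMER_DEFERRABLE not set */
	SKETCH_BASE_DEF_PINNED,	/* TIMER_PINNED and TIMER_DEFERRABLE set */
};

/* Select the base purely from the timer flags. */
static enum sketch_base sketch_pick_base(unsigned int flags)
{
	if (!(flags & SKETCH_TIMER_PINNED))
		return SKETCH_BASE_GLOBAL;
	if (flags & SKETCH_TIMER_DEFERRABLE)
		return SKETCH_BASE_DEF_PINNED;
	return SKETCH_BASE_LOCAL;
}

/*
 * Only timers in the local base force the CPU itself to wake up.
 * Global timers are handled through the hierarchy (not modelled here),
 * and deferrable pinned timers are ignored when going idle.
 */
static bool sketch_base_forces_wakeup(enum sketch_base base)
{
	return base == SKETCH_BASE_LOCAL;
}
```

In this sketch a timer with both flags set, as the gpstates and vmstat
timers would have, maps to SKETCH_BASE_DEF_PINNED and does not force a
wakeup.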
With this, the rework of the cpufreq driver is no longer required - the
gpstates timer will end up in the deferrable pinned base, the same as the
vmstat timer.
Thanks,
Anna-Maria