Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching

From: Peter Zijlstra
Date: Mon Mar 07 2016 - 08:32:21 EST


On Mon, Mar 07, 2016 at 02:15:47PM +0100, Rafael J. Wysocki wrote:
> On Mon, Mar 7, 2016 at 9:00 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> > Sure I know all that. But that, to me, seems like an argument for why
> > you should have done this a long time ago.
>
> While I generally agree with this, I don't quite see why cleaning that
> up necessarily has to be connected to the current patch series which
> is my point.

Ah OK, fair enough I suppose. But someone should stick this on their
TODO list, we should not 'forget' about this (again).

> > But I do think something wants to be done here.
>
> So here's what I can do for the "fast switch" thing.
>
> There is the fast_switch_possible policy flag that's necessary anyway.
> I can make notifier registration fail when that is set for at least
> one policy and I can make the setting of it fail if at least one
> notifier has already been registered.
>
> However, without spending too much time on chasing code dependencies I
> sort of suspect that it will uncover things that register cpufreq
> notifiers early and it won't be possible to use fast switch without
> sorting that out.

The two x86 users don't register notifiers when CONSTANT_TSC, which
seems to be the right thing.

Most of the other users seem unlikely to be used on x86, so I suspect
the initial fallout will be very limited.

*groan* modules, cpufreq allows drivers to be modules, so init sequences
are poorly defined at best :/ Yes that blows.
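
Purely to illustrate the mutual exclusion described above, here is a
rough user-space sketch. All names in it are made up for the example,
it is not actual cpufreq code, and it omits the locking a real version
would need around the two counters.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bookkeeping; not the real cpufreq core structures. */
static int nr_transition_notifiers;  /* registered transition notifiers */
static int nr_fast_switch_policies;  /* policies with fast switch set */

struct policy_sketch {
	bool fast_switch_possible;
};

/* Registration fails while at least one policy uses fast switching. */
int register_transition_notifier_sketch(void)
{
	if (nr_fast_switch_policies > 0)
		return -EBUSY;
	nr_transition_notifiers++;
	return 0;
}

/* Setting the flag fails once at least one notifier is registered. */
int enable_fast_switch_sketch(struct policy_sketch *p)
{
	if (nr_transition_notifiers > 0)
		return -EBUSY;
	if (!p->fast_switch_possible) {
		p->fast_switch_possible = true;
		nr_fast_switch_policies++;
	}
	return 0;
}

/* Clearing the flag on the last policy allows notifiers again. */
void disable_fast_switch_sketch(struct policy_sketch *p)
{
	if (p->fast_switch_possible) {
		assert(nr_fast_switch_policies > 0);
		p->fast_switch_possible = false;
		nr_fast_switch_policies--;
	}
}
```

Whichever side gets there first wins, which is exactly why the poorly
defined init ordering with modular drivers matters.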

> And that won't even change anything apart from
> removing some code that has not worked for quite a while already and
> nobody noticed.

Which is always a good thing, but yes, we can do this later.

> It is doable for the "fast switch" thing, but it won't help in all of
> the other cases when notifications are not reliable.

Right, you can maybe add a 'NOTIFIERS_BROKEN' flag to the intel_pstate
and HWP drivers or so, and trigger off of that.

> If it changes frequently enough, it's not practical and not even
> necessary to cause things like thermal to react on every change, but I
> think there needs to be a way to make them reevaluate things
> regularly. Arguably, they might set a timer for that, but why would
> they need a timer if they could get triggered by the code that
> actually makes changes?

So that very much depends on what thermal actually needs; but I suspect
that using a timer is cheaper than using irq_work to kick off something
else.

The irq_work is a LAPIC write (self IPI), just like the timer. However,
timers can be coalesced, resulting in, on average, fewer timer
reprogrammings than handlers run.
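
A toy model of that coalescing argument, assuming each handler
tolerates some slack on its deadline. The function name and the slack
parameter are invented for the example; this is not how the kernel
timer code is structured. It only counts how many hardware programming
operations a sorted set of deadlines would need.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count hardware timer programming operations for 'n' deadlines,
 * sorted in ascending order, when every handler may run up to 'slack'
 * time units late. One programmed expiry serves all handlers whose
 * deadline falls at or before it.
 */
size_t count_timer_programs(const long *deadlines, size_t n, long slack)
{
	size_t programs = 0;
	size_t i = 0;

	assert(slack >= 0);

	while (i < n) {
		/* Latest expiry the earliest pending handler accepts. */
		long fire_at = deadlines[i] + slack;

		programs++;

		/* Everything due by then runs off the same expiry. */
		while (i < n && deadlines[i] <= fire_at)
			i++;
	}
	return programs;
}
```

With zero slack and distinct deadlines every handler costs one
programming operation, which is the irq_work-like case of one LAPIC
write per handler. With any slack, the count can only go down.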

Now, if thermal can do without work and can run in-line just like the
fast freq switch, then yes, that might make sense.