Re: [PATCH v2 02/10] cpufreq: provide data for frequency-invariant load-tracking support

From: Peter Zijlstra
Date: Wed Jul 12 2017 - 07:14:49 EST


On Wed, Jul 12, 2017 at 02:57:55PM +0530, Viresh Kumar wrote:
> On 12-07-17, 10:31, Peter Zijlstra wrote:
> > So the problem with the thread is two-fold; on the one hand we like the
> > scheduler to directly set frequency, but then we need to schedule a task
> > to change the frequency, which will change the frequency and around we
> > go.
> >
> > On the other hand, there are very nasty issues with PI. This thread would
> > have very high priority (otherwise the SCHED_DEADLINE stuff won't work)
> > but that then means this thread needs to boost the owner of the i2c
> > mutex. And that then creates a massive bandwidth accounting hole.
> >
> >
> > The advantage of using an interrupt driven state machine is that all
> > those issues go away.
> >
> > But yes, whichever way around you turn things, it's crap. But given the
> > hardware it's the best we can do.
>
> Thanks for the explanation Peter.
>
> IIUC, it will take more time to change the frequency eventually with
> the interrupt-driven state machine as there may be multiple bottom
> halves involved here, for supply, clk, etc, which would run at normal
> priorities now. And those were boosted currently due to the high
> priority sugov thread. And we are fine with that (from performance
> point of view) ?

I'm not sure what you mean; bottom halves as in softirq? From what I can
tell an i2c bus does clk_prepare_enable() on registration and from that
point on clk_enable() is usable from atomic contexts. But afaict clk
stuff doesn't do interrupts at all.

(with a note that I absolutely hate the clk locking)

I think the interrupt driven thing can actually be faster than the
'regular' task waiting on the mutex. The regulator message can be
locklessly queued (it only ever makes sense to have 1 such message
pending, any later one will invalidate a prior one).

Then the i2c interrupt can detect the availability of this pending
message and splice it into the transfer queue at an opportune moment.

(of course, the current i2c bits don't support any of that)
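
A minimal user-space sketch of the single-slot idea, assuming C11 atomics
rather than kernel primitives. The names reg_msg, reg_msg_post() and
reg_msg_take() are hypothetical and not part of any existing i2c or regulator
API; the point is only that a later request replaces an earlier one without
taking a lock:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical regulator request; the field is illustrative only. */
struct reg_msg {
	unsigned int target_uv;	/* requested voltage in microvolts */
};

/* One pending slot: a newer request simply replaces an older one. */
static _Atomic(struct reg_msg *) pending_msg = NULL;

/*
 * Producer side (e.g. the frequency-change request path).
 * Publishes msg and returns the request it superseded, or NULL if the
 * slot was empty. The caller may recycle the returned message.
 */
static struct reg_msg *reg_msg_post(struct reg_msg *msg)
{
	return atomic_exchange_explicit(&pending_msg, msg,
					memory_order_acq_rel);
}

/*
 * Consumer side (e.g. the bus interrupt handler, at an opportune moment).
 * Takes the pending request, if any, and leaves the slot empty.
 */
static struct reg_msg *reg_msg_take(void)
{
	return atomic_exchange_explicit(&pending_msg, NULL,
					memory_order_acq_rel);
}
```

Since both sides use a single atomic exchange, neither ever blocks, and the
consumer only ever sees the most recent request.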

> Coming back to where we started from (where should we call
> arch_set_freq_scale() from ?).

The drivers; the core cpufreq doesn't know when (or if) any transition is
completed.
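
Roughly, and only as a stand-alone sketch: the driver is the one that sees the
transition finish, so it updates the scale factor at that point. Here
set_freq_scale() is a hypothetical stand-in for arch_set_freq_scale(),
driver_transition_done() is a made-up completion hook, and the 10-bit fixed
point scale (1024 == running at max frequency) is an assumption:

```c
#include <stdint.h>

#define CAPACITY_SHIFT	10	/* assumption: 1024 == full speed */

/* Scale factor consumed by load tracking; starts at full speed. */
static unsigned long freq_scale = 1UL << CAPACITY_SHIFT;

/* Hypothetical stand-in for arch_set_freq_scale(). */
static void set_freq_scale(unsigned long cur_khz, unsigned long max_khz)
{
	/* Widen before shifting so large kHz values cannot overflow. */
	freq_scale = (unsigned long)(((uint64_t)cur_khz << CAPACITY_SHIFT) /
				     max_khz);
}

/*
 * Hypothetical driver hook, called once the hardware has actually
 * switched to new_khz. The core is not involved at this point.
 */
static void driver_transition_done(unsigned long new_khz,
				   unsigned long max_khz)
{
	set_freq_scale(new_khz, max_khz);
}
```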

> I think we would still need some kind of synchronization between
> cpufreq core and the cpufreq drivers to make sure we don't start
> another freq change before the previous one is complete. Otherwise
> the cpufreq drivers would be required to have similar support with
> proper locking in place.

Not sure what you mean; also not sure why. On x86 we never know, cannot
know. So why would this stuff be any different?

> And if the core is going to get notified about successful freq changes
> (which it should IMHO), then it may still be better to call
> arch_set_freq_scale() from the core itself and not from individual
> drivers.

I would not involve the core. All we want from the core is a unified
interface towards requesting DVFS changes. Everything that happens after
is not its business.