Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs

From: Steve Muckle
Date: Sat May 21 2016 - 15:46:26 EST


Hi Peter, Ingo,

On Thu, May 19, 2016 at 04:04:19PM -0700, Steve Muckle wrote:
> On Thu, May 19, 2016 at 11:06:14PM +0200, Rafael J. Wysocki wrote:
> > > In the case of a remote update the hook has to run (or not) after it is
> > > known whether preemption will occur so we don't do needless work or
> > > IPIs. If the policy CPUs aren't known in the scheduler then the early
> > > hook will always need to be called along with an indication that it is
> > > the early hook being called. If it turns out to be a remote update it
> > > could then be deferred to the later hook, which would only be called
> > > when a remote update has been deferred and preemption has not occurred.
> > >
> > > This means two hook inovcations for a remote non-preempting wakeup
> > > though instead of one. Perhaps this is a good middle ground on code
> > > churn vs. optimization though.
> >
> > I would think so.
>
> Ok, I will pursue this approach.

I'd like to get your opinion here before proceeding further...

To catch you up and summarize, I'm trying to implement support for
calling the scheduler cpufreq callback during remote wakeups. Currently
the scheduler cpufreq callback is only called when the target CPU is the
current CPU. If a remote wakeup does not result in preemption, the CPU
frequency may currently not be adjusted appropriately for up to a tick,
when we are guaranteed to call the hook again.

Invoking schedutil promptly for the target CPU in this situation means
sending an IPI if the current CPU is not in the target CPU's frequency
domain. This is because often a cpufreq driver must run on a CPU within
the frequency domain it is bound to. But the catch is that we should
not do this and incur the overhead of an IPI if preemption will occur,
as in that case the scheduler (and schedutil) will run soon on the
target CPU anyway, potentially as a result of the scheduler sending its
own IPI.

I figured this unnecessary overhead would be unacceptable and so have
been working on an approach to avoid it. Unfortunately the current hooks
happen before the preemption decision is made. My current implementation
sets a flag if schedutil sees a remote wakeup and then bails. There's a
test to call the hook again at the end of check_preempt_curr() if the flag
is set. The flag is cleared by resched_curr() as that means preemption
will happen on the target CPU. The flag currently lives at the end of
the rq struct. I could move it into the update_util_data hook structure
or elsewhere, but that would mean accessing another per-cpu item in
hot scheduler paths.

Thoughts? Note the current implementation described above differs a bit
from the last posting in this thread, per discussion with Rafael.

thanks,
Steve